Latent Semantic Analysis Models on Wikipedia and Tasa

From Wikipedia Quality
'''Latent Semantic Analysis Models on Wikipedia and Tasa''' - scientific work related to [[Wikipedia quality]] published in 2014, written by [[Dan Stefanescu]], [[Rajendra Banjade]] and [[Vasile Rus]].
  
 
== Overview ==
 
This paper introduces a collection of freely available Latent Semantic Analysis (LSA) models built on the entire [[English Wikipedia]] and the TASA corpus. The models differ not only in their source, [[Wikipedia]] versus TASA, but also in the linguistic items they focus on: all words, content-words, nouns-verbs, and main concepts. Generating such models from large datasets (e.g. Wikipedia) that can provide broad coverage of the vocabulary actually in use is computationally challenging, which is why large LSA models are rarely available. The authors' experiments show that for the task of word-to-word similarity, the scores assigned by these models correlate strongly with human judgment, outperform many other frequently used [[measures]], and are comparable to the state of the art.
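The general LSA pipeline the paper builds on (a term-document matrix reduced by truncated SVD, with word-to-word similarity taken as the cosine between latent word vectors) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy corpus and the choice of k = 3 latent dimensions are assumptions for demonstration, whereas the paper's models are trained on all of English Wikipedia and TASA with far higher dimensionality.

```python
# Minimal LSA sketch: term-document matrix -> truncated SVD -> cosine similarity.
# Toy corpus and k are illustrative assumptions, not the paper's settings.
import numpy as np

# Hypothetical stand-in documents for a corpus like Wikipedia or TASA.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "wikipedia is a free encyclopedia",
    "an encyclopedia collects articles on many topics",
]

# Build a raw term-document count matrix (rows = words, columns = documents).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1.0

# Truncated SVD: keep k latent dimensions; rows of U[:, :k] * S[:k]
# are the latent word vectors.
k = 3
U, S, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = U[:, :k] * S[:k]

def similarity(w1, w2):
    """Cosine similarity between two words in the latent LSA space."""
    a, b = word_vecs[index[w1]], word_vecs[index[w2]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words sharing contexts should score higher than unrelated ones.
print(similarity("cat", "dog"))
print(similarity("cat", "encyclopedia"))
```

Real models additionally apply a weighting scheme (e.g. log-entropy or tf-idf) to the count matrix before the SVD, which the sketch omits for brevity.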

Revision as of 12:02, 14 June 2020
