Exploring the Use of Word Embeddings and Random Walks on Wikipedia for the Cogalex Shared Task

From Wikipedia Quality

Exploring the Use of Word Embeddings and Random Walks on Wikipedia for the Cogalex Shared Task is a scientific work related to Wikipedia quality, published in 2014 and written by Josu Goikoetxea, Eneko Agirre and Aitor Soroa.

Overview

For their participation in the task, the authors tested three different kinds of relatedness algorithms: one based on embeddings induced from corpora, another based on random walks over WordNet, and a third based on random walks over Wikipedia. All three perform similarly on noun relatedness datasets such as WordSim353, close to the highest reported values. Although the task definition gave examples of nouns, the training and test data were based on the Edinburgh Associative Thesaurus, and around 50% of the target words were not nouns. The corpus-based algorithm performed much better than the other methods on the training dataset and was therefore submitted for the test.
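The paragraph above names two families of relatedness measures without showing how either one scores a word pair. The following is a minimal sketch under stated assumptions, not the authors' code. Embedding relatedness is assumed to be the cosine of the two word vectors. Random-walk relatedness is assumed to be the cosine of two personalized PageRank vectors computed over a graph, one seeded at each word; this is a common formulation for WordNet- and Wikipedia-based random walks, but the summary does not state the exact variant used. `TOY_GRAPH`, `TOY_VECTORS`, the damping factor 0.85 and the iteration count are hypothetical illustration values, not the authors' resources or settings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for PageRank that restarts at `seeds`.

    graph: dict mapping each node to its neighbours (undirected edges listed both ways).
    Returns scores aligned with sorted(graph). Nodes without neighbours would leak
    probability mass; the hypothetical toy graph below has none.
    """
    nodes = sorted(graph)
    index = {n: i for i, n in enumerate(nodes)}
    # Teleport vector: uniform over the seed nodes, zero elsewhere.
    teleport = [0.0] * len(nodes)
    for s in seeds:
        teleport[index[s]] = 1.0 / len(seeds)
    rank = teleport[:]
    for _ in range(iterations):
        new = [(1 - damping) * t for t in teleport]
        for node, neighbours in graph.items():
            if not neighbours:
                continue
            # Spread this node's damped score evenly over its neighbours.
            share = damping * rank[index[node]] / len(neighbours)
            for nb in neighbours:
                new[index[nb]] += share
        rank = new
    return rank

def embedding_relatedness(w1, w2, vectors):
    """Assumed corpus-based measure: cosine of the two word embeddings."""
    return cosine(vectors[w1], vectors[w2])

def random_walk_relatedness(w1, w2, graph):
    """Assumed graph-based measure: cosine of the two personalized PageRank vectors."""
    return cosine(personalized_pagerank(graph, [w1]), personalized_pagerank(graph, [w2]))

# Hypothetical toy graph standing in for WordNet or the Wikipedia link graph.
TOY_GRAPH = {
    "car": ["vehicle", "road"],
    "bus": ["vehicle", "road"],
    "vehicle": ["car", "bus"],
    "road": ["car", "bus"],
    "banana": ["fruit"],
    "apple": ["fruit"],
    "fruit": ["banana", "apple"],
}

# Hypothetical 3-dimensional vectors standing in for embeddings induced from a corpus.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "bus": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for a, b in [("car", "bus"), ("car", "banana")]:
        print(a, b,
              round(embedding_relatedness(a, b, TOY_VECTORS), 3),
              round(random_walk_relatedness(a, b, TOY_GRAPH), 3))
```

With real resources, the graph would be WordNet or the Wikipedia link graph and the vectors would be trained embeddings. Comparing whole PageRank vectors with cosine, rather than reading off a single node's score, keeps the random-walk measure symmetric in its two arguments.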