Inducing Conceptual Embedding Spaces from Wikipedia
Inducing Conceptual Embedding Spaces from Wikipedia is a scientific work related to Wikipedia quality, published in 2017 and written by Gerard de Melo.
Overview
The word2vec word vector representations are among the best-known semantic resources to emerge in recent years. While large sets of pre-trained vectors are available, they focus on frequent words and multi-word expressions and lack sufficient coverage of named entities. Moreover, Google released pre-trained vectors only for English. In this paper, the author explores an automatic expansion of Google's pre-trained vectors using Wikipedia, adding millions of concepts and named entities in over 270 languages. The method enables all of these to reside in the same vector space, thus flexibly facilitating cross-lingual semantic applications.
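The core idea of placing new entities in an existing word vector space can be illustrated with a minimal sketch. The snippet below uses a common heuristic, averaging the vectors of known words that describe an entity; the paper's actual projection technique may differ, and the tiny vectors and the entity "Paris" here are purely illustrative stand-ins.

```python
import numpy as np

# Toy stand-ins for pre-trained word2vec vectors (illustrative only).
word_vectors = {
    "city":    np.array([0.9, 0.1, 0.0]),
    "capital": np.array([0.8, 0.2, 0.1]),
    "france":  np.array([0.7, 0.3, 0.2]),
    "fruit":   np.array([0.0, 0.9, 0.4]),
}

def embed_entity(description_tokens, vectors):
    """Place an entity in the word vector space by averaging the
    vectors of the known words describing it (a simple heuristic,
    not necessarily the paper's exact method)."""
    known = [vectors[t] for t in description_tokens if t in vectors]
    if not known:
        raise ValueError("no known tokens in description")
    return np.mean(known, axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical entity derived from words in its Wikipedia description.
paris = embed_entity(["capital", "city", "france"], word_vectors)

# The derived entity vector lands nearer related words than unrelated ones.
print(cosine(paris, word_vectors["city"]) > cosine(paris, word_vectors["fruit"]))
```

Because the entity vector lives in the same space as the original word vectors, standard similarity operations apply to words and entities alike, which is what makes cross-lingual and entity-level applications possible.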