Improving Distributed Representation by Feature Selection of Wikipedia

From Wikipedia Quality
Revision as of 22:28, 2 July 2019 by Athena (Starting an article - Improving Distributed Representation by Feature Selection of Wikipedia)

Improving Distributed Representation by Feature Selection of Wikipedia is a scientific work related to Wikipedia quality, published in 2017 and written by Dao Van Tuan and Hiroshi Sato.

Overview

Distributed representation plays an important role in many applications of Natural Language Processing (NLP). The Word2Vec model has been attracting attention, helped by easy access to enormous amounts of language data from the Internet, such as Wikipedia. To use Word2Vec effectively, one has to consider not only improvements to the method itself but also the process of preparing the training data. In this paper, the authors demonstrate that adequate selection of training data can greatly improve the performance of Word2Vec compared to existing research. The authors also confirm that Wikipedia dump data, used as is, is not a good source of training data.
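
The overview does not state which selection criteria the authors applied, so the following is only a minimal sketch of what selecting training data before Word2Vec training might look like. The thresholds MIN_TOKENS and MIN_ALPHA_RATIO, the tokenizer, and the function names are all assumptions made for illustration and are not taken from the paper.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical selection criteria; the source does not give the authors' rules.
MIN_TOKENS = 5          # assumed: drop very short pages such as stubs or redirects
MIN_ALPHA_RATIO = 0.6   # assumed: drop pages dominated by digits or markup symbols

# Alphabetic words, digit runs, or single non-space symbols.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words, numbers, and symbols."""
    return TOKEN_RE.findall(text.lower())


def keep_article(tokens: List[str]) -> bool:
    """Return True if the article passes the assumed length and content checks."""
    if len(tokens) < MIN_TOKENS:
        return False
    alpha = sum(1 for t in tokens if t.isalpha())
    return alpha / len(tokens) >= MIN_ALPHA_RATIO


def select_training_data(articles: Iterable[str]) -> Iterator[List[str]]:
    """Yield word lists for the articles that pass the filter."""
    for text in articles:
        tokens = tokenize(text)
        if keep_article(tokens):
            # Keep only alphabetic tokens as training words.
            yield [t for t in tokens if t.isalpha()]


if __name__ == "__main__":
    sample = [
        "Word embeddings map words to dense vectors in a shared space.",
        "#REDIRECT [[Foo]]",
        "1990 1991 1992 1993 1994 1995 | | | list",
    ]
    for words in select_training_data(sample):
        print(words)
```

The token lists produced this way could then be passed to any Word2Vec trainer in place of the raw dump text. Filtering before training, rather than changing the model, reflects the point made above that the preparation of training data matters as much as the method.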