Unsupervised Construction of a Word List on Tourism from Wikipedia

From Wikipedia Quality
Revision as of 08:54, 25 May 2019 by Sylwia (talk | contribs) (New work - Unsupervised Construction of a Word List on Tourism from Wikipedia)

Unsupervised Construction of a Word List on Tourism from Wikipedia is a scientific work related to Wikipedia quality, published in 2015 and written by Dittaya Wanvarie, Sansanee Ek-atchariya and Thanakon Kaewwipat.

Overview

Demand for domain-specific word lists is increasing in language learning. The authors propose an unsupervised framework that extracts a word list from Wikipedia data for a language learning class specializing in tourism. They extract topics from Wikipedia articles using non-negative matrix factorization and classify each topic as tourism-related or not using articles from WikiVoyage. They then select the Wikipedia paragraphs classified as in-domain and rank the words in those paragraphs by frequency. The proposed framework retrieves more than 90% of the words in the gold list, but the extracted list still includes a large number of general terms.
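
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a paragraph-by-word count matrix, a small pure-Python non-negative matrix factorization using multiplicative updates, and a hypothetical rule that marks a topic as in-domain when its word weights have a cosine similarity of at least `threshold` with term frequencies from WikiVoyage-like reference texts. The function names, the similarity rule, the threshold and the recall helper are all assumptions made for illustration; the summary does not specify these details.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V (n x m) as W (n x k) times H (k x m)
    using multiplicative updates (an assumed variant, chosen for brevity)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


def extract_word_list(paragraphs, reference_texts, k=2, threshold=0.5):
    """Return words from in-domain paragraphs, most frequent first."""
    counts = [Counter(tokenize(p)) for p in paragraphs]
    vocab = sorted(set().union(*counts))
    V = [[float(c[w]) for w in vocab] for c in counts]  # paragraph x word counts
    W, H = nmf(V, k)
    # Hypothetical rule: a topic is in-domain if its word weights resemble
    # the term frequencies of the reference (WikiVoyage-like) texts.
    ref = Counter(t for text in reference_texts for t in tokenize(text))
    ref_vec = [float(ref[w]) for w in vocab]
    in_domain = {a for a in range(k) if cosine(H[a], ref_vec) >= threshold}
    # Keep paragraphs whose strongest topic is in-domain, then count words.
    freq = Counter()
    for c, weights in zip(counts, W):
        if max(range(k), key=lambda a: weights[a]) in in_domain:
            freq.update(c)
    return [w for w, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))]


def recall(extracted, gold):
    """Fraction of gold-list words that appear in the extracted list."""
    gold = set(gold)
    return len(gold & set(extracted)) / len(gold) if gold else 0.0
```

Calling `extract_word_list(paragraphs, reference_texts)` on a list of paragraph strings returns a frequency-ranked word list, and `recall(extracted, gold)` gives the share of gold-list words it covers. Because ranking is by raw frequency, common general-purpose words will also score highly in such a sketch, which is consistent with the limitation noted above.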