Automatic Acquisition of Controlled Vocabularies from Wikipedia Using Wikilinks, Word Ranking, and a Dependency Parser

From Wikipedia Quality


Authors: Ruben Dorado, Audrey Bramy, Camilo Mejía-Moncayo, Alix E. Rojas
Publication date: 2017
DOI: 10.1007/978-3-319-66562-7_3

Automatic Acquisition of Controlled Vocabularies from Wikipedia Using Wikilinks, Word Ranking, and a Dependency Parser is a scientific work related to Wikipedia quality, published in 2017 and written by Ruben Dorado, Audrey Bramy, Camilo Mejía-Moncayo and Alix E. Rojas.

Overview

Controlled vocabularies are important resources for tasks such as machine translation, text summarization, and text analysis. However, developing such resources is expensive and time-consuming. Wikipedia, a free collaborative encyclopedia, contains a large amount of semi-structured information that an automatic process can use to create new resources. This paper proposes a method that extracts semantic information from Wikipedia in the form of a controlled vocabulary. For a given Wikipedia article, the method combines keywords obtained through three different strategies: Wikipedia annotations called wikilinks, a ranking measure that extracts keywords from the text, and a dependency parser. To evaluate the method, the authors analyzed the coverage and performance of the acquired vocabulary, using WordNet as a gold standard.
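The summary above does not spell out the pipeline, so the following Python sketch only illustrates how the three strategies could feed a single vocabulary and how coverage against a gold standard might be measured. It rests on several assumptions that do not come from the paper. All function names are hypothetical. Raw term frequency stands in for the word ranking measure, which the summary does not name. The dependency parse is written by hand as Universal Dependencies-style tuples instead of coming from a real parser. `gold` is a toy set in place of WordNet. Finally, "coverage" is taken to mean the share of acquired terms found in the gold standard.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "is", "are",
             "to", "for", "with", "by", "as", "that", "this", "it", "from"}

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 captures the link target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def wikilink_terms(wikitext):
    """Strategy 1: use the targets of wikilinks as candidate terms."""
    terms = set()
    for target in WIKILINK_RE.findall(wikitext):
        target = target.strip()
        if ":" in target:  # skip namespaced links such as File: or Category:
            continue
        terms.add(target.lower())
    return terms


def strip_markup(wikitext):
    """Replace each [[...]] link with its visible label; drop namespaced links."""
    def visible(match):
        parts = match.group(1).split("|")
        return "" if ":" in parts[0] else parts[-1]
    return re.sub(r"\[\[([^\]]*)\]\]", visible, wikitext)


def ranked_keywords(text, top_k=5):
    """Strategy 2 (assumed stand-in): rank content words by raw frequency."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS and len(t) > 2]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def dependency_terms(parsed):
    """Strategy 3: noun phrases read off a dependency parse.

    `parsed` holds (idx, word, head, deprel, upos) tuples with Universal
    Dependencies labels; here it is written by hand, not produced by a parser.
    """
    terms = set()
    for idx, word, _head, _rel, upos in parsed:
        if upos != "NOUN":
            continue
        # Keep compound and adjectival modifiers attached to this noun.
        mods = [(i, w) for i, w, h, rel, _ in parsed
                if h == idx and rel in ("compound", "amod")]
        if mods:
            phrase = sorted(mods + [(idx, word)])  # restore surface word order
            terms.add(" ".join(w.lower() for _, w in phrase))
    return terms


def build_vocabulary(wikitext, parsed_sentences, top_k=5):
    """Union of the candidate terms from all three strategies."""
    vocab = wikilink_terms(wikitext)
    vocab |= set(ranked_keywords(strip_markup(wikitext), top_k))
    for sentence in parsed_sentences:
        vocab |= dependency_terms(sentence)
    return vocab


def coverage(vocab, gold):
    """Fraction of acquired terms that also appear in the gold standard."""
    return len(vocab & gold) / len(vocab) if vocab else 0.0


wikitext = ("[[Machine translation]] uses [[Natural language processing|NLP]]. "
            "Translation quality depends on translation models and "
            "[[File:Diagram.png|thumb]] data. Translation systems learn "
            "from parallel data.")
# Hand-written parse of "Translation quality depends on statistical models".
parsed = [(1, "Translation", 2, "compound", "NOUN"),
          (2, "quality", 3, "nsubj", "NOUN"),
          (3, "depends", 0, "root", "VERB"),
          (4, "on", 6, "case", "ADP"),
          (5, "statistical", 6, "amod", "ADJ"),
          (6, "models", 3, "obl", "NOUN")]
# Toy stand-in for a WordNet lemma set (not real WordNet data).
gold = {"machine translation", "natural language processing",
        "translation", "data", "model"}

vocab = build_vocabulary(wikitext, [parsed], top_k=2)
print(sorted(vocab))
# → ['data', 'machine translation', 'natural language processing',
#    'statistical models', 'translation', 'translation quality']
print(round(coverage(vocab, gold), 3))  # → 0.667
```

Taking the union means each strategy can only add terms. A real system would likely also filter or weight candidates. To test against actual WordNet data, the gold set could be built from NLTK's `wordnet.all_lemma_names()`, after replacing the underscores WordNet uses in multiword lemmas with spaces.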