Classifying Articles in English and German Wikipedia

From Wikipedia Quality
Revision as of 21:39, 15 July 2019 by Mila

Classifying Articles in English and German Wikipedia is a scientific work related to Wikipedia quality, published in 2009 and written by Nicky Ringland, Joel Nothman, Tara Murphy and James R. Curran.


Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, the cost of manually annotating enough training data, especially for multiple languages, is prohibitive, so automated methods for developing resources are crucial. The authors investigate the automatic generation of NE-annotated data in German from Wikipedia. By incorporating structural features of Wikipedia, the authors develop a German corpus which accurately classifies Wikipedia articles into NE categories to within 1% F-score of the state-of-the-art process in English.
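
The F-score mentioned above is the harmonic mean of precision and recall. The following is a minimal sketch of how per-category scores could be computed for an article classifier. It is not the authors' evaluation code: the function name `f_scores`, the category labels (`PER`, `LOC`, `ORG`, `NON`) and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def f_scores(gold, predicted):
    """Return {label: (precision, recall, F1)} for two parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # article assigned its correct category
        else:
            fp[p] += 1   # predicted category gains a false positive
            fn[g] += 1   # gold category gains a false negative

    scores = {}
    for label in set(gold) | set(predicted):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical gold and predicted categories for five articles.
gold = ["PER", "LOC", "ORG", "LOC", "NON"]
predicted = ["PER", "LOC", "LOC", "LOC", "NON"]
scores = f_scores(gold, predicted)
```

In this toy data one `ORG` article is misclassified as `LOC`, which lowers the precision of `LOC` and leaves `ORG` with no correct predictions.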