Classifying Articles in English and German Wikipedia

From Wikipedia Quality
Revision as of 09:33, 4 June 2019 by Aurora

Classifying Articles in English and German Wikipedia is a scientific work related to Wikipedia quality, published in 2009 and written by Nicky Ringland, Joel Nothman, Tara Murphy and James R. Curran.

Overview

Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, manually annotating enough training data is prohibitively expensive, especially across multiple languages, so automated methods for developing resources are crucial. The authors investigate the automatic generation of NE-annotated data in German from Wikipedia. By incorporating structural features of Wikipedia, they develop a German corpus that accurately classifies Wikipedia articles into NE categories, reaching an F-score within 1% of the state-of-the-art process for English.
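
The F-score mentioned above is the harmonic mean of precision and recall. The following is a minimal sketch of how it could be computed for one NE category over a set of classified articles. The category labels (PER, LOC, ORG, MISC), the function name f_score and the toy data are assumptions made for illustration only; they are not taken from the paper and do not reproduce its evaluation.

```python
def f_score(gold, predicted, category):
    """Return the F-score (harmonic mean of precision and recall) for one category.

    gold and predicted are equal-length lists of category labels,
    one label per article. All names here are hypothetical.
    """
    pairs = list(zip(gold, predicted))
    # Articles correctly assigned to the category.
    tp = sum(1 for g, p in pairs if g == category and p == category)
    # Articles wrongly assigned to the category.
    fp = sum(1 for g, p in pairs if g != category and p == category)
    # Articles of the category that were assigned elsewhere.
    fn = sum(1 for g, p in pairs if g == category and p != category)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels for six articles (illustrative, not real results).
gold = ["PER", "LOC", "ORG", "LOC", "MISC", "PER"]
predicted = ["PER", "LOC", "LOC", "LOC", "MISC", "ORG"]

for label in ["PER", "LOC", "ORG", "MISC"]:
    print(label, round(f_score(gold, predicted, label), 3))
```

In this toy data, LOC has two correct assignments and one false positive, so precision is 2/3, recall is 1, and the F-score is 0.8.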