Difference between revisions of "Classifying Articles in English and German Wikipedia"

From Wikipedia Quality
(Classifying Articles in English and German Wikipedia - basic info)
 
(Links)
Line 1: Line 1:
'''Classifying Articles in English and German Wikipedia''' - scientific work related to Wikipedia quality published in 2009, written by Nicky Ringland, Joel Nothman, Tara Murphy and James R. Curran.
+
'''Classifying Articles in English and German Wikipedia''' - scientific work related to [[Wikipedia quality]] published in 2009, written by [[Nicky Ringland]], [[Joel Nothman]], [[Tara Murphy]] and [[James R. Curran]].
  
 
== Overview ==
 
== Overview ==
Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, the cost of manually annotating sufficient data for training purposes, especially for multiple languages, is prohibitive, meaning automated methods for developing resources are crucial. Authors investigate the automatic generation of NE annotated data in German from Wikipedia. By incorporating structural features of Wikipedia, authors can develop a German corpus which accurately classifies Wikipedia articles into NE categories to within 1% F-score of the state-of-the-art process in English.
+
Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, the cost of manually annotating sufficient data for training purposes, especially for [[multiple languages]], is prohibitive, meaning automated methods for developing resources are crucial. Authors investigate the automatic generation of NE annotated data in German from [[Wikipedia]]. By incorporating structural [[features]] of Wikipedia, authors can develop a German corpus which accurately classifies Wikipedia articles into NE [[categories]] to within 1% F-score of the state-of-the-art process in English.

Revision as of 21:39, 15 July 2019

Classifying Articles in English and German Wikipedia is a scientific work related to Wikipedia quality, published in 2009 and written by Nicky Ringland, Joel Nothman, Tara Murphy and James R. Curran.

Overview

Named Entity (NE) information is critical for Information Extraction (IE) tasks. However, manually annotating enough training data, especially for multiple languages, is prohibitively expensive, so automated methods for developing resources are crucial. The authors investigate the automatic generation of NE-annotated data in German from Wikipedia. By incorporating structural features of Wikipedia, they develop a German corpus that classifies Wikipedia articles into NE categories to within 1% F-score of the state-of-the-art process in English.
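
The F-score comparison mentioned above can be illustrated with a minimal Python sketch. It assumes the standard definition of F-score as the weighted harmonic mean of precision and recall; the function name `f_score` and all counts are hypothetical and chosen only for illustration. They are not results from the paper, and the paper's exact averaging scheme over NE categories is not stated here.

```python
def f_score(true_positives, false_positives, false_negatives, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    predicted = true_positives + false_positives   # articles assigned to the category
    actual = true_positives + false_negatives      # articles truly in the category
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for one NE category in each language (not from the paper).
english = f_score(true_positives=90, false_positives=10, false_negatives=10)
german = f_score(true_positives=89, false_positives=11, false_negatives=11)

# Absolute difference between the two scores, as a fraction (0.01 == 1 point).
gap = abs(english - german)
print(f"English F1={english:.3f}  German F1={german:.3f}  gap={gap:.3f}")
```

With these made-up counts the two classifiers differ by one F-score point, which is the kind of gap the summary describes between the German and English results.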