{{Infobox work
| title = A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters
| date = 2010
| authors = [[Grzegorz Chrupała]]<br />[[Dietrich Klakow]]
| link = http://www.lrec-conf.org/proceedings/lrec2010/pdf/538_Paper.pdf
}}
 
'''A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters''' is a scientific work related to [[Wikipedia quality]], published in 2010 by [[Grzegorz Chrupała]] and [[Dietrich Klakow]].
 
== Overview ==
 
Named entity recognition (NER) is a relatively well-understood NLP task, and for English many training resources and software tools are publicly available. Other languages tend to be underserved in this area. For German, CoNLL-2003 provides labeled training data, but there are no publicly available, ready-to-use tools. The authors fill this gap by developing a German NER system with state-of-the-art performance. In addition to the CoNLL-2003 labeled training data, they use two further resources: (i) 32 million words of unlabeled text and (ii) infobox labels in German [[Wikipedia]] articles. From these resources they extract informative [[features]] of word types, and they train a supervised model on the labeled training data. This approach lets the system better handle word types unseen in the training data and achieve state-of-the-art performance on German with little engineering effort.
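The idea of word-type features can be illustrated with a small Python sketch. It is not the authors' implementation. The cluster identifiers, the infobox labels, the feature names and the helper functions (`token_features`, `sentence_features`) are all hypothetical, and the two lookup tables are tiny stand-ins for resources that the paper derives from unlabeled text and from German Wikipedia infoboxes. What the sketch shows is that cluster and infobox features are defined per word type, so they remain available for words that never occur in the labeled training data.

```python
from typing import Dict, List

# Hypothetical lookup tables. In the approach described above, comparable
# information would come from distributional clusters induced on unlabeled
# text and from German Wikipedia infobox labels; these are toy stand-ins.
WORD_CLUSTERS: Dict[str, str] = {
    "Berlin": "c17",
    "Hamburg": "c17",
    "Merkel": "c42",
}
INFOBOX_LABELS: Dict[str, str] = {
    "Berlin": "Ort",
    "Hamburg": "Ort",
    "Siemens": "Unternehmen",
}


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Return a feature dict for tokens[i]: local surface cues plus word-type features."""
    word = tokens[i]
    feats = {
        "word": word.lower(),
        "is_capitalized": str(word[:1].isupper()),
        "suffix3": word[-3:].lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
    }
    # Word-type features: looked up by the word itself, so they are defined
    # even for words absent from the labeled training data.
    feats["cluster"] = WORD_CLUSTERS.get(word, "<unk>")
    feats["infobox"] = INFOBOX_LABELS.get(word, "<none>")
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Extract one feature dict per token of a tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for f in sentence_features(["Siemens", "baut", "in", "Hamburg"]):
        print(f["word"], f["cluster"], f["infobox"])
```

In this toy setup "Hamburg" shares the cluster `c17` with "Berlin", so a supervised classifier that saw "Berlin" labeled as a location during training could generalize to "Hamburg" through the shared cluster feature, even if "Hamburg" itself was never labeled. The feature dictionaries would then be fed to a supervised sequence labeler; which model to use is left open here.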
 
Revision as of 07:26, 16 January 2021