Difference between revisions of "A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters"
Revision as of 23:45, 29 January 2021
Authors | Grzegorz Chrupała, Dietrich Klakow |
---|---|
Publication date | 2010 |
Links | Original |
A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters is a scientific work related to Wikipedia quality, published in 2010 and written by Grzegorz Chrupała and Dietrich Klakow.
Overview
Named Entity Recognition (NER) is a relatively well-understood NLP task, with many publicly available training resources and software tools for English. Other languages tend to be underserved in this area. For German, CoNLL-2003 provides training data, but there are no publicly available, ready-to-use tools. The authors fill this gap by developing a German NER system with state-of-the-art performance. In addition to the CoNLL-2003 labeled training data, the authors use two further resources: (i) 32 million words of unlabeled text and (ii) infobox labels in German Wikipedia articles. They extract informative features of word-types from those resources and train a supervised model on the labeled training data. This approach handles word-types unseen in the training data better and achieves state-of-the-art performance on German with little engineering effort.
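The idea of attaching word-type features from external resources to each token can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `token_features`, the feature names, and the cluster and infobox values are all hypothetical, and the two lookup tables are assumed to have been built beforehand (one from clustering unlabeled text, one from German Wikipedia infoboxes).

```python
from typing import Dict, List, Mapping


def token_features(
    tokens: List[str],
    i: int,
    cluster_of: Mapping[str, str],
    infobox_label_of: Mapping[str, str],
) -> Dict[str, str]:
    """Build a feature dictionary for tokens[i].

    cluster_of and infobox_label_of are hypothetical word-type lookups:
    they are keyed by the word form, not by its position in the sentence.
    """
    word = tokens[i]
    feats = {
        # Surface features available from the labeled data alone.
        "word": word.lower(),
        "is_capitalized": str(word[:1].isupper()),
        "suffix3": word[-3:].lower(),
        # Word-type features from the external resources; a word unseen in
        # the labeled training data can still receive informative values.
        "cluster": cluster_of.get(word, "UNK"),
        "infobox": infobox_label_of.get(word, "NONE"),
    }
    # Cluster identifiers of the neighbouring tokens as context features.
    feats["prev_cluster"] = cluster_of.get(tokens[i - 1], "UNK") if i > 0 else "BOS"
    feats["next_cluster"] = (
        cluster_of.get(tokens[i + 1], "UNK") if i + 1 < len(tokens) else "EOS"
    )
    return feats


# Made-up example data, for illustration only.
sentence = ["Angela", "Merkel", "besucht", "Saarbrücken"]
clusters = {"Merkel": "c17", "Saarbrücken": "c42"}
infobox_labels = {"Saarbrücken": "settlement"}

features = token_features(sentence, 3, clusters, infobox_labels)
```

Feature dictionaries of this kind could then be passed to any supervised sequence labeler trained on the CoNLL-2003 annotations; the overview does not say which model the authors use, so none is assumed here.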
Embed
Wikipedia Quality
Chrupała, Grzegorz; Klakow, Dietrich. (2010). "[[A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters]]".
English Wikipedia
{{cite journal |last1=Chrupała |first1=Grzegorz |last2=Klakow |first2=Dietrich |title=A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters |date=2010 |url=https://wikipediaquality.com/wiki/A_Named_Entity_Labeler_for_German:_Exploiting_Wikipedia_and_Distributional_Clusters}}
HTML
Chrupała, Grzegorz; Klakow, Dietrich. (2010). "<a href="https://wikipediaquality.com/wiki/A_Named_Entity_Labeler_for_German:_Exploiting_Wikipedia_and_Distributional_Clusters">A Named Entity Labeler for German: Exploiting Wikipedia and Distributional Clusters</a>".