Learning Multilingual Named Entity Recognition from Wikipedia


Authors
Joel Nothman
Nicky Ringland
Will Radford
Tara Murphy
James R. Curran
Publication date
2013
DOI
10.1016/j.artint.2012.03.006

Learning Multilingual Named Entity Recognition from Wikipedia is a scientific work related to Wikipedia quality, published in 2013 and written by Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy and James R. Curran.

Overview

The authors automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (NER) by exploiting the text and structure of Wikipedia. Most NER systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck that the work overcomes. The authors first classify each Wikipedia article into named entity (NE) types, training and evaluating on 7200 manually-labelled Wikipedia articles across nine languages. Their cross-lingual approach achieves up to 95% accuracy. They then transform the links between articles into NE annotations by projecting the target article's classification onto the anchor text. This approach yields reasonable annotations, but does not immediately compete with existing gold-standard data. By inferring additional links and heuristically tweaking the Wikipedia corpora, the authors better align the automatic annotations to gold standards. They annotate millions of words in nine languages, evaluating English, German, Spanish, Dutch and Russian Wikipedia-trained models against CoNLL shared task data and other gold-standard corpora. The approach outperforms other approaches to automatic NE annotation (Richman and Schone, 2008 [61], Mika et al., 2008 [46]); competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text.
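The link-projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `ARTICLE_TYPES` mapping, the `project_links` function, the `NON` label for non-entity articles, the IOB2 tag scheme and the example sentence are all assumptions made for illustration. In the paper the article types come from a trained classifier, and further link inference and corpus heuristics are applied.

```python
import re

# Hypothetical article-to-type mapping. In the paper this comes from a
# trained article classifier; here it is hard-coded for illustration.
ARTICLE_TYPES = {
    "Ada Lovelace": "PER",
    "Royal Society": "ORG",
    "London": "LOC",
    "Mathematics": "NON",  # assumed label for articles that are not named entities
}

# Matches wikitext links of the form [[Target]] or [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def project_links(wikitext, article_types):
    """Return (token, tag) pairs, projecting each link target's type onto its anchor text."""
    tagged = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Unlinked text before this link is tagged as outside any entity.
        tagged.extend((tok, "O") for tok in wikitext[pos:match.start()].split())
        target = match.group(1).strip()
        anchor = match.group(2) or match.group(1)
        ne_type = article_types.get(target, "NON")
        for i, tok in enumerate(anchor.split()):
            if ne_type == "NON":
                tagged.append((tok, "O"))
            else:
                prefix = "B-" if i == 0 else "I-"
                tagged.append((tok, prefix + ne_type))
        pos = match.end()
    # Remaining unlinked text after the last link.
    tagged.extend((tok, "O") for tok in wikitext[pos:].split())
    return tagged


sentence = (
    "[[Ada Lovelace|Lovelace]] studied [[Mathematics|mathematics]] "
    "near the [[Royal Society]] in [[London]] ."
)
for token, tag in project_links(sentence, ARTICLE_TYPES):
    print(token, tag)
```

In this sketch unlinked text is always tagged `O`, which is one reason such raw projections would fall short of gold-standard data: names that appear without a link are missed unless additional links are inferred.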

Embed

Wikipedia Quality

Nothman, Joel; Ringland, Nicky; Radford, Will; Murphy, Tara; Curran, James R. (2013). "[[Learning Multilingual Named Entity Recognition from Wikipedia]]". Elsevier Science Publishers Ltd. DOI: 10.1016/j.artint.2012.03.006.

English Wikipedia

{{cite journal |last1=Nothman |first1=Joel |last2=Ringland |first2=Nicky |last3=Radford |first3=Will |last4=Murphy |first4=Tara |last5=Curran |first5=James R. |title=Learning Multilingual Named Entity Recognition from Wikipedia |date=2013 |doi=10.1016/j.artint.2012.03.006 |url=https://wikipediaquality.com/wiki/Learning_Multilingual_Named_Entity_Recognition_from_Wikipedia |journal=Elsevier Science Publishers Ltd.}}

HTML

Nothman, Joel; Ringland, Nicky; Radford, Will; Murphy, Tara; Curran, James R. (2013). &quot;<a href="https://wikipediaquality.com/wiki/Learning_Multilingual_Named_Entity_Recognition_from_Wikipedia">Learning Multilingual Named Entity Recognition from Wikipedia</a>&quot;. Elsevier Science Publishers Ltd. DOI: 10.1016/j.artint.2012.03.006.