Learning Named Entity Recognition from Wikipedia



Authors: Joel Nothman
Publication date: 2008
Links: Original

Learning Named Entity Recognition from Wikipedia is a scientific work related to Wikipedia quality, written by Joel Nothman and published in 2008.

Overview

The author presents a method for producing free, very large corpora to train taggers for Named Entity Recognition (NER), the task of identifying and classifying names in text, which is often solved by statistical learning systems. The approach uses the text of Wikipedia, a free online encyclopedia, and transforms links between Wikipedia articles into entity annotations. Having derived a baseline corpus, the author found that altering Wikipedia's links and identifying classes of capitalised non-entity terms enabled the corpus to conform more closely to gold-standard annotations, increasing performance by up to 32% F score. The evaluation of the method is novel, since the training corpus is not usually a variable in NER experimentation; the author therefore develops a number of methods for analysing and comparing training corpora. Gold-standard training corpora for NER perform poorly (F score up to 32% lower) when evaluated on test data from a different gold-standard corpus. The Wikipedia-derived data can outperform manually annotated corpora on this cross-corpus evaluation task by up to 7% on held-out test data. These experimental results show that Wikipedia is viable as a source of automatically annotated training corpora, which have wide domain coverage applicable to a broad range of NLP applications.
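
The core idea of turning article links into entity annotations can be illustrated with a minimal sketch. It is not the method from the work itself: the `ARTICLE_CLASS` lookup table, the `links_to_bio` function name, the whitespace tokenisation, and the BIO tag scheme are all assumptions made for illustration, and the source does not say how link targets are assigned entity classes.

```python
import re

# Hypothetical mapping from a linked article's title to an entity class.
# How articles are actually classified is not described in the source.
ARTICLE_CLASS = {
    "Marie Curie": "PER",
    "Paris": "LOC",
}

# Matches wiki links of the form [[Target]] and [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def links_to_bio(wikitext):
    """Return (token, tag) pairs, tagging link anchors by their target's class."""
    tagged = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Text between links is treated as non-entity tokens.
        for token in wikitext[pos:match.start()].split():
            tagged.append((token, "O"))

        target = match.group(1).strip()
        anchor = (match.group(2) or match.group(1)).strip()
        label = ARTICLE_CLASS.get(target)  # None if the target has no known class

        for i, token in enumerate(anchor.split()):
            if label is None:
                tagged.append((token, "O"))
            else:
                prefix = "B-" if i == 0 else "I-"
                tagged.append((token, prefix + label))
        pos = match.end()

    # Remaining text after the last link.
    for token in wikitext[pos:].split():
        tagged.append((token, "O"))
    return tagged


if __name__ == "__main__":
    sentence = "[[Marie Curie]] worked in [[Paris|the French capital]] on [[radioactivity]]."
    for token, tag in links_to_bio(sentence):
        print(token, tag)
```

In this sketch the anchor "the French capital" is tagged as a location because its link target is, even though a gold-standard annotator would probably not mark it as a name. Mismatches of this kind are one plausible reason why adjusting the links, as the overview describes, brings the derived corpus closer to gold-standard annotations.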

Embed

Wikipedia Quality

Nothman, Joel. (2008). "[[Learning Named Entity Recognition from Wikipedia]]".

English Wikipedia

{{cite journal |last1=Nothman |first1=Joel |title=Learning Named Entity Recognition from Wikipedia |date=2008 |url=https://wikipediaquality.com/wiki/Learning_Named_Entity_Recognition_from_Wikipedia}}

HTML

Nothman, Joel. (2008). &quot;<a href="https://wikipediaquality.com/wiki/Learning_Named_Entity_Recognition_from_Wikipedia">Learning Named Entity Recognition from Wikipedia</a>&quot;.