Difference between revisions of "Transforming Wikipedia into Named Entity Training Data"
(Links)
(+ infobox)
Line 1:
+ {{Infobox work
+ | title = Transforming Wikipedia into Named Entity Training Data
+ | date = 2008
+ | authors = [[Joel Nothman]]<br />[[James R. Curran]]<br />[[Tara Murphy]]
+ | link = http://www.lrec-conf.org/proceedings/lrec2014/pdf/403_Paper.pdf
+ | plink = https://pdfs.semanticscholar.org/04ca/48e573c0800fc572f2af1d475dd2645e840a.pdf
+ }}
'''Transforming Wikipedia into Named Entity Training Data''' is a scientific work related to [[Wikipedia quality]], published in 2008 and written by [[Joel Nothman]], [[James R. Curran]] and [[Tara Murphy]].
== Overview ==
Statistical [[named entity]] recognisers require costly hand-labelled training data and, as a result, most existing corpora are small. The authors exploit [[Wikipedia]] to create a massive corpus of named entity annotated text. They transform Wikipedia’s links into named entity annotations by classifying the target articles into common entity types (e.g. person, organisation and location). Compared with the MUC, CoNLL and BBN corpora, training on the Wikipedia-derived corpus generally performs better than other cross-corpus train/test pairs.
Revision as of 08:21, 17 July 2019
Authors | Joel Nothman, James R. Curran, Tara Murphy |
---|---|
Publication date | 2008 |
Links | Original, Preprint |
Transforming Wikipedia into Named Entity Training Data is a scientific work related to Wikipedia quality, published in 2008 and written by Joel Nothman, James R. Curran and Tara Murphy.
Overview
Statistical named entity recognisers require costly hand-labelled training data and, as a result, most existing corpora are small. The authors exploit Wikipedia to create a massive corpus of named entity annotated text. They transform Wikipedia’s links into named entity annotations by classifying the target articles into common entity types (e.g. person, organisation and location). Compared with the MUC, CoNLL and BBN corpora, training on the Wikipedia-derived corpus generally performs better than other cross-corpus train/test pairs.
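
The link-to-annotation idea described in the overview can be illustrated with a small sketch. It is not the authors' pipeline: the `ARTICLE_TYPES` lookup is a hypothetical stand-in for the article classifier, only simple `[[Target]]` and `[[Target|anchor]]` links are parsed, tokenisation is plain whitespace splitting, and the PER/ORG/LOC labels with B-/I- prefixes are assumptions chosen for the example.

```python
import re

# Hypothetical classifier output: article title -> entity type.
# In practice this mapping would come from classifying each target article.
ARTICLE_TYPES = {
    "Marie Curie": "PER",
    "Paris": "LOC",
    "University of Paris": "ORG",
}

# Matches [[Target]] and [[Target|anchor text]] (simple links only).
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def links_to_annotations(wikitext, article_types):
    """Return (token, tag) pairs; a linked span takes the type of its target."""
    tagged = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        # Unlinked text before this link is tagged as outside any entity.
        tagged.extend((tok, "O") for tok in wikitext[pos:match.start()].split())
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        entity_type = article_types.get(target)
        for i, tok in enumerate(anchor.split()):
            if entity_type is None:
                # Target article has no known entity type: leave untagged.
                tagged.append((tok, "O"))
            else:
                prefix = "B-" if i == 0 else "I-"
                tagged.append((tok, prefix + entity_type))
        pos = match.end()
    # Remaining unlinked text after the last link.
    tagged.extend((tok, "O") for tok in wikitext[pos:].split())
    return tagged
```

Calling `links_to_annotations("[[Marie Curie]] moved to [[Paris]] .", ARTICLE_TYPES)` yields one tagged token per word, with the linked spans labelled by the type of their target article and everything else labelled `O`. Links whose target is missing from the lookup are left untagged rather than guessed.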