Latest revision as of 15:08, 17 November 2020
| Authors | Eraldo R. Fernandes, Ulf Brefeld, R Blanco Gonzalez, J Asterias |
| --- | --- |
| Publication date | 2016 |
| Links | [Original](https://researchbank.rmit.edu.au/view/rmit:44541) |
Using Wikipedia for Cross-Language Named Entity Recognition - a scientific work related to Wikipedia quality, published in 2016 and written by Eraldo R. Fernandes, Ulf Brefeld, R Blanco Gonzalez and J Asterias.
Overview
Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist for non-standard languages, and labeling additional data manually is tedious and costly. In this article, the authors present a novel method to automatically generate (partially) annotated corpora for NERC by exploiting the link structure of Wikipedia. Firstly, Wikipedia entries in the source language are labeled with the NERC tag set. Secondly, Wikipedia language links are exploited to propagate the annotations into the target language. Finally, mentions of the labeled entities in the target language are annotated with the respective tags. The procedure results in a partially annotated corpus that is likely to contain unannotated entities. To learn from such partially annotated data, the authors devise two simple extensions of hidden Markov models and structural perceptrons. Empirically, the authors observe that using the automatically generated data leads to more accurate prediction models than off-the-shelf NERC methods. The authors demonstrate that the novel extensions of HMMs and perceptrons effectively exploit the partially annotated data and outperform their baseline counterparts in all settings.
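The three-step corpus-generation procedure described above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the entity labels, the language-link mapping, and the longest-match annotation heuristic are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Step 1: source-language Wikipedia entries labeled with NERC tags
# (in practice derived automatically; hard-coded here for illustration).
source_labels: Dict[str, str] = {
    "Berlin": "LOC",
    "Angela Merkel": "PER",
    "Siemens": "ORG",
}

# Step 2: Wikipedia language links map source entries to target-language
# titles (here a hypothetical English -> Spanish mapping).
language_links: Dict[str, str] = {
    "Berlin": "Berlín",
    "Angela Merkel": "Angela Merkel",
    "Siemens": "Siemens",
}

# Propagate the source labels to the target language via the links.
target_labels: Dict[str, str] = {
    language_links[title]: tag
    for title, tag in source_labels.items()
    if title in language_links
}

# Step 3: annotate mentions of the labeled entities in target-language
# text. Unmatched tokens keep the neutral tag "O", so the resulting
# corpus is only *partially* annotated: true entities that never occur
# as a Wikipedia title remain unlabeled.
def annotate(tokens: List[str]) -> List[Tuple[str, str]]:
    annotated: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest entity match first (up to 3 tokens here).
        for n in range(3, 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in target_labels:
                for tok in tokens[i:i + n]:
                    annotated.append((tok, target_labels[span]))
                i += n
                matched = True
                break
        if not matched:
            annotated.append((tokens[i], "O"))
            i += 1
    return annotated

print(annotate("Angela Merkel visitó Berlín".split()))
# → [('Angela', 'PER'), ('Merkel', 'PER'), ('visitó', 'O'), ('Berlín', 'LOC')]
```

The partially annotated output is exactly the learning setting the paper addresses: the "O" tokens are not guaranteed to be non-entities, which is why the authors extend HMMs and structural perceptrons rather than training standard supervised models directly on this data.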
Embed
Wikipedia Quality
Fernandes, Eraldo R.; Brefeld, Ulf; Gonzalez, R Blanco; Asterias, J. (2016). "[[Using Wikipedia for Cross-Language Named Entity Recognition]]". Springer.
English Wikipedia
{{cite journal |last1=Fernandes |first1=Eraldo R. |last2=Brefeld |first2=Ulf |last3=Gonzalez |first3=R Blanco |last4=Asterias |first4=J |title=Using Wikipedia for Cross-Language Named Entity Recognition |date=2016 |url=https://wikipediaquality.com/wiki/Using_Wikipedia_for_Cross-Language_Named_Entity_Recognition |journal=Springer}}
HTML
Fernandes, Eraldo R.; Brefeld, Ulf; Gonzalez, R Blanco; Asterias, J. (2016). "<a href="https://wikipediaquality.com/wiki/Using_Wikipedia_for_Cross-Language_Named_Entity_Recognition">Using Wikipedia for Cross-Language Named Entity Recognition</a>". Springer.