Difference between revisions of "Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes"

From Wikipedia Quality
(Wikilinks)
(cats.)
 
(2 intermediate revisions by 2 users not shown)
{{Infobox work
| title = Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes
| date = 2011
| authors = [[Wen-Pin Lin]]<br />[[Matthew G. Snover]]<br />[[Heng Ji]]
| link = https://dl.acm.org/citation.cfm?id=2140464
| plink = https://pdfs.semanticscholar.org/42ce/14945af9afdf7da7efdbe83b957fc04d4648.pdf
}}
 
'''Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes''' is a scientific work related to [[Wikipedia quality]], published in 2011 and written by [[Wen-Pin Lin]], [[Matthew G. Snover]] and [[Heng Ji]].
 
  
 
== Overview ==
 
 
Automatically generating entity profiles from unstructured text, as in Knowledge Base Population, creates the need in a multilingual setting to align such profiles across [[multiple languages]] in an unsupervised manner. This paper describes an unsupervised and language-independent approach to mining name translation pairs from entity profiles, using [[Wikipedia]] Infoboxes as a stand-in for high-quality entity profile extraction. Pairs are initially found using expressions written in language-independent forms (such as dates and numbers), and new translations are then mined from these pairs. The algorithm then iteratively bootstraps from these translations to learn more pairs and more translations. It maintains a high precision, over 95%, for the majority of its iterations, with a slightly lower precision of 85.9% and an F-score of 76%. A side effect of the name mining algorithm is the unsupervised creation of a translation lexicon between the two languages, with an accuracy of 64%. The authors also replicate three state-of-the-art name translation mining methods and use two existing name translation gazetteers for comparison with their approach. The comparisons show that the approach can effectively augment the results from each of these alternative methods and resources.
 

== Embed ==
=== Wikipedia Quality ===
<code>
<nowiki>
Lin, Wen-Pin; Snover, Matthew G.; Ji, Heng. (2011). "[[Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes]]". Association for Computational Linguistics.
</nowiki>
</code>

=== English Wikipedia ===
<code>
<nowiki>
{{cite journal |last1=Lin |first1=Wen-Pin |last2=Snover |first2=Matthew G. |last3=Ji |first3=Heng |title=Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes |date=2011 |url=https://wikipediaquality.com/wiki/Unsupervised_Language-Independent_Name_Translation_Mining_from_Wikipedia_Infoboxes |journal=Association for Computational Linguistics}}
</nowiki>
</code>

=== HTML ===
<code>
<nowiki>
Lin, Wen-Pin; Snover, Matthew G.; Ji, Heng. (2011). &amp;quot;<a href="https://wikipediaquality.com/wiki/Unsupervised_Language-Independent_Name_Translation_Mining_from_Wikipedia_Infoboxes">Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes</a>&amp;quot;. Association for Computational Linguistics.
</nowiki>
</code>

[[Category:Scientific works]]

Latest revision as of 23:58, 7 February 2021


Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes
Authors: Wen-Pin Lin, Matthew G. Snover, Heng Ji
Publication date: 2011
Links: Original, Preprint

Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes is a scientific work related to Wikipedia quality, published in 2011 and written by Wen-Pin Lin, Matthew G. Snover and Heng Ji.

Overview

Automatically generating entity profiles from unstructured text, as in Knowledge Base Population, creates the need in a multilingual setting to align such profiles across multiple languages in an unsupervised manner. This paper describes an unsupervised and language-independent approach to mining name translation pairs from entity profiles, using Wikipedia Infoboxes as a stand-in for high-quality entity profile extraction. Pairs are initially found using expressions written in language-independent forms (such as dates and numbers), and new translations are then mined from these pairs. The algorithm then iteratively bootstraps from these translations to learn more pairs and more translations. It maintains a high precision, over 95%, for the majority of its iterations, with a slightly lower precision of 85.9% and an F-score of 76%. A side effect of the name mining algorithm is the unsupervised creation of a translation lexicon between the two languages, with an accuracy of 64%. The authors also replicate three state-of-the-art name translation mining methods and use two existing name translation gazetteers for comparison with their approach. The comparisons show that the approach can effectively augment the results from each of these alternative methods and resources.
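
The bootstrapping loop described above can be illustrated with a small sketch. This is not the authors' implementation: the data layout (`name`, `slots`), the `min_score` threshold, the anchor regular expression, and the "one leftover value on each side" heuristic are all assumptions chosen to keep the example short. It only shows the general idea of pairing infoboxes through language-independent values first, then reusing the mined translations to pair further infoboxes.

```python
import re

# Assumption: a value is "language-independent" if it consists only of
# digits and common separators (dates, years, counts).
ANCHOR = re.compile(r"[\d\s.,:/-]*\d[\d\s.,:/-]*")


def split_values(infobox):
    """Split slot values into language-independent anchors and other strings."""
    values = set(infobox["slots"])
    anchors = {v for v in values if ANCHOR.fullmatch(v)}
    return anchors, values - anchors


def score(src, tgt, lexicon):
    """Shared anchors plus source strings whose known translation occurs in tgt."""
    src_anchors, src_text = split_values(src)
    tgt_anchors, tgt_text = split_values(tgt)
    translated = {lexicon[v] for v in src_text if v in lexicon}
    return len(src_anchors & tgt_anchors) + len(translated & tgt_text)


def mine_pairs(source, target, min_score=2, max_rounds=5):
    """Iteratively pair infoboxes and grow a lexicon from the pairs found."""
    lexicon, pairs = {}, {}
    for _ in range(max_rounds):
        before = len(pairs)
        for src in source:
            if src["name"] in pairs:
                continue
            best = max(target, key=lambda tgt: score(src, tgt, lexicon))
            if score(src, best, lexicon) < min_score:
                continue
            # The paired entity names are themselves a mined translation.
            pairs[src["name"]] = best["name"]
            lexicon[src["name"]] = best["name"]
            # Toy heuristic (assumption): if exactly one untranslated string
            # is left on each side, treat the two as translations.
            _, src_text = split_values(src)
            _, tgt_text = split_values(best)
            src_left = [v for v in src_text if v not in lexicon]
            tgt_left = [v for v in tgt_text if v not in lexicon.values()]
            if len(src_left) == 1 and len(tgt_left) == 1:
                lexicon[src_left[0]] = tgt_left[0]
        if len(pairs) == before:  # nothing new was learned, so stop
            break
    return pairs, lexicon


# Illustrative toy infoboxes (hypothetical data, not taken from the paper).
source = [
    {"name": "Michelle Obama", "slots": ["1964-01-17", "Barack Obama"]},
    {"name": "Barack Obama", "slots": ["1961-08-04", "2009", "Michelle Obama"]},
]
target = [
    {"name": "米歇尔·奥巴马", "slots": ["1964-01-17", "贝拉克·奥巴马"]},
    {"name": "贝拉克·奥巴马", "slots": ["1961-08-04", "2009", "米歇尔·奥巴马"]},
]

pairs, lexicon = mine_pairs(source, target)
```

In this toy data the first round can only pair the infobox with two shared anchors. The second infobox shares a single date with its counterpart and reaches the threshold only in the next round, once the translation mined in the first round links its remaining slot value.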

Embed

Wikipedia Quality

Lin, Wen-Pin; Snover, Matthew G.; Ji, Heng. (2011). "[[Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes]]". Association for Computational Linguistics.

English Wikipedia

{{cite journal |last1=Lin |first1=Wen-Pin |last2=Snover |first2=Matthew G. |last3=Ji |first3=Heng |title=Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes |date=2011 |url=https://wikipediaquality.com/wiki/Unsupervised_Language-Independent_Name_Translation_Mining_from_Wikipedia_Infoboxes |journal=Association for Computational Linguistics}}

HTML

Lin, Wen-Pin; Snover, Matthew G.; Ji, Heng. (2011). &quot;<a href="https://wikipediaquality.com/wiki/Unsupervised_Language-Independent_Name_Translation_Mining_from_Wikipedia_Infoboxes">Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes</a>&quot;. Association for Computational Linguistics.