Difference between revisions of "Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia"

== Overview ==

Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. This paper presents a method for automatically gathering massive amounts of naturally occurring cross-document reference data. The authors also present the Wikilinks dataset, comprising 40 million mentions over 3 million entities, gathered using this method. The authors' method is based on finding hyperlinks to [[Wikipedia]] from a web crawl and using the anchor text as mentions. In addition to providing large-scale labeled data without human effort, the authors are able to include many styles of text beyond newswire and many entity types beyond people.

== Embed ==
=== Wikipedia Quality ===
<code>
<nowiki>
Singh, Sameer; Subramanya, Amarnag; Pereira, Fernando; McCallum, Andrew. (2012). "[[Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia]]".
</nowiki>
</code>

=== English Wikipedia ===
<code>
<nowiki>
{{cite journal |last1=Singh |first1=Sameer |last2=Subramanya |first2=Amarnag |last3=Pereira |first3=Fernando |last4=McCallum |first4=Andrew |title=Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia |date=2012 |url=https://wikipediaquality.com/wiki/Wikilinks:_a_Large-Scale_Cross-Document_Coreference_Corpus_Labeled_via_Links_to_Wikipedia}}
</nowiki>
</code>

=== HTML ===
<code>
<nowiki>
Singh, Sameer; Subramanya, Amarnag; Pereira, Fernando; McCallum, Andrew. (2012). &amp;quot;<a href="https://wikipediaquality.com/wiki/Wikilinks:_a_Large-Scale_Cross-Document_Coreference_Corpus_Labeled_via_Links_to_Wikipedia">Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia</a>&amp;quot;.
</nowiki>
</code>


[[Category:Scientific works]]

Latest revision as of 00:30, 26 January 2021


Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia
Authors
Sameer Singh
Amarnag Subramanya
Fernando Pereira
Andrew McCallum
Publication date
2012
Links
Original

Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia is a scientific work related to Wikipedia quality, published in 2012 and written by Sameer Singh, Amarnag Subramanya, Fernando Pereira and Andrew McCallum.

Overview

Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. This paper presents a method for automatically gathering massive amounts of naturally occurring cross-document reference data. The authors also present the Wikilinks dataset, comprising 40 million mentions over 3 million entities, gathered using this method. The authors' method is based on finding hyperlinks to Wikipedia from a web crawl and using the anchor text as mentions. In addition to providing large-scale labeled data without human effort, the authors are able to include many styles of text beyond newswire and many entity types beyond people.
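
The labeling idea described above (anchor text as the mention, the linked Wikipedia page as the entity label) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the class and function names, the rule for recognizing a Wikipedia URL, and the two toy pages are assumptions made only for illustration, and a real crawl would need filtering that is not shown here.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class WikipediaLinkParser(HTMLParser):
    """Collects (anchor text, Wikipedia page title) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.mentions = []    # (anchor_text, entity_title) pairs found so far
        self._target = None   # title of the Wikipedia link currently open, if any
        self._text = []       # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        host = parsed.netloc.lower()
        # Assumption: a /wiki/<Title> path on a wikipedia.org host names an entity.
        on_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
        if on_wikipedia and parsed.path.startswith("/wiki/"):
            self._target = unquote(parsed.path[len("/wiki/"):])
            self._text = []

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._target is not None:
            text = "".join(self._text).strip()
            if text:
                self.mentions.append((text, self._target))
            self._target = None


def group_mentions(pages):
    """Map each Wikipedia title to the (doc_id, anchor_text) mentions linking to it."""
    clusters = defaultdict(list)
    for doc_id, html in pages.items():
        parser = WikipediaLinkParser()
        parser.feed(html)
        for text, title in parser.mentions:
            clusters[title].append((doc_id, text))
    return dict(clusters)


# Hypothetical toy "crawl": two pages with links to two different entities.
pages = {
    "blog": '<p>We flew to <a href="https://en.wikipedia.org/wiki/Paris">'
            'the French capital</a>.</p>',
    "news": '<p><a href="https://en.wikipedia.org/wiki/Paris_Hilton">Paris</a> '
            'visited <a href="https://en.wikipedia.org/wiki/Paris">Paris</a>; '
            'see <a href="https://example.com/">more</a>.</p>',
}
clusters = group_mentions(pages)
```

In this toy input, the anchor text "Paris" appears twice in the same page but lands in two different clusters, because the link target rather than the surface string decides the entity. The link to example.com is ignored.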

Embed

Wikipedia Quality

Singh, Sameer; Subramanya, Amarnag; Pereira, Fernando; McCallum, Andrew. (2012). "[[Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia]]".

English Wikipedia

{{cite journal |last1=Singh |first1=Sameer |last2=Subramanya |first2=Amarnag |last3=Pereira |first3=Fernando |last4=McCallum |first4=Andrew |title=Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia |date=2012 |url=https://wikipediaquality.com/wiki/Wikilinks:_a_Large-Scale_Cross-Document_Coreference_Corpus_Labeled_via_Links_to_Wikipedia}}

HTML

Singh, Sameer; Subramanya, Amarnag; Pereira, Fernando; McCallum, Andrew. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Wikilinks:_a_Large-Scale_Cross-Document_Coreference_Corpus_Labeled_via_Links_to_Wikipedia">Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia</a>&quot;.