Difference between revisions of "Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia"

(+ links)
(Infobox work)
Line 1:
+ {{Infobox work
+ | title = Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia
+ | date = 2012
+ | authors = [[Sameer Singh]]<br />[[Amarnag Subramanya]]<br />[[Fernando Pereira]]<br />[[Andrew McCallum]]
+ | link = https://web.cs.umass.edu/publication/docs/2012/UM-CS-2012-015.pdf
+ }}
 
'''Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2012 and written by [[Sameer Singh]], [[Amarnag Subramanya]], [[Fernando Pereira]] and [[Andrew McCallum]].
 
 
== Overview ==
 
Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. This paper presents a method for automatically gathering massive amounts of naturally occurring cross-document reference data. The authors also present the Wikilinks dataset, comprising 40 million mentions over 3 million entities, gathered using this method. The authors' method is based on finding hyperlinks to [[Wikipedia]] in a web crawl and using the anchor text as mentions. In addition to providing large-scale labeled data without human effort, the authors are able to include many styles of text beyond newswire and many entity types beyond people.
 

Revision as of 09:28, 22 May 2020


Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia
Authors
Sameer Singh
Amarnag Subramanya
Fernando Pereira
Andrew McCallum
Publication date
2012
Links
Original

Wikilinks: a Large-Scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia is a scientific work related to Wikipedia quality, published in 2012 and written by Sameer Singh, Amarnag Subramanya, Fernando Pereira and Andrew McCallum.

Overview

Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. This paper presents a method for automatically gathering massive amounts of naturally occurring cross-document reference data. The authors also present the Wikilinks dataset, comprising 40 million mentions over 3 million entities, gathered using this method. The authors' method is based on finding hyperlinks to Wikipedia in a web crawl and using the anchor text as mentions. In addition to providing large-scale labeled data without human effort, the authors are able to include many styles of text beyond newswire and many entity types beyond people.
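
The idea of using links to Wikipedia as labels can be illustrated with a minimal sketch. This is not the authors' pipeline: the class `WikipediaLinkExtractor`, the function `group_mentions`, and the restriction to `en.wikipedia.org/wiki/` URLs are all assumptions made for illustration. The sketch treats each anchor text as a mention and the linked page title as the entity label, then groups mentions from different documents by that label.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class WikipediaLinkExtractor(HTMLParser):
    """Collect (anchor text, Wikipedia page title) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.mentions = []    # list of (anchor_text, entity_title)
        self._target = None   # title of the Wikipedia link currently open
        self._text = []       # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        url = urlparse(href)
        # Assumption: only English Wikipedia article URLs count as labels.
        if url.netloc == "en.wikipedia.org" and url.path.startswith("/wiki/"):
            self._target = unquote(url.path[len("/wiki/"):])
            self._text = []

    def handle_data(self, data):
        if self._target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._target is not None:
            # Normalize whitespace in the anchor text before storing it.
            anchor = " ".join("".join(self._text).split())
            if anchor:
                self.mentions.append((anchor, self._target))
            self._target = None


def group_mentions(pages):
    """Group mentions from many pages by the Wikipedia title they link to.

    `pages` maps a document id to its HTML source (hypothetical input format).
    """
    entities = defaultdict(list)
    for doc_id, html in pages.items():
        parser = WikipediaLinkExtractor()
        parser.feed(html)
        parser.close()
        for anchor, title in parser.mentions:
            entities[title].append((doc_id, anchor))
    return dict(entities)
```

Given two pages that link to the same Wikipedia article with different anchor texts, `group_mentions` places both mentions in one cluster keyed by the article title, which is the kind of cross-document coreference label the overview describes. A real crawl would need additional filtering that this sketch leaves out.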