Difference between revisions of "Mining Naturally-Occurring Corrections and Paraphrases from Wikipedia's Revision History"

From Wikipedia Quality
(Adding wikilinks)
(+ Infobox work)
Line 1:
+ {{Infobox work
+ | title = Mining Naturally-Occurring Corrections and Paraphrases from Wikipedia's Revision History
+ | date = 2010
+ | authors = [[Aurélien Max]]<br />[[Guillaume Wisniewski]]
+ | link = http://www.lrec-conf.org/proceedings/lrec2010/pdf/827_Paper.pdf
+ }}
 
'''Mining Naturally-Occurring Corrections and Paraphrases from Wikipedia's Revision History''' is a scientific work related to [[Wikipedia quality]], published in 2010 and written by [[Aurélien Max]] and [[Guillaume Wisniewski]].
 
  
 
== Overview ==
 
 
Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also provide interesting material for linguistic studies. In this article, the authors present WiCoPaCo ([[Wikipedia]] Correction and Paraphrase Corpus), a new freely available resource built by automatically mining Wikipedia’s revision history. The WiCoPaCo corpus focuses on local modifications made by human revisers and includes various types of corrections (such as spelling or typographical corrections) and rewritings, which can be broadly categorized into meaning-preserving and meaning-altering revisions. The authors present an initial hand-built typology of these revisions, but the resource allows for any annotation scheme. They discuss the main motivations for building such a resource and describe the main technical details guiding its construction. They also present applications and data analysis on French and report initial results on spelling error correction and morphosyntactic rewriting. The WiCoPaCo corpus can be freely downloaded from http://wicopaco.limsi.fr.
 

Revision as of 10:46, 9 November 2020


Mining Naturally-Occurring Corrections and Paraphrases from Wikipedia's Revision History
Authors: Aurélien Max, Guillaume Wisniewski
Publication date: 2010
Links: Original (http://www.lrec-conf.org/proceedings/lrec2010/pdf/827_Paper.pdf)

Mining Naturally-Occurring Corrections and Paraphrases from Wikipedia's Revision History is a scientific work related to Wikipedia quality, published in 2010 and written by Aurélien Max and Guillaume Wisniewski.

Overview

Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also provide interesting material for linguistic studies. In this article, the authors present WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), a new freely available resource built by automatically mining Wikipedia’s revision history. The WiCoPaCo corpus focuses on local modifications made by human revisers and includes various types of corrections (such as spelling or typographical corrections) and rewritings, which can be broadly categorized into meaning-preserving and meaning-altering revisions. The authors present an initial hand-built typology of these revisions, but the resource allows for any annotation scheme. They discuss the main motivations for building such a resource and describe the main technical details guiding its construction. They also present applications and data analysis on French and report initial results on spelling error correction and morphosyntactic rewriting. The WiCoPaCo corpus can be freely downloaded from http://wicopaco.limsi.fr.
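
To make the idea of "local modifications" between two revisions concrete, the following is a minimal sketch and not the authors' actual extraction pipeline, which this summary does not describe. It assumes that a word-level diff from Python's standard `difflib` module is an acceptable stand-in, and the names `tokenize`, `local_modifications`, and the `max_len` threshold are hypothetical choices made for illustration.

```python
import difflib
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def local_modifications(before, after, max_len=5):
    """Return short word-level edits between two revisions of a passage.

    Assumption (not from the paper): an edit counts as "local" when both
    the old and the new span contain at most `max_len` tokens.
    """
    old_tokens = tokenize(before)
    new_tokens = tokenize(after)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged context, not a modification
        if (i2 - i1) <= max_len and (j2 - j1) <= max_len:
            edits.append((tag, " ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2])))
    return edits


if __name__ == "__main__":
    # Invented French example of a spelling correction.
    print(local_modifications("Il a mangé une pome hier.", "Il a mangé une pomme hier."))
    # prints [('replace', 'pome', 'pomme')]
```

Pairs produced this way would still need to be classified, for example as spelling corrections, meaning-preserving rewritings, or meaning-altering revisions, which is where an annotation scheme such as the typology mentioned above comes in.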