{{Infobox work
| title = Mining Wikipedia Revision Histories for Improving Sentence Compression
| date = 2008
| authors = [[Elif Yamangil]]<br />[[Rani Nelken]]
| doi = 10.3115/1557690.1557726
| link = https://dl.acm.org/citation.cfm?doid=1557690.1557726
}}
 
'''Mining Wikipedia Revision Histories for Improving Sentence Compression''' is a scientific work related to [[Wikipedia quality]], published in 2008 and written by [[Elif Yamangil]] and [[Rani Nelken]].
 
== Overview ==
 
A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. The authors propose a new and bountiful resource for such training data, which they obtain by mining the revision history of [[Wikipedia]] for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, they collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus. Using this newfound data, they propose a novel lexicalized noisy channel model for sentence compression, achieving improved results on grammaticality and compression rate criteria with a slight decrease in importance.
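
The noisy channel framing can be made concrete. The formulation below uses the standard notation for this line of work; it is sketched here as an assumption, not quoted from the paper. The observed long sentence <math>l</math> is treated as a noisy expansion of an underlying short sentence <math>s</math>, and decoding selects the most probable compression:

<math>\hat{s} = \arg\max_{s} P(s)\,P(l \mid s)</math>

Here <math>P(s)</math> is the source model, scoring the fluency of a candidate compression, and <math>P(l \mid s)</math> is the channel model, scoring how plausibly <math>s</math> expands into <math>l</math>; lexicalizing the model conditions these probabilities on the actual words involved rather than on syntactic categories alone.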
 
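The mining step can be pictured with a short sketch. The Python below is a hypothetical illustration, not the authors' implementation: it treats two adjacent revisions as lists of sentences and keeps a pair whenever one sentence is a deletion-only rewrite (a word subsequence) of the other; the function names and the subsequence heuristic are assumptions.

<syntaxhighlight lang="python">
def is_compression(long_sent, short_sent):
    """True if short_sent can be formed from long_sent by deleting words only."""
    long_words = long_sent.split()
    short_words = short_sent.split()
    if not short_words or len(short_words) >= len(long_words):
        return False
    i = 0  # index of the next short word still to be matched
    for word in long_words:
        if i < len(short_words) and word == short_words[i]:
            i += 1
    return i == len(short_words)  # every short word found, in order


def mine_pairs(old_sents, new_sents):
    """Collect (long, short) sentence pairs from two adjacent revisions."""
    pairs = []
    for old in old_sents:
        for new in new_sents:
            if is_compression(old, new):
                pairs.append((old, new))  # the editor compressed the sentence
            elif is_compression(new, old):
                pairs.append((new, old))  # the editor expanded it; reversing
                                          # the pair still yields a compression
    return pairs


old_rev = ["The cat , which was black , sat on the mat ."]
new_rev = ["The cat sat on the mat ."]
print(mine_pairs(old_rev, new_rev))
# [('The cat , which was black , sat on the mat .', 'The cat sat on the mat .')]
</syntaxhighlight>

In practice, sentence alignment across revisions is noisier than this (paraphrases, reorderings, split and merged sentences), so a real pipeline would align sentences first and filter candidate pairs more carefully; the sketch only shows why revision histories yield compression pairs at scale.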
