Mining Wikipedia Revision Histories for Improving Sentence Compression - a scientific work related to Wikipedia quality, published in 2008 and written by Elif Yamangil and Rani Nelken.
Overview
A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. The authors propose a new and bountiful resource for such training data, which they obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, they collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standard Ziff-Davis corpus. Using this newly collected data, the authors propose a novel lexicalized noisy channel model for sentence compression, achieving improved results on the grammaticality and compression rate criteria with a slight decrease on the importance criterion.
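The core idea of mining revisions for compressions can be illustrated with a simple heuristic: treat a revised sentence as a compression of the original when the revision is strictly shorter and consists only of word deletions (its tokens form an in-order subsequence of the original's tokens). The sketch below is illustrative only and is not the authors' actual extraction pipeline; the function name and the subsequence test are assumptions for demonstration.

```python
import difflib

def is_compression_pair(old_sent: str, new_sent: str) -> bool:
    """Illustrative heuristic (not the paper's pipeline): new_sent is
    treated as a compression of old_sent if it is strictly shorter and
    its tokens appear, in order, within old_sent (deletion-only edit)."""
    old_tokens, new_tokens = old_sent.split(), new_sent.split()
    if len(new_tokens) >= len(old_tokens):
        return False  # not shorter, so not a compression
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens)
    # Sum of matching-block sizes equals len(new_tokens) exactly when
    # every token of the revision is preserved, in order, from the original.
    matched = sum(size for _, _, size in matcher.get_matching_blocks())
    return matched == len(new_tokens)

old = "The quick brown fox quickly jumped over the extremely lazy dog"
new = "The fox jumped over the lazy dog"
print(is_compression_pair(old, new))  # True: deletion-only edit
print(is_compression_pair(new, old))  # False: an expansion, not a compression
```

Running this over aligned sentence pairs from consecutive article revisions would yield compression examples in one direction and expansion examples in the other, which is the kind of signal the paper harvests at scale.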