Mining Wikipedia Revision Histories for Improving Sentence Compression

Mining Wikipedia Revision Histories for Improving Sentence Compression
Authors: Elif Yamangil, Rani Nelken
Publication date: 2008
DOI: 10.3115/1557690.1557726

Mining Wikipedia Revision Histories for Improving Sentence Compression is a scientific work related to Wikipedia quality, published in 2008 and written by Elif Yamangil and Rani Nelken.

Overview

A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. The authors propose a new and bountiful resource for such training data, which they obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, they collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus. Using this newfound data, they propose a novel lexicalized noisy channel model for sentence compression, achieving improved results on the grammaticality and compression-rate criteria with a slight decrease in importance.
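The underlying idea is that when an editor shortens a sentence between two revisions, the older and newer versions form a naturally occurring compression pair; in a noisy channel formulation, a compression s of a long sentence l is then typically chosen to maximize P(s)·P(l|s), combining a language model with a channel model. As a rough, hypothetical illustration of the mining step only (not the authors' actual extraction pipeline), the Python sketch below pairs sentences from two revisions whose surface similarity is high but whose newer version is shorter; the threshold min_ratio and the word-count test are illustrative assumptions.

import difflib

def compression_pairs(old_sentences, new_sentences, min_ratio=0.6):
    """Pair sentences across two revisions where the newer sentence looks like
    a shortened rewrite of the older one (a rough proxy for a human compression)."""
    pairs = []
    for old in old_sentences:
        for new in new_sentences:
            ratio = difflib.SequenceMatcher(None, old, new).ratio()
            # High surface similarity, but the newer sentence uses fewer words.
            if ratio >= min_ratio and len(new.split()) < len(old.split()):
                pairs.append((old, new))
    return pairs

# Toy example with one sentence per revision.
rev_old = ["The quick brown fox quickly jumped over the extremely lazy dog."]
rev_new = ["The fox jumped over the lazy dog."]
print(compression_pairs(rev_old, rev_new))

Applied to full revision histories, a filter of this kind yields candidate compression (and, in the reverse direction, expansion) pairs that can then be cleaned and used as supervised training data.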

Embed

Wikipedia Quality

Yamangil, Elif; Nelken, Rani. (2008). "[[Mining Wikipedia Revision Histories for Improving Sentence Compression]]". Association for Computational Linguistics. DOI: 10.3115/1557690.1557726.

English Wikipedia

{{cite journal |last1=Yamangil |first1=Elif |last2=Nelken |first2=Rani |title=Mining Wikipedia Revision Histories for Improving Sentence Compression |date=2008 |doi=10.3115/1557690.1557726 |url=https://wikipediaquality.com/wiki/Mining_Wikipedia_Revision_Histories_for_Improving_Sentence_Compression |journal=Association for Computational Linguistics}}

HTML

Yamangil, Elif; Nelken, Rani. (2008). &quot;<a href="https://wikipediaquality.com/wiki/Mining_Wikipedia_Revision_Histories_for_Improving_Sentence_Compression">Mining Wikipedia Revision Histories for Improving Sentence Compression</a>&quot;. Association for Computational Linguistics. DOI: 10.3115/1557690.1557726.