Information Extraction from Wikipedia: Moving Down the Long Tail

Authors: Feifei Wu, Raphael Hoffmann, Daniel S. Weld
Publication date: 2008
ISBN: 978-160558193-4
DOI: 10.1145/1401890.1401978

Information Extraction from Wikipedia: Moving Down the Long Tail is a scientific work about Wikipedia quality, published in 2008 and written by Feifei Wu, Raphael Hoffmann and Daniel S. Weld.

Overview

Not only is Wikipedia a comprehensive source of quality information, it also has several kinds of internal structure (e.g., relational summaries known as infoboxes) that enable self-supervised information extraction. While previous efforts at extraction from Wikipedia achieve high precision and recall on well-populated classes of articles, they fail in a larger number of cases, largely because incomplete articles and infrequent use of infoboxes lead to insufficient training data. This paper presents three novel techniques for increasing recall from Wikipedia's long tail of sparse classes: (1) shrinkage over an automatically learned subsumption taxonomy, (2) a retraining technique for improving the training data, and (3) supplementing results by extracting from the broader Web. The authors' experiments compare design variations and show that, used in concert, these techniques increase recall by a factor of 1.76 to 8.71 while maintaining or increasing precision.
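As a rough illustration of the first technique, the sketch below shows the general idea of shrinkage over a class taxonomy: a sparse class borrows training examples from its ancestors, with lower weight the further away the ancestor is. This is a generic sketch and not the authors' implementation. The taxonomy, the example sentences, the `decay` parameter and all function names are hypothetical, and the weighting scheme is an assumption made for illustration.

```python
# Hypothetical toy taxonomy: class -> parent class (None marks the root).
PARENT = {"performer": "person", "scientist": "person", "person": None}

# Hypothetical labeled training sentences for one attribute (e.g. birthplace).
TRAINING = {
    "performer": ["born in Chicago"],
    "scientist": ["was born in Ulm", "born in Warsaw"],
    "person": ["born in London", "was born in Paris", "born in Rome"],
}


def ancestors(cls):
    """Yield (ancestor, distance) pairs while walking up the taxonomy."""
    distance = 0
    node = PARENT.get(cls)
    while node is not None:
        distance += 1
        yield node, distance
        node = PARENT.get(node)


def shrinkage_training_set(cls, decay=0.5):
    """Pool a class's own examples (weight 1.0) with its ancestors' examples.

    An ancestor at distance d contributes its examples with weight decay ** d
    (an assumed weighting). If an example appears more than once, the largest
    weight is kept.
    """
    weighted = {example: 1.0 for example in TRAINING.get(cls, [])}
    for ancestor, distance in ancestors(cls):
        weight = decay ** distance
        for example in TRAINING.get(ancestor, []):
            weighted[example] = max(weighted.get(example, 0.0), weight)
    return weighted


if __name__ == "__main__":
    for example, weight in shrinkage_training_set("performer").items():
        print(f"{weight:.2f}  {example}")
```

In this toy setup the sparse class "performer" has a single example of its own and gains three down-weighted examples from "person", which is the kind of training-data boost that shrinkage is meant to provide for long-tail classes.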

Embed

Wikipedia Quality

Wu, Feifei; Hoffmann, Raphael; Weld, Daniel S. (2008). "[[Information Extraction from Wikipedia: Moving Down the Long Tail]]". Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008, pp. 731-739. ISBN: 978-160558193-4. DOI: 10.1145/1401890.1401978.

English Wikipedia

{{cite journal |last1=Wu |first1=Feifei |last2=Hoffmann |first2=Raphael |last3=Weld |first3=Daniel S. |title=Information Extraction from Wikipedia: Moving Down the Long Tail |date=2008 |isbn=978-160558193-4 |doi=10.1145/1401890.1401978 |url=https://wikipediaquality.com/wiki/Information_Extraction_from_Wikipedia:_Moving_Down_the_Long_Tail |journal=Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008 |pages=731-739}}

HTML

Wu, Feifei; Hoffmann, Raphael; Weld, Daniel S. (2008). &quot;<a href="https://wikipediaquality.com/wiki/Information_Extraction_from_Wikipedia:_Moving_Down_the_Long_Tail">Information Extraction from Wikipedia: Moving Down the Long Tail</a>&quot;. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2008, pp. 731-739. ISBN: 978-160558193-4. DOI: 10.1145/1401890.1401978.