Difference between revisions of "Hacking Wikipedia for Hyponymy Relation Acquisition"

Revision as of 14:38, 23 November 2019

Hacking Wikipedia for Hyponymy Relation Acquisition
Authors	Asuka Sumida, Kentaro Torisawa
Publication date	2008
Links	Original

Hacking Wikipedia for Hyponymy Relation Acquisition is a scientific work related to Wikipedia quality, published in 2008 and written by Asuka Sumida and Kentaro Torisawa.

Overview

This paper describes a method for extracting a large set of hyponymy relations from Wikipedia. Wikipedia is much more consistently structured than generic HTML documents, so a large number of hyponymy relations can be extracted with simple methods. In this work, the authors extracted more than 1.4 × 10^6 hyponymy relations with 75.3% precision from the Japanese version of Wikipedia. To the best of the authors' knowledge, this is the largest machine-readable thesaurus for Japanese. The main contribution of the paper is a method for hyponymy acquisition from hierarchical layouts in Wikipedia. Using a machine learning technique and pattern matching, the authors extracted more than 6.3 × 10^5 relations from hierarchical layouts in the Japanese Wikipedia, with a precision of 76.4%. The remaining hyponymy relations were acquired by existing methods for extracting relations from definition sentences and category pages. Extraction from the hierarchical layouts therefore almost doubled the number of relations extracted.
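The idea of reading hyponymy candidates off a hierarchical layout can be illustrated with a short Python sketch. This is not the authors' implementation: the function name `candidate_pairs`, the regular expressions, and the rule of pairing each heading or list item with its direct parent are all assumptions made for illustration. The sketch only generates candidates; the filtering step that the paper performs with machine learning and pattern matching is not reproduced here.

```python
import re

# Hypothetical patterns for MediaWiki headings ("== Title ==") and list items ("* item").
HEADING = re.compile(r"^(=+)\s*(.*?)\s*=+\s*$")
BULLET = re.compile(r"^([*#]+)\s*(.*\S)\s*$")
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def candidate_pairs(title, wikitext):
    """Return (hypernym candidate, hyponym candidate) pairs from a page layout.

    Assumption: each heading or list item is paired with its direct parent
    in the layout, and the page title is the root of the hierarchy.
    """
    stack = [(0, title)]  # (depth, text); the root is never popped
    pairs = []
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        bullet = BULLET.match(line)
        if heading:
            depth, text = len(heading.group(1)), heading.group(2)
        elif bullet:
            # Offset list depth so that list items always nest below headings.
            depth, text = 10 + len(bullet.group(1)), bullet.group(2)
        else:
            continue
        text = LINK.sub(r"\1", text).strip()  # keep only the link label
        if not text:
            continue
        # Pop siblings and deeper nodes until the direct parent is on top.
        while stack[-1][0] >= depth:
            stack.pop()
        pairs.append((stack[-1][1], text))
        stack.append((depth, text))
    return pairs


sample = """
== Black tea ==
* [[Assam tea|Assam]]
* Darjeeling
** First flush
== Green tea ==
* Sencha
"""

for hypernym, hyponym in candidate_pairs("Tea", sample):
    print(hypernym, "->", hyponym)
```

On this made-up sample the candidates include plausible pairs such as Black tea -> Assam, but also Darjeeling -> First flush, which is not a hyponymy relation. Raw layout candidates are noisy in this way, which is consistent with the paper combining candidate extraction with a learned filter and with the reported precision being well below 100%.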

@@ Line 1: / Line 1: @@
+{{Infobox work
+| title = Hacking Wikipedia for Hyponymy Relation Acquisition
+| date = 2008
+| authors = [[Asuka Sumida]]<br />[[Kentaro Torisawa]]
+| link = http://www.aclweb.org/anthology/I/I08/I08-2126.pdf
+}}
 '''Hacking Wikipedia for Hyponymy Relation Acquisition''' is a scientific work related to [[Wikipedia quality]], published in 2008 and written by [[Asuka Sumida]] and [[Kentaro Torisawa]].
 == Overview ==
 This paper describes a method for extracting a large set of hyponymy relations from [[Wikipedia]]. Wikipedia is much more consistently structured than generic HTML documents, so a large number of hyponymy relations can be extracted with simple methods. In this work, the authors extracted more than 1.4 × 10^6 hyponymy relations with 75.3% precision from the Japanese version of Wikipedia. To the best of the authors' knowledge, this is the largest machine-readable thesaurus for Japanese. The main contribution of the paper is a method for hyponymy acquisition from hierarchical layouts in Wikipedia. Using a machine learning technique and pattern matching, the authors extracted more than 6.3 × 10^5 relations from hierarchical layouts in the Japanese Wikipedia, with a precision of 76.4%. The remaining hyponymy relations were acquired by existing methods for extracting relations from definition sentences and category pages. Extraction from the hierarchical layouts therefore almost doubled the number of relations extracted.
