Difference between revisions of "Hacking Wikipedia for Hyponymy Relation Acquisition"

From Wikipedia Quality
(Adding wikilinks)
(Infobox)
Line 1: Line 1:
+ {{Infobox work
+ | title = Hacking Wikipedia for Hyponymy Relation Acquisition
+ | date = 2008
+ | authors = [[Asuka Sumida]]<br />[[Kentaro Torisawa]]
+ | link = http://www.aclweb.org/anthology/I/I08/I08-2126.pdf
+ }}
 
'''Hacking Wikipedia for Hyponymy Relation Acquisition''' - scientific work related to [[Wikipedia quality]] published in 2008, written by [[Asuka Sumida]] and [[Kentaro Torisawa]].
 
  
 
== Overview ==
 
 
This paper describes a method for extracting a large set of hyponymy relations from [[Wikipedia]]. Because Wikipedia is much more consistently structured than generic HTML documents, the authors can extract a large number of hyponymy relations with simple methods. In this work, they extracted more than 1.4 × 10<sup>6</sup> hyponymy relations with 75.3% precision from the Japanese version of Wikipedia. To the best of the authors' knowledge, this is the largest machine-readable thesaurus for Japanese. The main contribution of the paper is a method for acquiring hyponymy relations from hierarchical layouts in Wikipedia. Using a machine learning technique and pattern matching, the authors extracted more than 6.3 × 10<sup>5</sup> relations from hierarchical layouts in the Japanese Wikipedia, with a precision of 76.4%. The remaining hyponymy relations were acquired with existing methods for extracting relations from definition sentences and category pages. Extraction from the hierarchical layouts therefore almost doubled the number of relations extracted.

Revision as of 14:38, 23 November 2019


Hacking Wikipedia for Hyponymy Relation Acquisition
Authors
Asuka Sumida
Kentaro Torisawa
Publication date
2008
Links
Original

Hacking Wikipedia for Hyponymy Relation Acquisition - scientific work related to Wikipedia quality published in 2008, written by Asuka Sumida and Kentaro Torisawa.

Overview

This paper describes a method for extracting a large set of hyponymy relations from Wikipedia. Because Wikipedia is much more consistently structured than generic HTML documents, the authors can extract a large number of hyponymy relations with simple methods. In this work, they extracted more than 1.4 × 10^6 hyponymy relations with 75.3% precision from the Japanese version of Wikipedia. To the best of the authors' knowledge, this is the largest machine-readable thesaurus for Japanese. The main contribution of the paper is a method for acquiring hyponymy relations from hierarchical layouts in Wikipedia. Using a machine learning technique and pattern matching, the authors extracted more than 6.3 × 10^5 relations from hierarchical layouts in the Japanese Wikipedia, with a precision of 76.4%. The remaining hyponymy relations were acquired with existing methods for extracting relations from definition sentences and category pages. Extraction from the hierarchical layouts therefore almost doubled the number of relations extracted.
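
To make the idea of hierarchical layouts more concrete, the following is a minimal sketch of how candidate pairs might be generated from such a layout. It is not the authors' implementation. It assumes that a hierarchical layout means nested section headings and bullet lists in wikitext, and the function name candidate_pairs, the regular expressions, and the sample page are hypothetical. It only produces (parent, child) candidates; the machine learning and pattern matching steps that the paper uses to decide which candidates are real hyponymy relations are not reproduced here.

```python
import re

# Assumed wikitext conventions: "== Title ==" headings and "*" bullet items.
HEADING = re.compile(r"^(=+)\s*(.+?)\s*=+\s*$")
BULLET = re.compile(r"^(\*+)\s*(.+?)\s*$")

# Hypothetical offset so that any bullet nests below any heading.
BULLET_OFFSET = 100


def candidate_pairs(wikitext):
    """Return (parent, child) candidate pairs from headings and nested bullets.

    The pairs are unfiltered candidates, not confirmed hyponymy relations.
    """
    pairs = []
    stack = []  # (depth, title) entries, outermost first
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        bullet = BULLET.match(line)
        if heading:
            depth, title = len(heading.group(1)), heading.group(2)
        elif bullet:
            depth, title = BULLET_OFFSET + len(bullet.group(1)), bullet.group(2)
        else:
            continue  # plain text lines carry no layout information here
        # Drop entries at the same depth or deeper; what remains on top
        # is the nearest enclosing heading or list item.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            pairs.append((stack[-1][1], title))
        stack.append((depth, title))
    return pairs


sample = """== Cheese ==
=== Blue cheese ===
* Roquefort
* Gorgonzola
=== Hard cheese ===
* Parmesan
"""

for parent, child in candidate_pairs(sample):
    print(parent, "->", child)
```

On the sample page, the sketch pairs each item with its nearest enclosing heading or list item, for example Blue cheese with Roquefort. A pair such as a page title with a "History" or "References" heading would be produced in the same way, which is why a filtering step like the one described in the paper is needed before treating candidates as hyponymy relations.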