A New Approach for Building Domain-Specific Corpus with Wikipedia

From Wikipedia Quality
{{Infobox work
| title = A New Approach for Building Domain-Specific Corpus with Wikipedia
| date = 2013
| authors = [[Xin Ye Zhang]]<br />[[Xiu Li]]<br />[[Zhi Jian Ruan]]
| doi = 10.4028/www.scientific.net/AMM.321-324.2319
| link = https://www.scientific.net/AMM.321-324.2319
}}
 
'''A New Approach for Building Domain-Specific Corpus with Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2013 and written by [[Xin Ye Zhang]], [[Xiu Li]] and [[Zhi Jian Ruan]].
 
== Overview ==
 
A domain-specific corpus can be used to build a domain [[ontology]], which is used in many areas such as information retrieval (IR), natural language processing (NLP) and web mining. The authors propose a multi-root method for building a domain-specific corpus from [[Wikipedia]] resources. First, they select a number of top-level nodes (Wikipedia category articles) as root nodes and traverse Wikipedia with a BFS-like algorithm. This traversal yields a directed Wikipedia graph (Wiki-graph). The authors then propose an algorithm, based mainly on Kosaraju's algorithm, that removes the cycles in the Wiki-graph. Finally, the Wiki-graph is traversed in topological order, and nodes are ranked and filtered during this traversal. A node's ranking score takes into account both its own in-degree and the out-degree of its parents. The experimental evaluation shows that the method can produce a high-quality domain-specific corpus.
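The pipeline described above can be illustrated with the following Python sketch. It is not the authors' implementation: the abstract does not state the exact cycle-removal or scoring rules, so the rule used here for breaking cycles (inside each strongly connected component found by Kosaraju's algorithm, keep only edges that lead to a strictly deeper BFS level), the scoring rule (roots start at 1.0 and every kept node passes its score divided by its out-degree to each child), the use of Kahn's algorithm for the topological sort, the depth limit <code>max_depth</code>, the cutoff <code>threshold</code> and all function names are hypothetical choices made for illustration. The category graph is assumed to be available as a plain adjacency dictionary rather than fetched from Wikipedia.

```python
from collections import deque

Graph = dict[str, list[str]]  # category -> pages/subcategories it links to
_DONE = object()  # sentinel marking an exhausted child iterator


def bfs_multi_root(graph: Graph, roots: list[str], max_depth: int):
    """BFS from several root categories at once. Returns the reached
    Wiki-graph and each node's BFS depth; max_depth is an assumed cutoff."""
    roots = list(dict.fromkeys(roots))  # drop duplicate roots
    depth = {r: 0 for r in roots}
    wiki_graph: Graph = {}
    queue = deque(roots)
    while queue:
        u = queue.popleft()
        wiki_graph[u] = []
        if depth[u] >= max_depth:
            continue  # keep the node but do not expand it
        for v in graph.get(u, []):
            wiki_graph[u].append(v)
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return wiki_graph, depth


def kosaraju_scc(g: Graph) -> dict[str, str]:
    """Kosaraju's algorithm: map every node to a representative of its SCC."""
    order, seen = [], set()  # pass 1: DFS finishing order on g
    for s in g:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(g[s]))]
        while stack:
            node, children = stack[-1]
            nxt = next(children, _DONE)
            if nxt is _DONE:
                stack.pop()
                order.append(node)
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(g[nxt])))
    rev: Graph = {u: [] for u in g}  # pass 2: DFS on the reversed graph
    for u, vs in g.items():
        for v in vs:
            rev[v].append(u)
    comp: dict[str, str] = {}
    for s in reversed(order):  # decreasing finishing time
        if s in comp:
            continue
        comp[s] = s
        stack2 = [s]
        while stack2:
            for v in rev[stack2.pop()]:
                if v not in comp:
                    comp[v] = s
                    stack2.append(v)
    return comp


def remove_cycles(g: Graph, depth: dict[str, int]) -> Graph:
    """Hypothetical rule: inside an SCC keep only edges to a strictly deeper
    BFS level; edges between SCCs stay. No cycle survives, since a cycle
    would have to increase depth all the way round."""
    comp = kosaraju_scc(g)
    return {u: [v for v in vs if comp[u] != comp[v] or depth[v] > depth[u]]
            for u, vs in g.items()}


def rank_and_filter(dag: Graph, roots: list[str], threshold: float):
    """Kahn's topological traversal with scoring and filtering on the way.
    Hypothetical score: roots start at 1.0; each kept node passes
    score / out-degree to every child, so a node gains from many parents
    (in-degree) and loses when its parents have a large out-degree."""
    indeg = {u: 0 for u in dag}
    for vs in dag.values():
        for v in vs:
            indeg[v] += 1
    score = {u: 0.0 for u in dag}
    for r in roots:
        score[r] = 1.0
    queue = deque(u for u in dag if indeg[u] == 0)
    kept = []
    while queue:
        u = queue.popleft()
        share = 0.0
        if score[u] >= threshold:  # filtered nodes pass nothing downstream
            kept.append(u)
            share = score[u] / len(dag[u]) if dag[u] else 0.0
        for v in dag[u]:
            score[v] += share
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return kept, score
```

As a usage illustration on a made-up graph: if ''Science'' (the only root) links to ''Physics'' and ''Biology'', ''Physics'' links to ''Mechanics'' and back to ''Science'', ''Biology'' links to ''Genetics'' and ''Mechanics'', and ''Genetics'' links back to ''Biology'', then the two back edges are dropped, ''Mechanics'' (two parents) scores 0.75 while ''Genetics'' scores 0.25, and a threshold of 0.3 keeps every node except ''Genetics''. Restricting cycle removal to edges inside a strongly connected component means edges between components, which can never close a cycle, are left untouched.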
 
Revision as of 14:51, 7 December 2019

