Difference between revisions of "An Open-Source Toolkit for Mining Wikipedia"

From Wikipedia Quality
(Int.links)
(infobox)
Line 1: Line 1:
+ {{Infobox work
+ | title = An Open-Source Toolkit for Mining Wikipedia
+ | date = 2013
+ | authors = [[David N. Milne]]<br />[[Ian H. Witten]]
+ | doi = 10.1016/j.artint.2012.06.007
+ | link = https://dl.acm.org/citation.cfm?id=2405918
+ | plink = https://www.semanticscholar.org/paper/An-open-source-toolkit-for-mining-Wikipedia-Milne-Witten/435ffbc4c2b580b92ccb9e2bf621941ef5fcfde7/figure/12
+ }}
 
'''An Open-Source Toolkit for Mining Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2013 and written by [[David N. Milne]] and [[Ian H. Witten]].
  
 
== Overview ==
 
The online encyclopedia [[Wikipedia]] is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant [[multilingual]] database of concepts and semantic relations, a potential resource for [[natural language processing]] and many other research areas. This paper introduces the Wikipedia Miner toolkit, an [[open-source]] software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, [[categories]] and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced [[features]] include parallelized processing of Wikipedia dumps, machine-learned semantic [[relatedness]] [[measures]] and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques.

Revision as of 07:16, 5 October 2019


An Open-Source Toolkit for Mining Wikipedia
Authors
David N. Milne
Ian H. Witten
Publication date
2013
DOI
10.1016/j.artint.2012.06.007
Links
Original Preprint

An Open-Source Toolkit for Mining Wikipedia is a scientific work related to Wikipedia quality, published in 2013 and written by David N. Milne and Ian H. Witten.

Overview

The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques.
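
The idea of representing articles, categories and redirects as classes that can be searched and iterated over can be illustrated with a small toy model. The sketch below is not the Wikipedia Miner API (which is written in Java); the names ToyWiki, Article, add_article, add_redirect, get_article and articles_in_category are hypothetical and chosen only for this illustration, and the in-memory dictionaries stand in for the summarized databases the toolkit builds from Wikipedia dumps.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """A toy article record (hypothetical, not the toolkit's class)."""
    title: str
    links_out: set = field(default_factory=set)   # titles of linked articles
    categories: set = field(default_factory=set)  # names of parent categories


class ToyWiki:
    """In-memory stand-in for a summarized Wikipedia database (hypothetical)."""

    def __init__(self):
        self.articles = {}    # article title -> Article
        self.redirects = {}   # redirect title -> target article title

    def add_article(self, title, links_out=(), categories=()):
        self.articles[title] = Article(title, set(links_out), set(categories))

    def add_redirect(self, source, target):
        self.redirects[source] = target

    def get_article(self, title):
        """Look up an article by title, following a redirect if one exists."""
        resolved = self.redirects.get(title, title)
        return self.articles.get(resolved)

    def articles_in_category(self, category):
        """Iterate over the articles in a category, in title order."""
        for title in sorted(self.articles):
            if category in self.articles[title].categories:
                yield self.articles[title]


# Assumed sample data, for illustration only.
wiki = ToyWiki()
wiki.add_article("Kiwi", links_out={"New Zealand"}, categories={"Birds"})
wiki.add_article("Moa", links_out={"New Zealand", "Kiwi"}, categories={"Birds"})
wiki.add_article("New Zealand", categories={"Countries"})
wiki.add_redirect("Apteryx", "Kiwi")

kiwi = wiki.get_article("Apteryx")  # resolved through the redirect
bird_titles = [a.title for a in wiki.articles_in_category("Birds")]
```

In this sketch a lookup by a redirect title returns the target article, and a category can be iterated to reach its member articles. A real system of this kind would presumably back these lookups with indexed databases rather than dictionaries.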