Identifying Document Topics Using the Wikipedia Category Network

Revision as of 20:36, 28 October 2019 by Amelia (talk | contribs) (+ cat.)


Authors: Péter Schönhofen
Publication date: 2006
DOI: 10.1109/WI.2006.92
Links: Original Preprint

Identifying Document Topics Using the Wikipedia Category Network is a scientific work related to Wikipedia quality, published in 2006 and written by Péter Schönhofen.

Overview

In the last few years the size and coverage of Wikipedia, a freely available online encyclopedia, have reached the point where it can be used much like an ontology or taxonomy to identify the topics discussed in a document. In this paper the author shows that even a simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories surprisingly well. The author tests the reliability of the method by predicting the categories of Wikipedia articles themselves based on their bodies, and by performing classification and clustering on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of their texts.
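The general idea of characterizing a document by Wikipedia categories using only article titles and their categories can be illustrated with a minimal sketch. This is not the paper's algorithm: the title-to-category table is hypothetical toy data, and the function names (`tokenize`, `match_titles`, `rank_categories`) and the voting scheme (each matched title splits one vote evenly among its categories) are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical toy mapping from article titles to categories.
# Real Wikipedia data would be far larger; these entries are illustrative only.
TITLE_TO_CATEGORIES = {
    "neural network": ["Machine learning", "Artificial intelligence"],
    "support vector machine": ["Machine learning"],
    "guitar": ["Musical instruments"],
    "piano": ["Musical instruments", "Keyboard instruments"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_titles(tokens, titles, max_len=3):
    """Count how often each known title occurs as a contiguous word sequence."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in titles:
                counts[phrase] += 1
    return counts


def rank_categories(text, title_to_categories=TITLE_TO_CATEGORIES, top_k=3):
    """Rank categories by votes from the titles matched in the text.

    Assumption: each title occurrence casts one vote, split evenly
    among the categories of that title.
    """
    title_counts = match_titles(tokenize(text), title_to_categories)
    votes = Counter()
    for title, count in title_counts.items():
        categories = title_to_categories[title]
        for category in categories:
            votes[category] += count / len(categories)
    return [category for category, _ in votes.most_common(top_k)]


if __name__ == "__main__":
    doc = ("A neural network and a support vector machine were compared; "
           "the neural network won.")
    print(rank_categories(doc))
```

The resulting category list could then stand in for the document text in a downstream classifier or clustering step, which is the kind of representation the overview describes; how categories are actually weighted and filtered in the paper is not stated here.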

Embed

Wikipedia Quality

Schönhofen, Péter. (2006). "[[Identifying Document Topics Using the Wikipedia Category Network]]". DOI: 10.1109/WI.2006.92.

English Wikipedia

{{cite journal |last1=Schönhofen |first1=Péter |title=Identifying Document Topics Using the Wikipedia Category Network |date=2006 |doi=10.1109/WI.2006.92 |url=https://wikipediaquality.com/wiki/Identifying_Document_Topics_Using_the_Wikipedia_Category_Network}}

HTML

Schönhofen, Péter. (2006). &quot;<a href="https://wikipediaquality.com/wiki/Identifying_Document_Topics_Using_the_Wikipedia_Category_Network">Identifying Document Topics Using the Wikipedia Category Network</a>&quot;. DOI: 10.1109/WI.2006.92.