Difference between revisions of "Identifying Document Topics Using the Wikipedia Category Network"

From Wikipedia Quality
Jump to: navigation, search
(Identifying Document Topics Using the Wikipedia Category Network -- new article)
 
(+ links)
Line 1: Line 1:
'''Identifying Document Topics Using the Wikipedia Category Network''' - scientific work related to Wikipedia quality published in 2006, written by Péter Schönhofen.
+
'''Identifying Document Topics Using the Wikipedia Category Network''' - scientific work related to [[Wikipedia quality]] published in 2006, written by [[Péter Schönhofen]].
  
 
== Overview ==
 
== Overview ==
In the last few years the size and coverage of Wikipe- dia, a freely available on-line encyclopedia has reached the point where it can be utilized similar to an ontology or tax- onomy to identify the topics discussed in a document. In this paper authors will show that even a simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories sur- prisingly well. Authors test the reliability of method by pre- dicting categories ofWikipedia articles themselves based on their bodies, and by performing classification and cluster- ing on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of their texts.
+
In the last few years the size and coverage of Wikipe- dia, a freely available on-line encyclopedia has reached the point where it can be utilized similar to an [[ontology]] or tax- onomy to identify the topics discussed in a document. In this paper authors will show that even a simple algorithm that exploits only the titles and [[categories]] of [[Wikipedia]] articles can characterize documents by [[Wikipedia categories]] sur- prisingly well. Authors test the [[reliability]] of method by pre- dicting categories ofWikipedia articles themselves based on their bodies, and by performing classification and cluster- ing on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of their texts.

Revision as of 08:46, 1 June 2019

Identifying Document Topics Using the Wikipedia Category Network - scientific work related to Wikipedia quality published in 2006, written by Péter Schönhofen.

Overview

In the last few years the size and coverage of Wikipe- dia, a freely available on-line encyclopedia has reached the point where it can be utilized similar to an ontology or tax- onomy to identify the topics discussed in a document. In this paper authors will show that even a simple algorithm that exploits only the titles and categories of Wikipedia articles can characterize documents by Wikipedia categories sur- prisingly well. Authors test the reliability of method by pre- dicting categories ofWikipedia articles themselves based on their bodies, and by performing classification and cluster- ing on 20 Newsgroups and RCV1, representing documents by their Wikipedia categories instead of their texts.