Difference between revisions of "Wikipedia in Action: Ontological Knowledge in Text Categorization"

From Wikipedia Quality
Revision as of 09:34, 18 May 2020

Wikipedia in Action: Ontological Knowledge in Text Categorization is a scientific work related to Wikipedia quality, published in 2008 and written by Maciej Janik and Krys J. Kochut.

Overview

The authors present a new, ontology-based approach to automatic text categorization. An important and novel aspect of this approach is that the categorization method does not require a training set, in contrast to traditional statistical and probabilistic methods. In the presented method, the ontology effectively becomes the classifier: it includes the domain concepts, organized into hierarchies of categories and interconnected by relationships, as well as instances and the connections among them. The method focuses on (i) converting a text document into a thematic graph of the entities occurring in the document, (ii) ontological classification of the entities in the graph, and (iii) determining the overall categorization of the thematic graph and, as a result, of the document itself. In the presented experiments, the authors used an RDF ontology constructed from the full English version of Wikipedia. The experiments, conducted on corpora of Reuters news articles, showed that the training-less categorization method achieved very good overall accuracy.
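The three steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny in-memory dictionaries stand in for the Wikipedia-derived RDF ontology, entity spotting is plain substring matching, and the degree-weighted vote in step (iii) is an assumed scoring rule chosen only for illustration. All names (`ENTITY_CATEGORIES`, `CATEGORY_PARENT`, `RELATED`, `categorize`) are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy stand-in for the ontology (assumption: the real system uses an RDF
# ontology built from Wikipedia; these entries are invented for illustration).
ENTITY_CATEGORIES = {
    "Federal Reserve": ["Central banks"],
    "interest rate": ["Monetary policy"],
    "inflation": ["Monetary policy"],
    "Manchester United": ["Football clubs"],
}
CATEGORY_PARENT = {
    "Central banks": "Economics",
    "Monetary policy": "Economics",
    "Football clubs": "Sports",
}
RELATED = {
    frozenset(("Federal Reserve", "interest rate")),
    frozenset(("interest rate", "inflation")),
}


def build_thematic_graph(text):
    """Step (i): spot known entities and link those related in the ontology."""
    lowered = text.lower()
    nodes = [e for e in ENTITY_CATEGORIES if e.lower() in lowered]
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if frozenset((a, b)) in RELATED
    ]
    return nodes, edges


def classify_entity(entity):
    """Step (ii): walk up the category hierarchy to the top-level categories."""
    tops = set()
    for category in ENTITY_CATEGORIES[entity]:
        while category in CATEGORY_PARENT:
            category = CATEGORY_PARENT[category]
        tops.add(category)
    return tops


def categorize(text):
    """Step (iii): pick the category with the highest degree-weighted vote."""
    nodes, edges = build_thematic_graph(text)
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    scores = Counter()
    for entity in nodes:
        for top in classify_entity(entity):
            # Assumed weighting: well-connected entities count for more.
            scores[top] += 1 + degree[entity]
    return scores.most_common(1)[0][0] if scores else None


if __name__ == "__main__":
    doc = (
        "The Federal Reserve raised the interest rate to curb inflation, "
        "while Manchester United won at home."
    )
    print(categorize(doc))
```

No training set appears anywhere in the sketch: the category labels come entirely from the ontology, which is the property the summary highlights. A document with no recognized entities yields `None` rather than a guessed category.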