Comparative Analysis of Text Representation Methods Using Classification

Authors: Julian Szymański
Publication date: 2014
ISSN: 01969722
DOI: 10.1080/01969722.2014.874828

Comparative Analysis of Text Representation Methods Using Classification is a scientific work about Wikipedia quality, published in 2014 and written by Julian Szymański.

Overview

In this work, the author reviews and empirically evaluates five raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article, an evaluation of approaches to text representation for machine learning tasks, indicates that text representation is fundamental to achieving good categorization results. The choice of representation sets a baseline that even sophisticated machine learning algorithms cannot compensate for, which confirms the thesis that proper data representation is a prerequisite for high-quality data analysis. The text representations were evaluated within the Wikipedia repository by examining classification parameters observed during automatic reconstruction of human-made categories. For that purpose, the author uses a classifier based on support vector machines, extended with multilabel and multiclass functionality. During classifier construction, the author observed parameters such as learning time, representation size, and classification quality, which support conclusions about the text representations. The data sets used in the experiments were created from Wikipedia dumps. The author also describes the software, called Matrixu, which allows a user to build computational representations of Wikipedia articles. The software is the second contribution of the research: it is a universal tool for converting Wikipedia from a human-readable form to a form that can be processed by a machine. Results generated using Matrixu can be used in a wide range of applications that involve Wikipedia data.
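The evaluation procedure described above can be illustrated with a minimal sketch. It is not the Matrixu software and does not reproduce the paper's experiments: the toy corpus, the two representations (word counts and character trigrams, which may not be among the five evaluated), and every function name are assumptions made for illustration. The classifier is a simple linear SVM trained by Pegasos-style subgradient descent, with one binary model per category as a stand-in for the multilabel extension. For each representation, the sketch records representation size, learning time, and classification quality.

```python
import time
from collections import Counter

# Hypothetical toy corpus: (text, set of category labels). Not data from the paper.
DOCS = [
    ("the cat sat on the mat", {"animals"}),
    ("dogs and cats are pets", {"animals"}),
    ("stocks fell on the market", {"finance"}),
    ("the bank raised interest rates", {"finance"}),
    ("pet insurance market grows", {"animals", "finance"}),
]
LABELS = sorted({label for _, labels in DOCS for label in labels})


def word_tokens(text):
    """Representation 1 (assumed): whitespace-separated words."""
    return text.split()


def char_trigrams(text):
    """Representation 2 (assumed): overlapping character trigrams."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def vectorize(texts, tokenizer):
    """Build a vocabulary and sparse count vectors (dicts: index -> count)."""
    vocab, vectors = {}, []
    for text in texts:
        vec = {}
        for token, count in Counter(tokenizer(text)).items():
            vec[vocab.setdefault(token, len(vocab))] = float(count)
        vectors.append(vec)
    return vocab, vectors


def train_linear_svm(vectors, targets, dim, epochs=50, lam=0.01):
    """Linear SVM via Pegasos-style subgradient descent on the hinge loss."""
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(w[i] * v for i, v in x.items())
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1:  # example violates the margin: apply hinge update
                for i, v in x.items():
                    w[i] += eta * y * v
    return w


def evaluate(tokenizer):
    """Measure representation size, learning time, and training-set accuracy."""
    vocab, vectors = vectorize([text for text, _ in DOCS], tokenizer)
    start = time.perf_counter()
    models = {}
    for label in LABELS:  # one binary classifier per category (multilabel)
        targets = [1 if label in labels else -1 for _, labels in DOCS]
        models[label] = train_linear_svm(vectors, targets, len(vocab))
    train_time = time.perf_counter() - start
    correct = 0
    for vec, (_, labels) in zip(vectors, DOCS):
        for label in LABELS:
            score = sum(models[label][i] * v for i, v in vec.items())
            correct += (score > 0) == (label in labels)
    accuracy = correct / (len(DOCS) * len(LABELS))
    return {"size": len(vocab), "train_time": train_time, "accuracy": accuracy}


results = {
    "words": evaluate(word_tokens),
    "char_trigrams": evaluate(char_trigrams),
}
for name, measured in results.items():
    print(name, measured)
```

The accuracy reported here is measured on the training documents only, which is enough to show the comparison loop but says little about generalization; a realistic comparison would score each representation on held-out articles.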

Embed

Wikipedia Quality

Szymański, Julian. (2014). "[[Comparative Analysis of Text Representation Methods Using Classification]]". Cybernetics and Systems Volume 45, Issue 2, 17 February 2014, pp. 180-199. ISSN: 01969722. DOI: 10.1080/01969722.2014.874828.

English Wikipedia

{{cite journal |last1=Szymański |first1=Julian |title=Comparative Analysis of Text Representation Methods Using Classification |date=17 February 2014 |journal=Cybernetics and Systems |volume=45 |issue=2 |pages=180-199 |issn=0196-9722 |doi=10.1080/01969722.2014.874828 |url=https://wikipediaquality.com/wiki/Comparative_Analysis_of_Text_Representation_Methods_Using_Classification}}

HTML

Szymański, Julian. (2014). &quot;<a href="https://wikipediaquality.com/wiki/Comparative_Analysis_of_Text_Representation_Methods_Using_Classification">Comparative Analysis of Text Representation Methods Using Classification</a>&quot;. Cybernetics and Systems Volume 45, Issue 2, 17 February 2014, pp. 180-199. ISSN: 01969722. DOI: 10.1080/01969722.2014.874828.