Arabic Text Categorization based on Arabic Wikipedia

From Wikipedia Quality
Revision as of 10:48, 18 December 2019 by Isabelle (talk | contribs) (Adding wikilinks)

Arabic Text Categorization based on Arabic Wikipedia is a scientific work related to Wikipedia quality, published in 2014 and written by Adnan H. Yahya and Ali Salhi.

Overview

This article describes an algorithm for categorizing Arabic text. It relies on highly categorized, corpus-based datasets obtained from the Arabic Wikipedia, using manual and automated processes to build and customize categories. The algorithm was built incrementally, starting from a simple categorization idea and moving on to more complex ones. The authors applied tests and filtration criteria to reach the best and most efficient results the algorithm can achieve. The categorization depends on the statistical relations between the input (test) text and the reference (training) data, supported by well-defined Wikipedia-based categories. The authors' algorithm supports two levels of categorization for Arabic text: categories are grouped into a hierarchy of main categories and subcategories. This introduces a challenge because certain subcategories are correlated and main categories overlap. The authors argue that the algorithm achieved good performance compared with other methods reported in the literature.
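The two-level idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the summary does not say which statistical measure or filtration criteria they used, so the sketch assumes plain term-frequency profiles compared by cosine similarity. The names `term_counts`, `cosine`, `build_profiles`, `categorize` and the toy `TRAINING` data are hypothetical, and the English placeholder words stand in for Wikipedia-derived Arabic text, which would likely need proper normalization and tokenization rather than whitespace splitting.

```python
import math
from collections import Counter


def term_counts(text):
    """Count whitespace-separated tokens (an assumed, simplistic tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(training):
    """Build one term profile per subcategory and one merged profile per main category."""
    main_profiles = {}
    sub_profiles = {}
    for main, subcategories in training.items():
        main_profiles[main] = Counter()
        for sub, documents in subcategories.items():
            profile = Counter()
            for document in documents:
                profile.update(term_counts(document))
            sub_profiles[(main, sub)] = profile
            main_profiles[main].update(profile)
    return main_profiles, sub_profiles


def categorize(text, main_profiles, sub_profiles):
    """Pick the closest main category, then the closest subcategory inside it."""
    query = term_counts(text)
    main = max(main_profiles, key=lambda m: cosine(query, main_profiles[m]))
    candidates = [key for key in sub_profiles if key[0] == main]
    best = max(candidates, key=lambda key: cosine(query, sub_profiles[key]))
    return main, best[1]


# Hypothetical toy reference data: main category -> subcategory -> documents.
TRAINING = {
    "sports": {
        "football": ["goal match league striker", "football club match"],
        "tennis": ["racket serve set match", "tennis court serve"],
    },
    "science": {
        "physics": ["energy particle quantum field"],
        "biology": ["cell gene protein organism"],
    },
}

MAIN_PROFILES, SUB_PROFILES = build_profiles(TRAINING)
print(categorize("the striker scored a goal in the league match",
                 MAIN_PROFILES, SUB_PROFILES))
```

Choosing the main category first and only then comparing subcategories keeps the second step small, but it also means an error at the main level cannot be recovered at the subcategory level, which is one way the overlap between main categories mentioned above can hurt results.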