Computing Semantic Relatedness Using Wikipedia Features

From Wikipedia Quality
Revision as of 08:07, 14 January 2021 by Barbara (talk | contribs) (+ wikilinks)

Computing Semantic Relatedness Using Wikipedia Features is a scientific work related to Wikipedia quality, published in 2013 and written by Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha and Abdelmajid Ben Hamadou.

Overview

Measuring semantic relatedness is a critical task in many domains such as psychology, biology, linguistics, cognitive science and artificial intelligence. In this paper, the authors propose a novel system for computing semantic relatedness between words. Recent approaches have exploited Wikipedia as a huge semantic resource and have shown good performance. The authors therefore use Wikipedia features (articles, categories, the Wikipedia category graph and redirections) in a system that combines this semantic information across its different components. The approach begins with a pre-processing step that provides, for each category in the Wikipedia category graph, a semantic description vector containing the weights of the stems extracted from the articles assigned to that category. Next, for each candidate word, the authors collect its set of categories using an algorithm for category extraction from the Wikipedia category graph. Then, they compute the degree of semantic relatedness using existing vector similarity metrics (Dice, Overlap and Cosine) and a newly proposed metric that performed as well as the cosine formula. The basic system is followed by a set of modules that exploit Wikipedia features to quantify the semantic relatedness between words as accurately as possible. The authors evaluate the measure on two tasks: comparison with human judgments using five datasets, and a specific application, solving choice problems. The resulting system shows good performance and sometimes outperforms the ESA (Explicit Semantic Analysis) and TSA (Temporal Semantic Analysis) approaches.
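
The vector comparison step described above can be illustrated with a minimal Python sketch. It assumes that each semantic description vector is stored as a dictionary mapping stems to weights, and it uses common weighted-vector generalizations of the Cosine, Dice and Overlap coefficients. The exact formulations used in the paper, as well as its newly proposed metric, are not given in this summary, so the function names, formulas and example weights below are illustrative assumptions rather than the authors' implementation.

```python
import math

# Assumption: a description vector is a dict {stem: weight}.
# The stems and weights used here are made up for illustration.

def dot(a, b):
    """Dot product of two sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b[s] for s, w in a.items() if s in b)

def sq_norm(a):
    """Squared Euclidean norm of a sparse vector."""
    return sum(w * w for w in a.values())

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    denom = math.sqrt(sq_norm(a)) * math.sqrt(sq_norm(b))
    return dot(a, b) / denom if denom else 0.0

def dice(a, b):
    """Weighted Dice coefficient: 2 * dot(a, b) / (|a|^2 + |b|^2)."""
    denom = sq_norm(a) + sq_norm(b)
    return 2 * dot(a, b) / denom if denom else 0.0

def overlap(a, b):
    """Weighted Overlap coefficient: dot(a, b) / min(|a|^2, |b|^2)."""
    denom = min(sq_norm(a), sq_norm(b))
    return dot(a, b) / denom if denom else 0.0

# Hypothetical description vectors for two categories.
car = {"engin": 2.0, "wheel": 1.0, "road": 1.0}
truck = {"engin": 1.0, "wheel": 2.0, "cargo": 1.0}

for metric in (cosine, dice, overlap):
    print(metric.__name__, round(metric(car, truck), 3))
```

In a system like the one described, a word-level score would then be derived from the scores between the category vectors of the two words. How the paper aggregates them is not stated here.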