LWCR: Multi-Layered Wikipedia Representation for Computing Word Relatedness

Authors: Mohamed Ben Aouicha, Mohamed Ali Hadj Taieb, Abdelmajid Ben Hamadou
Publication date: 2016
DOI: 10.1016/j.neucom.2016.08.045

LWCR: Multi-Layered Wikipedia Representation for Computing Word Relatedness is a scientific work related to Wikipedia quality, published in 2016 and written by Mohamed Ben Aouicha, Mohamed Ali Hadj Taieb and Abdelmajid Ben Hamadou.

Overview

The measurement of semantic relatedness between words has attracted increasing interest in several research fields, including cognitive science, artificial intelligence, biology, and linguistics. Efficient measures rely on knowledge resources such as Wikipedia, a huge, continuously updated encyclopedia written by Internet users. In this paper, the authors propose a novel approach based on a multi-layered Wikipedia representation for computing word relatedness (LWCR), which exploits a weighting scheme based on the Wikipedia Category Graph (WCG): Term Frequency-Inverse Category Frequency (tf × icf). For each category in the WCG, the proposal builds a Category Description Vector (CDV) containing the weights of the stems extracted from the articles assigned to that category. The semantic relatedness degree of a target word pair is computed as the cosine measure between the CDVs assigned to the two words. This basic idea is extended by enhancement modules that exploit other Wikipedia features, such as article titles, the redirection mechanism, and neighborhood category enrichment, to capture semantic features and better quantify the semantic relatedness between words.

To the best of the authors' knowledge, this is the first attempt to incorporate the WCG-based term-weighting scheme (tf × icf) into a computing model of semantic relatedness. It is also the first work to use 17 datasets in the assessment process, divided into two sets. The first set contains datasets designed for semantic similarity: RG65, MC30, AG203, WP300, SimLexNoun666 and GeReSiD50Sim. The second contains datasets for semantic relatedness evaluation: WordSim353, GM30, Zeigler25, Zeigler30, MTurk287, MTurk771, MEN3000, Rel122, ReWord26, GeReSiD50 and SCWS1229. The results are compared with WordNet-based measures and with the distributional measures cosine and PMI computed on Wikipedia articles. Experiments show that the approach provides consistent improvements over state-of-the-art results on multiple benchmarks.
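The core pipeline described in the abstract (tf × icf weights per WCG category, one CDV per category, cosine between the CDVs of a word pair) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the icf definition log(N / cf(t)) is borrowed from the usual idf analogy, the toy data `CATEGORY_ARTICLES` and `WORD_CATEGORIES` are hypothetical, and taking the maximum cosine over all category pairs of the two words is an assumed way of assigning CDVs to words.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy input: category -> list of articles, each article a list of stems.
# In LWCR the stems would come from Wikipedia articles assigned to WCG categories.
CATEGORY_ARTICLES = {
    "Felines": [["cat", "feline", "pet"], ["lion", "feline", "wild"]],
    "Canines": [["dog", "canine", "pet"], ["wolf", "canine", "wild"]],
    "Vehicles": [["car", "engine", "road"], ["truck", "engine", "cargo"]],
}

# Hypothetical word -> categories mapping (assumption: e.g. the categories of
# the article whose title matches the word).
WORD_CATEGORIES = {
    "cat": ["Felines"],
    "dog": ["Canines"],
    "car": ["Vehicles"],
}


def build_cdvs(category_articles):
    """Build a sparse Category Description Vector (stem -> weight) per category.

    Assumed weighting: tf(t, c) * log(N / cf(t)), where tf is the raw count of
    stem t over all articles of category c, N is the number of categories and
    cf(t) is the number of categories whose articles contain t.
    """
    tf = {c: Counter(s for art in arts for s in art)
          for c, arts in category_articles.items()}
    n = len(tf)
    # Iterating a Counter yields each stem once, so this counts categories per stem.
    cf = Counter(stem for counts in tf.values() for stem in counts)
    return {c: {t: f * math.log(n / cf[t]) for t, f in counts.items()}
            for c, counts in tf.items()}


def cosine(u, v):
    """Cosine between two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness(w1, w2, cdvs, word_categories):
    """Assumed aggregation: maximum cosine over all category pairs of the two words."""
    pairs = product(word_categories.get(w1, []), word_categories.get(w2, []))
    return max((cosine(cdvs[a], cdvs[b]) for a, b in pairs), default=0.0)


if __name__ == "__main__":
    cdvs = build_cdvs(CATEGORY_ARTICLES)
    print(round(relatedness("cat", "dog", cdvs, WORD_CATEGORIES), 3))  # → 0.043 (shared stems "pet", "wild")
    print(round(relatedness("cat", "car", cdvs, WORD_CATEGORIES), 3))  # → 0.0 (no shared stems)
```

A stem that occurs in every category gets weight log(1) = 0, which is the damping effect icf is meant to provide. The enhancement modules mentioned in the abstract (article titles, redirects, neighborhood category enrichment) are not modeled in this sketch.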

Embed

Wikipedia Quality

Aouicha, Mohamed Ben; Taieb, Mohamed Ali Hadj; Hamadou, Abdelmajid Ben. (2016). "[[Lwcr: Multi-Layered Wikipedia Representation for Computing Word Relatedness]]". Elsevier. DOI: 10.1016/j.neucom.2016.08.045.

English Wikipedia

{{cite journal |last1=Aouicha |first1=Mohamed Ben |last2=Taieb |first2=Mohamed Ali Hadj |last3=Hamadou |first3=Abdelmajid Ben |title=Lwcr: Multi-Layered Wikipedia Representation for Computing Word Relatedness |date=2016 |doi=10.1016/j.neucom.2016.08.045 |url=https://wikipediaquality.com/wiki/Lwcr:_Multi-Layered_Wikipedia_Representation_for_Computing_Word_Relatedness |journal=Neurocomputing |publisher=Elsevier}}

HTML

Aouicha, Mohamed Ben; Taieb, Mohamed Ali Hadj; Hamadou, Abdelmajid Ben. (2016). &quot;<a href="https://wikipediaquality.com/wiki/Lwcr:_Multi-Layered_Wikipedia_Representation_for_Computing_Word_Relatedness">Lwcr: Multi-Layered Wikipedia Representation for Computing Word Relatedness</a>&quot;. Elsevier. DOI: 10.1016/j.neucom.2016.08.045.