'''Derivation of is a Taxonomy from Wikipedia Category Graph''' - scientific work related to [[Wikipedia quality]] published in 2016, written by [[Mohamed Ben Aouicha]], [[Mohamed Ali Hadj Taieb]] and [[Malek Ezzeddine]].
  
 
== Overview ==
Knowledge acquisition remains one of the main obstacles to designing intelligent systems that achieve human-level performance on complex tasks. Recent developments in crowdsourcing technologies have opened promising opportunities to overcome this problem by exploiting large amounts of machine-readable knowledge for tasks that require human intelligence. [[Wikipedia]] exemplifies this trend: it is the largest collaborative, [[multilingual]] source of linguistic knowledge, containing both unstructured and semi-[[structured information]]. In this paper, the authors propose an approach for deriving an "is a" taxonomy from the Wikipedia Category Graph (WCG), an open collaborative resource. The WCG is first built from a Wikipedia dump and filtered. The process then mainly exploits the "BY" tag in category names and the sharing of plural headers among category names. These methods yield a graph made up of disconnected sub-graphs. The authors therefore propose a process for linking them into an "is a" taxonomy with a single root, modeled as a directed acyclic graph (DAG). Specific DAG-handling algorithms are used for this, including one for decomposing a DAG into sub-DAGs and another for merging two DAGs. The resulting taxonomy is assessed with [[semantic similarity]] [[measures]], which quantify the likeness between two concepts or words. To this end, the authors use a set of well-known benchmarks to compare the results obtained with the generated taxonomy against those obtained with [[WordNet]], a resource created and maintained by domain experts. The experimental results showed good correlations between the computed values and human judgments. Compared to WordNet, the derived taxonomy also offered broader coverage.
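The two category-name cues mentioned in the overview (the "BY" tag and shared plural headers) can be illustrated with a small sketch. It is not the authors' implementation. Several parts are hypothetical simplifications introduced for illustration:

* the preposition list and the crude plural test;
* treating an "X by Y" category as a grouping node whose members are attached to X;
* the function names `lexical_head`, `by_split` and `isa_parent`.

The sketch assumes English category titles in which the head noun precedes the first preposition.

```python
# Hypothetical sketch of two category-name cues for "is a" links in the WCG.
# Not the authors' implementation; every rule below is a simplifying assumption.

PREPOSITIONS = {"of", "from", "in", "by", "for", "on", "at", "with"}


def lexical_head(name: str) -> str:
    """Approximate the head noun: the word before the first preposition,
    or the last word if there is none (assumed heuristic)."""
    words = name.lower().split()
    for i, word in enumerate(words):
        if i > 0 and word in PREPOSITIONS:
            return words[i - 1]
    return words[-1]


def is_plural(word: str) -> bool:
    """Crude English plural test (assumption): ends in 's' but not 'ss'."""
    return word.endswith("s") and not word.endswith("ss")


def by_split(name: str):
    """Split 'X by Y' into ('X', 'Y'); return None if there is no BY tag."""
    words = name.split()
    lowered = [w.lower() for w in words]
    if "by" in lowered[1:]:
        i = lowered.index("by", 1)
        return " ".join(words[:i]), " ".join(words[i + 1:])
    return None


def isa_parent(child: str, parent: str):
    """Return the category the child plausibly 'is a', or None.

    An 'X by Y' parent is treated as a grouping node, so the child is
    attached to X instead. The link is kept only if both names share
    the same plural head."""
    split = by_split(parent)
    target = split[0] if split else parent
    head = lexical_head(child)
    if head == lexical_head(target) and is_plural(head):
        return target
    return None


if __name__ == "__main__":
    print(isa_parent("American writers", "Writers by nationality"))  # Writers
    print(isa_parent("Writers from Ohio", "Writers"))                # Writers
    print(isa_parent("Novels by Charles Dickens", "Charles Dickens"))  # None
```

The last call shows why a head check is useful. "Novels by Charles Dickens" is filed under "Charles Dickens", but the heads differ ("novels" vs "dickens"), so the category link is not read as an "is a" relation.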
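The step of turning disconnected sub-graphs into one rooted DAG can be sketched under these assumptions:

* Edges point from hypernym to hyponym.
* Sub-graphs are joined by attaching each sub-graph root to one artificial top node. The node is named `ROOT` here, which is a hypothetical choice.

The paper's own linking and merging algorithms are more elaborate. `merge_dags` below implements only one simple policy: keep every edge of the first DAG, and add an edge of the second only if it does not create a cycle. Acyclicity is checked with Kahn's topological-sort algorithm.

```python
# Hypothetical sketch: join disconnected "is a" sub-graphs under one root
# and keep the result acyclic. Edges are (hypernym, hyponym) pairs.
from collections import defaultdict, deque


def roots(edges, nodes):
    """Nodes without any incoming (hypernym) edge, sorted for stable output."""
    has_parent = {child for _, child in edges}
    return sorted(n for n in nodes if n not in has_parent)


def link_under_single_root(edges, root="ROOT"):
    """Attach the root of every sub-graph to one artificial top node."""
    nodes = {n for edge in edges for n in edge}
    linked = set(edges)
    for r in roots(edges, nodes):
        linked.add((root, r))
    return linked


def is_dag(edges):
    """Kahn's algorithm: the graph is acyclic iff every node gets removed."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


def merge_dags(a, b):
    """Union of two DAGs that skips any edge of b which would close a cycle
    (one simple merge policy, assumed for illustration)."""
    merged = set(a)
    for edge in sorted(b):
        if is_dag(merged | {edge}):
            merged.add(edge)
    return merged


if __name__ == "__main__":
    animals = {("Animals", "Mammals"), ("Mammals", "Dogs")}
    plants = {("Plants", "Trees")}
    taxonomy = link_under_single_root(animals | plants)
    print(roots(taxonomy, {n for e in taxonomy for n in e}))  # ['ROOT']
    print(is_dag(taxonomy))  # True
    # The edge Dogs -> Animals would close a cycle, so it is skipped.
    print(("Dogs", "Animals") in merge_dags(taxonomy, {("Dogs", "Animals")}))  # False
```

Re-running `is_dag` for every candidate edge is quadratic. That is acceptable for a sketch, but a category graph built from a full Wikipedia dump would call for an incremental cycle check.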
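The evaluation idea can be sketched in two parts. First, a taxonomy-based semantic similarity measure scores word pairs. Second, those scores are correlated with human ratings. The sketch uses the Wu-Palmer measure purely as a familiar example of a taxonomy-based measure; the overview does not say which measures the authors applied.

The formula is $\mathrm{sim}(c_1, c_2) = \frac{2\,\mathrm{depth}(\mathrm{lcs})}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}$, where lcs is the deepest common ancestor. The sketch then computes Pearson's $r$ against the ratings. The toy taxonomy and ratings are made up and do not come from any benchmark.

```python
# Hypothetical sketch: Wu-Palmer similarity on a small "is a" taxonomy and
# Pearson correlation with human ratings. Measure choice and data are assumptions.
from collections import deque
from math import sqrt


def depths(children, root):
    """Shortest-path depth of every node, with the root at depth 1 (BFS)."""
    depth = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def ancestors(parents, node):
    """The node itself plus every node reachable through hypernym links."""
    seen, stack = {node}, [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def wu_palmer(c1, c2, parents, depth):
    """2 * depth(lcs) / (depth(c1) + depth(c2)); lcs = deepest common ancestor."""
    common = ancestors(parents, c1) & ancestors(parents, c2)
    lcs_depth = max(depth[c] for c in common)
    return 2 * lcs_depth / (depth[c1] + depth[c2])


def pearson(xs, ys):
    """Pearson's r; undefined (division by zero) if either list is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    children = {"ROOT": ["Animals", "Vehicles"],
                "Animals": ["Dogs", "Cats"], "Vehicles": ["Cars"]}
    parents = {c: [p] for p, cs in children.items() for c in cs}
    depth = depths(children, "ROOT")
    pairs = [("Dogs", "Dogs"), ("Dogs", "Cats"), ("Dogs", "Cars")]
    human = [4.0, 3.0, 0.5]  # made-up ratings, not from any benchmark
    scores = [wu_palmer(a, b, parents, depth) for a, b in pairs]
    print([round(s, 3) for s in scores])  # [1.0, 0.667, 0.333]
    print(round(pearson(scores, human), 3))
```

Coverage, the other criterion in the overview, corresponds to how many benchmark words can be mapped to nodes of the taxonomy at all. Pairs containing an unmapped word cannot be scored.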

Revision as of 11:10, 23 January 2020
