Difference between revisions of "Improving the Wikipedia Miner Word Sense Disambiguation Algorithm"

(Overview - Improving the Wikipedia Miner Word Sense Disambiguation Algorithm)
 
(Links)
Line 1: Line 1:
'''Improving the Wikipedia Miner Word Sense Disambiguation Algorithm''' is a scientific work related to Wikipedia quality, published in 2012 and written by Aleksander Pohl.
+
'''Improving the Wikipedia Miner Word Sense Disambiguation Algorithm''' is a scientific work related to [[Wikipedia quality]], published in 2012 and written by [[Aleksander Pohl]].
  
 
== Overview ==
 
== Overview ==
This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure, inspired by Normalized Google Distance, with a measure inspired by the Jaccard coefficient, and by taking additional features into account, the disambiguation algorithm was improved by 8 percentage points (F1-measure) without impairing its performance or introducing any additional preprocessing overhead. This document also presents some statistical data extracted from the Polish Wikipedia by Wikipedia Miner. An automatic evaluation of the disambiguation algorithm for Polish shows that it performs almost as well as for English, even though the Polish Wikipedia has only a quarter as many articles as the English Wikipedia.
+
This document describes improvements to the [[Wikipedia]] Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure, inspired by Normalized [[Google]] Distance, with a measure inspired by the Jaccard coefficient, and by taking additional [[features]] into account, the disambiguation algorithm was improved by 8 percentage points (F1-measure) without impairing its performance or introducing any additional preprocessing overhead. This document also presents some statistical data extracted from the Polish Wikipedia by Wikipedia Miner. An automatic evaluation of the disambiguation algorithm for Polish shows that it performs almost as well as for English, even though the Polish Wikipedia has only a quarter as many articles as the [[English Wikipedia]].

Revision as of 10:11, 26 November 2019

Improving the Wikipedia Miner Word Sense Disambiguation Algorithm is a scientific work related to Wikipedia quality, published in 2012 and written by Aleksander Pohl.

Overview

This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure, inspired by Normalized Google Distance, with a measure inspired by the Jaccard coefficient, and by taking additional features into account, the disambiguation algorithm was improved by 8 percentage points (F1-measure) without impairing its performance or introducing any additional preprocessing overhead. This document also presents some statistical data extracted from the Polish Wikipedia by Wikipedia Miner. An automatic evaluation of the disambiguation algorithm for Polish shows that it performs almost as well as for English, even though the Polish Wikipedia has only a quarter as many articles as the English Wikipedia.
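
To illustrate the two kinds of relatedness measure mentioned above, the following minimal Python sketch compares a measure modelled on Normalized Google Distance with one based on the Jaccard coefficient. It assumes that each article is represented by the set of articles linking to it, which is the usual setup for link-based relatedness. The overview does not give the exact formulas used in the paper, so the function names (`ngd_relatedness`, `jaccard_relatedness`), the handling of edge cases, and the sample link sets are illustrative assumptions rather than the author's implementation.

```python
import math


def ngd_relatedness(links_a, links_b, total_articles):
    """Relatedness modelled on Normalized Google Distance (assumed form).

    links_a, links_b: iterables of IDs of articles linking to each article.
    total_articles: number of articles in the whole wiki.
    """
    a, b = set(links_a), set(links_b)
    common = a & b
    if not common:
        return 0.0  # no shared inlinks: treat the articles as unrelated
    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    denominator = math.log(total_articles) - math.log(smaller)
    if denominator <= 0:
        return 1.0  # degenerate case: the smaller set covers the whole wiki
    distance = (math.log(larger) - math.log(len(common))) / denominator
    # Convert the distance into a similarity clipped at zero.
    return max(0.0, 1.0 - distance)


def jaccard_relatedness(links_a, links_b):
    """Jaccard coefficient of the two inlink sets: |A & B| / |A | B|."""
    a, b = set(links_a), set(links_b)
    union = a | b
    if not union:
        return 0.0  # neither article has any inlinks
    return len(a & b) / len(union)


# Hypothetical inlink sets for two candidate senses.
sense_a = {1, 2, 3}
sense_b = {2, 3, 4}
print(jaccard_relatedness(sense_a, sense_b))
print(ngd_relatedness(sense_a, sense_b, total_articles=100))
```

Unlike the distance-based variant, the Jaccard form needs no global article count, only the two link sets, which fits the overview's remark that the change introduces no additional preprocessing overhead.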