Difference between revisions of "Galateas D2W: a Multi-Lingual Disambiguation to Wikipedia Web Service"

(Overview - Galateas D2W: a Multi-Lingual Disambiguation to Wikipedia Web Service)
 
(+ links)
Line 1: Line 1:
'''Galateas D2W: a Multi-Lingual Disambiguation to Wikipedia Web Service''' - scientific work related to Wikipedia quality published in 2013, written by Deirdre Lungley, Marco Trevisan, Vien Nguyen, Maha Althobaiti and Massimo Poesio.
+
'''Galateas D2W: a Multi-Lingual Disambiguation to Wikipedia Web Service''' - scientific work related to [[Wikipedia quality]] published in 2013, written by [[Deirdre Lungley]], [[Marco Trevisan]], [[Vien Nguyen]], [[Maha Althobaiti]] and [[Massimo Poesio]].
  
 
== Overview ==
 
== Overview ==
ABSTRACT The motivation for entity extraction within a digital cultural collection is the enrichment potential of such a tool – useful in this context for such tasks as metadata generation and query log analysis. The use of Disambiguation to Wikipedia as a particular entity extraction tool is motivated by its generalisable nature and its suitability to noisy text. The particular methodology the authors use does not avail of specific natural language tools and therefore can be applied to other languages with minimal adaptation. This has allowed the authors to develop a multi-lingual Disambiguation to Wikipedia tool, which they have deployed as a web service for the use of the community. Categories and Subject Descriptors I.2 [Artificial Intelligence]: Natural Language Processing - Text analysis General Terms Algorithms, Languages, Experimentation Keywords Disambiguation to Wikipedia, Entity recognition 1. INTRODUCTION Information Retrieval within the digital cultural heritage context must contend with often innately “noisy” resources: poor spelling and punctuation, obsolete word forms and abbreviated forms. This can be the case in both text-based resources and in the metadata of image-based resources. This provides an often unsurmountable challenge to traditional natural language processing techniques, e.g., traditional Named Entity Recognition (NER). However, the en-
+
ABSTRACT The motivation for entity extraction within a digital cultural collection is the enrichment potential of such a tool – useful in this context for such tasks as metadata generation and query log analysis. The use of Disambiguation to [[Wikipedia]] as a particular entity extraction tool is motivated by its generalisable nature and its suitability to noisy text. The particular methodology the authors use does not avail of specific natural language tools and therefore can be applied to other languages with minimal adaptation. This has allowed the authors to develop a multi-lingual Disambiguation to Wikipedia tool, which they have deployed as a web service for the use of the community. Categories and Subject Descriptors I.2 [Artificial Intelligence]: Natural Language Processing - Text analysis General Terms Algorithms, Languages, Experimentation Keywords Disambiguation to Wikipedia, Entity recognition 1. INTRODUCTION Information Retrieval within the digital cultural heritage context must contend with often innately “noisy” resources: poor spelling and punctuation, obsolete word forms and abbreviated forms. This can be the case in both text-based resources and in the metadata of image-based resources. This provides an often unsurmountable challenge to traditional [[natural language processing]] techniques, e.g., traditional Named Entity Recognition (NER). However, the en-

Revision as of 06:54, 1 July 2019

Galateas D2W: a Multi-Lingual Disambiguation to Wikipedia Web Service - scientific work related to Wikipedia quality published in 2013, written by Deirdre Lungley, Marco Trevisan, Vien Nguyen, Maha Althobaiti and Massimo Poesio.

Overview

ABSTRACT The motivation for entity extraction within a digital cultural collection is the enrichment potential of such a tool – useful in this context for such tasks as metadata generation and query log analysis. The use of Disambiguation to Wikipedia as a particular entity extraction tool is motivated by its generalisable nature and its suitability to noisy text. The particular methodology the authors use does not avail of specific natural language tools and therefore can be applied to other languages with minimal adaptation. This has allowed the authors to develop a multi-lingual Disambiguation to Wikipedia tool, which they have deployed as a web service for the use of the community. Categories and Subject Descriptors I.2 [Artificial Intelligence]: Natural Language Processing - Text analysis General Terms Algorithms, Languages, Experimentation Keywords Disambiguation to Wikipedia, Entity recognition 1. INTRODUCTION Information Retrieval within the digital cultural heritage context must contend with often innately “noisy” resources: poor spelling and punctuation, obsolete word forms and abbreviated forms. This can be the case in both text-based resources and in the metadata of image-based resources. This provides an often unsurmountable challenge to traditional natural language processing techniques, e.g., traditional Named Entity Recognition (NER). However, the en-