Finding Domain Terms using Wikipedia

Authors	Jorge Vivaldi, Horacio Rodríguez
Publication date	2010
Links	Original Preprint

Finding Domain Terms using Wikipedia is a scientific work related to Wikipedia quality, published in 2010 and written by Jorge Vivaldi and Horacio Rodríguez.

Overview

The authors present a new approach for obtaining the terminology of a given domain using the category and page structures of Wikipedia in a language-independent way. The idea is to exploit the Wikipedia category graph, starting from a top category that the authors identify with the name of the domain. After obtaining the full set of categories belonging to the selected domain, the collection of corresponding pages is extracted, subject to some constraints. To reduce noise, a bootstrapping approach involving several iterations is used. At each iteration, the less reliable pages, judged by the balance between the on-domain and off-domain categories of each page, are removed, as are the less reliable categories. The set of recovered pages and categories is selected as the initial domain term vocabulary. The approach has been applied to three broad-coverage domains (astronomy, chemistry and medicine) and two languages (English and Spanish), showing promising performance. The resulting set of terms has been evaluated using as reference those terms occurring in WordNet (using Magnini's domain codes) and those appearing in SNOMED-CT (a reference resource for the medical domain available for Spanish).
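
The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy category data, the 0.5 threshold and the specific scoring rules (share of on-domain categories per page, share of retained pages per category) are all assumptions made for illustration, since the summary does not specify them.

```python
from collections import deque

def collect_categories(subcats, top):
    """Breadth-first walk of the category graph, starting at the top (domain) category."""
    seen = {top}
    queue = deque([top])
    while queue:
        cat = queue.popleft()
        for child in subcats.get(cat, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def prune(page_cats, domain_cats, min_ratio=0.5, max_iters=10):
    """Iteratively drop unreliable pages and categories until nothing changes.

    Assumed scoring (hypothetical, not taken from the paper):
    - a page is kept if at least min_ratio of its categories are on-domain;
    - a category is kept if at least min_ratio of its member pages are kept.
    """
    cats = set(domain_cats)
    pages = {p for p, cs in page_cats.items() if cs & cats}
    for _ in range(max_iters):
        kept_pages = {
            p for p in pages
            if len(page_cats[p] & cats) / len(page_cats[p]) >= min_ratio
        }
        kept_cats = set()
        for c in cats:
            members = [p for p, cs in page_cats.items() if c in cs]
            if members and sum(p in kept_pages for p in members) / len(members) >= min_ratio:
                kept_cats.add(c)
        if kept_pages == pages and kept_cats == cats:
            break  # fixed point reached
        pages, cats = kept_pages, kept_cats
    return pages, cats

# Toy data (invented for this example): category -> subcategories.
subcats = {
    "Astronomy": ["Planets", "Observatories"],
    "Planets": ["Moons"],
}
# Toy data (invented for this example): page -> categories it belongs to.
page_cats = {
    "Mars": {"Astronomy", "Planets"},
    "Titan": {"Moons", "Planets"},
    "Mauna Kea": {"Observatories", "Volcanoes", "Hawaii"},
    "Kitt Peak": {"Observatories", "Mountains", "Arizona"},
    "Dome": {"Observatories", "Architecture"},
}

domain_cats = collect_categories(subcats, "Astronomy")
pages, cats = prune(page_cats, domain_cats)
print(sorted(pages))
print(sorted(cats))
```

In this toy run, the two mostly off-domain pages are removed first, which makes the "Observatories" category unreliable; once that category is dropped, the remaining page that depended on it is removed in the next iteration. This cascading effect is the reason for iterating rather than filtering once.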

Embed

Wikipedia Quality

Vivaldi, Jorge; Rodríguez, Horacio. (2010). "[[Finding Domain Terms using Wikipedia]]".

English Wikipedia

{{cite journal |last1=Vivaldi |first1=Jorge |last2=Rodríguez |first2=Horacio |title=Finding Domain Terms using Wikipedia |date=2010 |url=https://wikipediaquality.com/wiki/Finding_Domain_Terms_using_Wikipedia}}

HTML

Vivaldi, Jorge; Rodríguez, Horacio. (2010). "<a href="https://wikipediaquality.com/wiki/Finding_Domain_Terms_using_Wikipedia">Finding Domain Terms using Wikipedia</a>".
