Exploiting Wikipedia in Identifying Named Entities: a Language-Independent Approach

Authors
Mahathi Bhagavatula
Santosh Gsk
Vasudeva Varma
Publication date
2012
Links
Original

Exploiting Wikipedia in Identifying Named Entities: a Language-Independent Approach is a scientific work related to Wikipedia quality, published in 2012 and written by Mahathi Bhagavatula, Santosh Gsk and Vasudeva Varma.

Overview

This paper details an approach to identifying Named Entities (NEs) in a large non-English corpus and associating them with appropriate tags, requiring minimal human intervention and no linguistic expertise. The main focus is on Indian languages such as Telugu, Hindi, Tamil and Marathi, among others, which are considered resource-poor compared to English. The inherent structure of Wikipedia is exploited to develop an efficient, co-occurrence-frequency-based NE identification algorithm for Indian languages. The authors describe methods by which English Wikipedia data can be used to bootstrap the identification of NEs in other languages. On a dataset of 2,622 Marathi Wikipedia articles, with around 10,000 NEs manually tagged, the system achieved an F-measure of 81.25% without drawing on language expertise. Similarly, an F-measure of 80.42% was achieved on around 12,000 NEs tagged within 2,935 Hindi Wikipedia articles.
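
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F-measure is conventionally computed from tagging counts. It assumes the standard balanced definition (the harmonic mean of precision and recall); the summary does not state which variant the authors used. The function name f_measure and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return the F-beta score for a set of tagging decisions.

    Assumption: the standard definition, with beta=1.0 giving the
    balanced F1 score. This is not confirmed as the paper's exact variant.
    """
    if true_positives == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    # Precision: share of predicted NEs that are correct.
    precision = true_positives / (true_positives + false_positives)
    # Recall: share of manually tagged NEs that were found.
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts for illustration only, not results from the paper.
score = f_measure(true_positives=8, false_positives=2, false_negatives=2)
print(f"F-measure: {score:.2%}")
```

A single F-measure figure does not reveal the underlying precision and recall, so the reported 81.25% and 80.42% cannot be decomposed from this summary alone.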

Embed

Wikipedia Quality

Bhagavatula, Mahathi; Gsk, Santosh; Varma, Vasudeva. (2012). "[[Exploiting Wikipedia in Identifying Named Entities: a Language-Independent Approach]]".

English Wikipedia

{{cite journal |last1=Bhagavatula |first1=Mahathi |last2=Gsk |first2=Santosh |last3=Varma |first3=Vasudeva |title=Exploiting Wikipedia in Identifying Named Entities: a Language-Independent Approach |date=2012 |url=https://wikipediaquality.com/wiki/Exploiting_Wikipedia_in_Identifying_Named_Entities:_a_Language-Independent_Approach}}

HTML

Bhagavatula, Mahathi; Gsk, Santosh; Varma, Vasudeva. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Exploiting_Wikipedia_in_Identifying_Named_Entities:_a_Language-Independent_Approach">Exploiting Wikipedia in Identifying Named Entities: a Language-Independent Approach</a>&quot;.