Effectively Mining Wikipedia for Clustering Multilingual Documents



Authors: N. Kiran Kumar, G. S. K. Santosh, Vasudeva Varma
Publication date: 2011
DOI: 10.1007/978-3-642-22327-3_32

Effectively Mining Wikipedia for Clustering Multilingual Documents is a scientific work related to Wikipedia quality, published in 2011 and written by N. Kiran Kumar, G. S. K. Santosh and Vasudeva Varma.

Overview

This paper presents Multilingual Document Clustering (MDC) on comparable corpora using Wikipedia. In particular, the authors use the cross-lingual links, categories, outlinks, and Infobox information present in Wikipedia to enrich the document representation. They cluster the multilingual documents with the Bisecting k-means algorithm, based on document similarities. The experiments use the English and Hindi Wikipedias, together with the English and Hindi datasets provided by FIRE'10 for the ad hoc cross-lingual document retrieval task on Indian languages. No language-specific tools are used, which makes the proposed approach easy to extend to other languages. The system is evaluated using the F-score and Purity measures, and the authors report encouraging results.
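As an illustration of the clustering and evaluation steps named in the overview, here is a minimal sketch of Bisecting k-means with a Purity score, written in plain Python. It is not the authors' implementation: the toy vectors, the choice of always splitting the largest cluster, the cosine-based 2-means step, and all function names are assumptions made for this example.

```python
import math
import random
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def two_means(vectors, rng, iterations=10):
    """Split vectors into two index groups with a basic 2-means loop."""
    centers = rng.sample(vectors, 2)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for i, v in enumerate(vectors):
            nearer = 0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
            groups[nearer].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller handles it
        centers = [centroid([vectors[i] for i in g]) for g in groups]
    return groups


def bisecting_kmeans(vectors, k, seed=0):
    """Repeatedly bisect the largest cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()
        if len(biggest) < 2:
            clusters.append(biggest)
            break  # nothing left to split
        left, right = two_means([vectors[i] for i in biggest], rng)
        if not left or not right:
            # Assumption: fall back to halving when 2-means cannot separate.
            half = len(biggest) // 2
            left, right = list(range(half)), list(range(half, len(biggest)))
        clusters.append([biggest[i] for i in left])
        clusters.append([biggest[i] for i in right])
    return clusters


def purity(clusters, labels):
    """Fraction of documents that belong to their cluster's majority class."""
    majority = sum(max(Counter(labels[i] for i in c).values()) for c in clusters)
    return majority / len(labels)


# Toy document vectors: the first three share one topic, the last three another.
docs = [[3, 1, 0, 0], [2, 1, 0, 0], [4, 2, 0, 0],
        [0, 0, 2, 1], [0, 0, 1, 1], [0, 0, 3, 1]]
labels = ["sports", "sports", "sports", "politics", "politics", "politics"]

clusters = bisecting_kmeans(docs, k=2)
print(clusters, purity(clusters, labels))
```

Purity rewards clusters dominated by a single class, so it is usually read alongside the F-score, which also penalizes splitting one class across several clusters.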

Each document in the dataset was represented with a Keyword vector. Three additional vectors, namely the Category vector, Outlink vector, and Infobox vector, were obtained by using the Keyword vector to add semantic information from Wikipedia.
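The enrichment described above can be sketched as follows. This is a hedged illustration only: `TOY_WIKI` and `TOY_LANGLINKS` are hypothetical in-memory stand-ins for Wikipedia article metadata and cross-lingual links, and averaging the four cosine scores with equal weights is an assumption, since this page does not state how the paper combines the vectors.

```python
import math
from collections import Counter

# Hypothetical stand-in for Wikipedia metadata, keyed by English article title.
TOY_WIKI = {
    "cricket": {
        "categories": ["Sports", "Team sports"],
        "outlinks": ["Bat", "Ball"],
        "infobox": ["Team members"],
    },
}

# Hypothetical cross-lingual links: Hindi title -> English title.
TOY_LANGLINKS = {"क्रिकेट": "cricket"}


def keyword_vector(text):
    """Bag-of-words Keyword vector; whitespace tokenization only."""
    return Counter(text.lower().split())


def enrich(keywords, field):
    """Build a Category, Outlink, or Infobox vector from a Keyword vector."""
    vector = Counter()
    for word, count in keywords.items():
        title = TOY_LANGLINKS.get(word, word)  # follow a cross-lingual link if any
        for item in TOY_WIKI.get(title, {}).get(field, []):
            vector[item] += count
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similarity(doc_a, doc_b, fields=("categories", "outlinks", "infobox")):
    """Equal-weight average of the four vector similarities (an assumption)."""
    ka, kb = keyword_vector(doc_a), keyword_vector(doc_b)
    scores = [cosine(ka, kb)]
    scores += [cosine(enrich(ka, f), enrich(kb, f)) for f in fields]
    return sum(scores) / len(scores)


english_doc = "cricket match"
hindi_doc = "क्रिकेट मैच"
print(similarity(english_doc, hindi_doc))
```

In this toy case the English and Hindi documents share no keywords, so their Keyword vectors alone give a similarity of zero; the Wikipedia-derived vectors are what make the two documents comparable without any language-specific tools.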


Embed

Wikipedia Quality

Kumar, N. Kiran; Santosh, G. S. K.; Varma, Vasudeva. (2011). "[[Effectively Mining Wikipedia for Clustering Multilingual Documents]]". Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-642-22327-3_32.
