Difference between revisions of "Language Independent Identification of Parallel Sentences Using Wikipedia"

From Wikipedia Quality
(Basic information on Language Independent Identification of Parallel Sentences Using Wikipedia)
 
(+ wikilinks)
Line 1: Line 1:
'''Language Independent Identification of Parallel Sentences Using Wikipedia''' is a scientific work related to Wikipedia quality, published in 2011 and written by Rohit G. Bharadwaj and Vasudeva Varma.
+
'''Language Independent Identification of Parallel Sentences Using Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2011 and written by [[Rohit G. Bharadwaj]] and [[Vasudeva Varma]].
  
 
== Overview ==
 
== Overview ==
This paper details a novel classification-based approach to identifying parallel sentences between two languages in a language-independent way. The authors replace the language-specific resources that are normally required with the richly structured multilingual content of Wikipedia. Their approach is particularly useful for extracting parallel sentences for under-resourced languages, such as most Indian and African languages, where resources of the necessary accuracy are not readily available. The authors extract various statistics based on the cross-lingual links present in Wikipedia and use them to generate a feature vector for each sentence pair. Each pair of sentences is then classified as parallel or non-parallel using these feature vectors. The authors achieved a precision of up to 78%, which is encouraging when compared with other state-of-the-art approaches. These results support the hypothesis of using Wikipedia to evaluate the parallel coefficient between sentences that can be used to build bilingual dictionaries.
+
This paper details a novel classification-based approach to identifying parallel sentences between two languages in a language-independent way. The authors replace the language-specific resources that are normally required with the richly structured [[multilingual]] content of [[Wikipedia]]. Their approach is particularly useful for extracting parallel sentences for under-resourced languages, such as most Indian and African languages, where resources of the necessary accuracy are not readily available. The authors extract various statistics based on the [[cross lingual|cross-lingual]] links present in Wikipedia and use them to generate a feature vector for each sentence pair. Each pair of sentences is then classified as parallel or non-parallel using these feature vectors. The authors achieved a precision of up to 78%, which is encouraging when compared with other state-of-the-art approaches. These results support the hypothesis of using Wikipedia to evaluate the parallel coefficient between sentences that can be used to build bilingual dictionaries.

Revision as of 13:11, 2 November 2020

Language Independent Identification of Parallel Sentences Using Wikipedia is a scientific work related to Wikipedia quality, published in 2011 and written by Rohit G. Bharadwaj and Vasudeva Varma.

Overview

This paper details a novel classification-based approach to identifying parallel sentences between two languages in a language-independent way. The authors replace the language-specific resources that are normally required with the richly structured multilingual content of Wikipedia. Their approach is particularly useful for extracting parallel sentences for under-resourced languages, such as most Indian and African languages, where resources of the necessary accuracy are not readily available. The authors extract various statistics based on the cross-lingual links present in Wikipedia and use them to generate a feature vector for each sentence pair. Each pair of sentences is then classified as parallel or non-parallel using these feature vectors. The authors achieved a precision of up to 78%, which is encouraging when compared with other state-of-the-art approaches. These results support the hypothesis of using Wikipedia to evaluate the parallel coefficient between sentences that can be used to build bilingual dictionaries.
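
The pipeline summarized above (link-based statistics, a feature vector per sentence pair, then a binary decision) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy link table `CROSS_LINKS`, the three features in `pair_features`, and the fixed-threshold rule in `is_parallel` (standing in for a trained classifier) are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy table standing in for Wikipedia's cross-lingual links:
# English article title -> transliterated Hindi article title.
CROSS_LINKS: Dict[str, str] = {
    "india": "bharat",
    "river": "nadi",
    "capital": "rajdhani",
}


def pair_features(src: str, tgt: str, links: Dict[str, str]) -> List[float]:
    """Build an illustrative feature vector for one sentence pair.

    These features are assumptions for this sketch, not the paper's feature set.
    """
    src_tokens = src.lower().split()
    tgt_tokens = tgt.lower().split()
    tgt_vocab = set(tgt_tokens)

    # Source tokens that have a cross-lingual link at all.
    linked = [t for t in src_tokens if t in links]
    # Linked tokens whose counterpart actually occurs in the target sentence.
    matched = [t for t in linked if links[t] in tgt_vocab]

    coverage = len(matched) / len(linked) if linked else 0.0
    density = len(matched) / len(src_tokens) if src_tokens else 0.0
    n_src, n_tgt = len(src_tokens), len(tgt_tokens)
    length_ratio = min(n_src, n_tgt) / max(n_src, n_tgt) if n_src and n_tgt else 0.0
    return [coverage, density, length_ratio]


def is_parallel(vec: List[float], threshold: float = 0.5) -> bool:
    """Stand-in for a trained binary classifier: a fixed threshold rule."""
    coverage, _density, length_ratio = vec
    return coverage >= threshold and length_ratio >= threshold


if __name__ == "__main__":
    candidates = [
        ("india capital river", "bharat rajdhani nadi"),
        ("india capital river", "nadi"),
    ]
    for src, tgt in candidates:
        vec = pair_features(src, tgt, CROSS_LINKS)
        print(src, "|", tgt, "->", vec, is_parallel(vec))
```

In a realistic setting the threshold rule would be replaced by a classifier trained on labelled sentence pairs, and the link table would be built from the interlanguage links of the two Wikipedia editions.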