Mining Translation Pairs with Learnt Patterns from Wikipedia

From Wikipedia Quality
Revision as of 09:42, 5 June 2019 by Abigail

Mining Translation Pairs with Learnt Patterns from Wikipedia is a scientific work related to Wikipedia quality, published in 2015 and written by Duan Jianyon.

Overview

Bilingual translation pairs play an important role in many NLP applications, such as cross-language information retrieval and machine translation. The translation of proper names, out-of-vocabulary words, idioms and technical terminology is one of the key factors that affect the performance of these systems. However, such translations can hardly be found in traditional bilingual dictionaries. This paper proposes a new method to automatically extract high-quality translation pairs from Wikipedia, based on its wide coverage and data structure. The method learns not only common patterns but also many patterns that human beings can hardly find. The method contains three steps: 1) extract translation pairs from the language toolbox of Wikipedia, which serve as heuristics for the next step; 2) learn patterns of translation pairs using knowledge of the PAT-Array gained from previous work; 3) extract other translation pairs automatically using the learned patterns. The authors' experimental results show that the accuracy can reach 90.4%.
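The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: the PAT-Array is not reproduced here, and a simple count of the text that surrounds known pairs stands in for it. The Chinese-English language pair, the sample sentences, the seed list and the names `SEED_PAIRS`, `CORPUS`, `learn_patterns` and `extract_pairs` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Step 1 (assumed input): seed pairs, standing in for pairs read from
# Wikipedia's language links. These values are hypothetical.
SEED_PAIRS = [
    ("机器翻译", "Machine translation"),
    ("信息检索", "Information retrieval"),
]

# Hypothetical article sentences that mention a term with its translation.
CORPUS = [
    "机器翻译（英语：Machine translation）是计算语言学的一个分支。",
    "信息检索（英语：Information retrieval）是从资源集合中获取信息的过程。",
    "后缀数组（英语：Suffix array）是一种字符串数据结构。",
]


def learn_patterns(texts, seed_pairs, max_infix=8, min_count=2):
    """Step 2: count the (infix, suffix) context around each seed pair.

    The infix is the text between the source term and its translation;
    the suffix is the single character that follows the translation.
    """
    counts = Counter()
    for text in texts:
        for src, tgt in seed_pairs:
            start = text.find(src)
            if start < 0:
                continue
            infix_start = start + len(src)
            tgt_start = text.find(tgt, infix_start)
            if tgt_start < 0 or tgt_start - infix_start > max_infix:
                continue
            tgt_end = tgt_start + len(tgt)
            infix = text[infix_start:tgt_start]
            suffix = text[tgt_end:tgt_end + 1]
            if infix and suffix:
                counts[(infix, suffix)] += 1
    # Keep only contexts seen at least min_count times.
    return [pattern for pattern, n in counts.items() if n >= min_count]


def extract_pairs(texts, patterns):
    """Step 3: apply each learned pattern to find (source, target) pairs."""
    pairs = set()
    for infix, suffix in patterns:
        regex = re.compile(
            r"([\u4e00-\u9fff]+)"           # run of CJK characters (source term)
            + re.escape(infix)
            + r"([A-Za-z][A-Za-z \-]*?)"    # Latin-script candidate translation
            + re.escape(suffix)
        )
        for text in texts:
            pairs.update(regex.findall(text))
    return pairs


patterns = learn_patterns(CORPUS, SEED_PAIRS)
new_pairs = extract_pairs(CORPUS, patterns) - set(SEED_PAIRS)
print(patterns)
print(new_pairs)
```

In this sketch the source term is taken to be the whole run of CJK characters before the infix, which over-captures when the term is preceded by other Chinese text. A real system would need a proper term-boundary step and a way to score candidate pairs.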