An English-Translated Parallel Corpus for the CJK Wikipedia Collections
Authors | Ling-Xiang Tang, Shlomo Geva, Andrew Trotman |
---|---|
Publication date | 2012 |
DOI | 10.1145/2407085.2407099 |
Links | Original |
An English-Translated Parallel Corpus for the CJK Wikipedia Collections is a scientific work related to Wikipedia quality, published in 2012 and written by Ling-Xiang Tang, Shlomo Geva and Andrew Trotman.
Overview
In this paper, the authors describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. The document collection is named the CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and for knowledge sharing in Wikipedia in many ways; for example, it could support experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval. Furthermore, the translated CJK articles could be used to expand the current coverage of the English Wikipedia.