An English-Translated Parallel Corpus for the CJK Wikipedia Collections

From Wikipedia Quality
Revision as of 00:47, 9 June 2019 by Liliana (creating a new article)

An English-Translated Parallel Corpus for the CJK Wikipedia Collections - scientific work related to Wikipedia quality published in 2012, written by Ling-Xiang Tang, Shlomo Geva and Andrew Trotman.

Overview

In this paper, the authors describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could serve the information retrieval research community and support knowledge sharing in Wikipedia in many ways. For example, it could be used for experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval. Furthermore, the translated CJK articles could be used to expand the current coverage of the English Wikipedia.