Difference between revisions of "Creating Indonesian-Javanese Parallel Corpora Using Wikipedia Articles"

[[Category:Scientific works]]
[[Category:Indonesian Wikipedia]]
[[Category:Javanese Wikipedia]]

Latest revision as of 08:22, 13 February 2021


Creating Indonesian-Javanese Parallel Corpora Using Wikipedia Articles
Authors: Bayu Distiawan Trisedya, Dyah Inastra
Publication date: 2014
DOI: 10.1109/ICACSIS.2014.7065828
Links: Original

Creating Indonesian-Javanese Parallel Corpora Using Wikipedia Articles is a scientific work related to Wikipedia quality, published in 2014 and written by Bayu Distiawan Trisedya and Dyah Inastra.

Overview

Parallel corpora are necessary for multilingual research, especially in information retrieval (IR) and natural language processing (NLP). However, such corpora are hard to find, particularly for low-resource languages such as ethnic languages, for which parallel corpora have usually been collected manually. Wikipedia, a free online encyclopedia, supports more languages each year, including ethnic languages of Indonesia, and has become one of the largest multilingual sites on the World Wide Web offering freely distributed articles. In this paper, the authors explore several sentence alignment methods that were previously used in other domains, to check whether Wikipedia can serve as a resource for collecting parallel corpora of Indonesian and Javanese, an ethnic language of Indonesia. They take two approaches to sentence alignment, treating Wikipedia as parallel corpora and as comparable corpora. In the parallel case they use sentence-length-based and word-correspondence methods; in the comparable case they use the characteristics of Wikipedia's hypertext links. The experiments indicate that Wikipedia is useful for this purpose, since both approaches gave positive results.
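
To make the two families of methods more concrete, the following Python sketch illustrates the general ideas only; it is not the authors' implementation. It assumes a simplified length-based aligner (monotonic 1-1 pairing with optional skips, scored by relative character-length difference) and a Jaccard overlap over hyperlink targets as a stand-in for a link-based similarity. The function names, the `skip_cost` parameter, and the assumption that link targets have already been mapped to shared identifiers (for example via interlanguage links) are all hypothetical.

```python
from typing import List, Optional, Set, Tuple


def length_cost(src: str, tgt: str) -> float:
    """Relative difference in character length between two sentences."""
    longest = max(len(src), len(tgt), 1)
    return abs(len(src) - len(tgt)) / longest


def align_by_length(src_sents: List[str], tgt_sents: List[str],
                    skip_cost: float = 0.5) -> List[Tuple[int, int]]:
    """Monotonic 1-1 alignment by dynamic programming (illustrative only).

    Each sentence is either paired with one sentence on the other side
    or skipped at a fixed penalty (skip_cost is an assumed value).
    """
    n, m = len(src_sents), len(tgt_sents)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # pair source i-1 with target j-1
                c = best[i - 1][j - 1] + length_cost(src_sents[i - 1],
                                                     tgt_sents[j - 1])
                if c < best[i][j]:
                    best[i][j], back[i][j] = c, (i - 1, j - 1)
            if i > 0:  # skip a source sentence
                c = best[i - 1][j] + skip_cost
                if c < best[i][j]:
                    best[i][j], back[i][j] = c, (i - 1, j)
            if j > 0:  # skip a target sentence
                c = best[i][j - 1] + skip_cost
                if c < best[i][j]:
                    best[i][j], back[i][j] = c, (i, j - 1)

    # Walk the back-pointers and keep only the diagonal (paired) steps.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs


def link_overlap(src_links: Set[str], tgt_links: Set[str]) -> float:
    """Jaccard overlap of hyperlink targets found in two sentences.

    Assumes targets were already mapped to language-independent identifiers.
    """
    union = src_links | tgt_links
    if not union:
        return 0.0
    return len(src_links & tgt_links) / len(union)
```

In this sketch, `align_by_length` returns index pairs such as `(0, 0)` for sentences judged to correspond, while `link_overlap` yields a score between 0 and 1 that could be thresholded to pick candidate sentence pairs from comparable articles. A word-correspondence method would additionally need a bilingual lexicon, which is not shown here.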

Embed

Wikipedia Quality

Trisedya, Bayu Distiawan; Inastra, Dyah. (2014). "[[Creating Indonesian-Javanese Parallel Corpora Using Wikipedia Articles]]". DOI: 10.1109/ICACSIS.2014.7065828.

English Wikipedia

{{cite journal |last1=Trisedya |first1=Bayu Distiawan |last2=Inastra |first2=Dyah |title=Creating Indonesian-Javanese Parallel Corpora Using Wikipedia Articles |date=2014 |doi=10.1109/ICACSIS.2014.7065828 |url=https://wikipediaquality.com/wiki/Creating_Indonesian-Javanese_Parallel_Corpora_Using_Wikipedia_Articles}}

HTML

Trisedya, Bayu Distiawan; Inastra, Dyah. (2014). &quot;<a href="https://wikipediaquality.com/wiki/Creating_Indonesian-Javanese_Parallel_Corpora_Using_Wikipedia_Articles">Creating Indonesian-Javanese Parallel Corpora Using Wikipedia Articles</a>&quot;. DOI: 10.1109/ICACSIS.2014.7065828.