Difference between revisions of "Building Indonesian Local Language Detection Tools Using Wikipedia Data"

Revision as of 22:05, 14 July 2019

Building Indonesian Local Language Detection Tools Using Wikipedia Data is a scientific work related to Wikipedia quality, published in 2015 and written by Puji Martadinata, Bayu Distiawan Trisedya, Hisar Maruli Manurung and Mirna Adriani.

Overview

The widespread use of social media has generated considerable research interest in information retrieval, natural language processing, and machine learning. The vast diversity of languages used on social media creates a need for accurate automated language identification tools. In this research, the authors develop a language identification tool that helps automatically identify social media posts in Indonesian, Javanese, Sundanese, and Minangkabau. The latter three are among the most widely spoken regional languages in Indonesia. The authors conducted experiments comparing three popular methods for building language identification tools: N-grams, statistical models, and the Small Words technique. The experiments, which used articles from the internet for training and a social media data set constructed by the authors for testing, show that the statistical method obtains the best result among the methods compared.
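
To make the Small Words idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each language can be recognized by counting occurrences of short, frequent function words. The word lists, the `SMALL_WORDS` table, and the `score` and `identify` helpers are all hypothetical and hand-picked for illustration; a real system would derive the lists from training text such as Wikipedia articles in each language.

```python
import re

# Illustrative, hand-picked function words per language (an assumption,
# NOT taken from the paper). A real tool would extract the most frequent
# short words from a training corpus for each language.
SMALL_WORDS = {
    "Indonesian": {"yang", "dan", "tidak", "ini", "itu", "dengan"},
    "Javanese": {"sing", "lan", "ora", "iki", "iku", "karo"},
    "Sundanese": {"anu", "jeung", "teu", "ieu", "éta", "sareng"},
    "Minangkabau": {"nan", "jo", "indak", "iko", "ado", "dek"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(text):
    """Count, per language, how many tokens appear in its small-word list."""
    tokens = tokenize(text)
    return {
        lang: sum(1 for tok in tokens if tok in words)
        for lang, words in SMALL_WORDS.items()
    }


def identify(text):
    """Return the language with the most small-word hits, or None if no hits."""
    scores = score(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(identify("saya tidak tahu yang ini"))
```

Counting hits is the simplest possible scoring rule. Ties are resolved by dictionary order here, which is arbitrary; weighting each word by its relative frequency in the training data would be a natural refinement for short, noisy social media posts.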

@@ Line 1: / Line 1: @@
-'''Building Indonesian Local Language Detection Tools Using Wikipedia Data''' - scientific work related to Wikipedia quality published in 2015, written by Puji Martadinata, Bayu Distiawan Trisedya, Hisar Maruli Manurung and Mirna Adriani.
+'''Building Indonesian Local Language Detection Tools Using Wikipedia Data''' - scientific work related to [[Wikipedia quality]] published in 2015, written by [[Puji Martadinata]], [[Bayu Distiawan Trisedya]], [[Hisar Maruli Manurung]] and [[Mirna Adriani]].
 == Overview ==
-The widespread use of social media today has generated lots of research interest towards information retrieval, natural language processing, and also machine learning. The vast diversity of languages used on social media creates the need for accurate automated language identification tools. In this research, authors develop a language identification tool that can help automatically identify social media posts in Indonesian, Javanese, Sundanese, and Minangkabau. The latter three are some of the most widely spoken regional languages in Indonesia. Authors conducted experiments to compare three popular methods used to develop language identification tools, namely N-grams, statistical models, and the Small Words technique. Authors experiments conducted using articles on internet for training and tested using social media data that authors constructed, show that the statistical method obtains the best result among all the methods used.
+The widespread use of social media today has generated lots of research interest towards [[information retrieval]], [[natural language processing]], and also machine learning. The vast diversity of languages used on social media creates the need for accurate automated language identification tools. In this research, authors develop a language identification tool that can help automatically identify social media posts in Indonesian, Javanese, Sundanese, and Minangkabau. The latter three are some of the most widely spoken regional languages in Indonesia. Authors conducted experiments to compare three popular methods used to develop language identification tools, namely N-grams, statistical models, and the Small Words technique. Authors experiments conducted using articles on internet for training and tested using social media data that authors constructed, show that the statistical method obtains the best result among all the methods used.
