Difference between revisions of "Semantic Similarity Measurements for Multi-Lingual Short Texts Using Wikipedia"

From Wikipedia Quality
(Basic information on Semantic Similarity Measurements for Multi-Lingual Short Texts Using Wikipedia)
 
(wikilinks)
Line 1: Line 1:
'''Semantic Similarity Measurements for Multi-Lingual Short Texts Using Wikipedia''' - scientific work related to Wikipedia quality published in 2014, written by Tatsuya Nakamura, Masumi Shirakawa, Takahiro Hara and Shojiro Nishio.
+
'''Semantic Similarity Measurements for Multi-Lingual Short Texts Using Wikipedia''' - scientific work related to [[Wikipedia quality]] published in 2014, written by [[Tatsuya Nakamura]], [[Masumi Shirakawa]], [[Takahiro Hara]] and [[Shojiro Nishio]].
  
 
== Overview ==
 
== Overview ==
In this paper, authors propose two methods to measure the semantic similarity for multi-lingual and short texts by using Wikipedia. In recent years, people around the world have been continuously generating information about their local area in their own languages on social networking services. Measuring the similarity between the texts is challenging because they are often short and written in various languages. Authors methods solve this problem by incorporating inter-language links of Wikipedia into extended naive Bayes (ENB), a probabilistic method of semantic similarity measurements for short texts. The proposed methods represent a multi-lingual short text as a vector of the English version of Wikipedia articles (entities). Authors conducted an experiment on clustering of tweets written in four languages (English, Spanish, Japanese and Arabic). From the experimental results, authors confirmed that methods outperformed cross-lingual explicit semantic analysis (CL-ESA), which is a method to measure the similarity between texts written in two different languages. Moreover, methods were competitive with ENB applied to texts that have been translated into English using Google Translate. Authors methods enabled similarity measurements for multi-lingual short texts without the cost of machine translations.
+
In this paper, authors propose two methods to measure the [[semantic similarity]] for multi-lingual and short texts by using [[Wikipedia]]. In recent years, people around the world have been continuously generating information about their local area in their own languages on [[social network]]ing services. Measuring the similarity between the texts is challenging because they are often short and written in various languages. Authors methods solve this problem by incorporating inter-language links of Wikipedia into extended naive Bayes (ENB), a probabilistic method of semantic similarity measurements for short texts. The proposed methods represent a multi-lingual short text as a vector of the English version of Wikipedia articles (entities). Authors conducted an experiment on clustering of tweets written in four languages (English, Spanish, Japanese and Arabic). From the experimental results, authors confirmed that methods outperformed [[cross-lingual]] explicit semantic analysis (CL-ESA), which is a method to measure the similarity between texts written in two [[different language]]s. Moreover, methods were competitive with ENB applied to texts that have been translated into English using [[Google]] Translate. Authors methods enabled similarity measurements for multi-lingual short texts without the cost of [[machine translation]]s.

Revision as of 11:43, 16 June 2019

Semantic Similarity Measurements for Multi-Lingual Short Texts Using Wikipedia is a scientific work related to Wikipedia quality, published in 2014 and written by Tatsuya Nakamura, Masumi Shirakawa, Takahiro Hara and Shojiro Nishio.

Overview

In this paper, the authors propose two methods for measuring the semantic similarity of multi-lingual short texts using Wikipedia. In recent years, people around the world have been continuously generating information about their local areas, in their own languages, on social networking services. Measuring the similarity between these texts is challenging because they are often short and written in various languages. The authors' methods address this problem by incorporating the inter-language links of Wikipedia into extended naive Bayes (ENB), a probabilistic method of semantic similarity measurement for short texts. The proposed methods represent a multi-lingual short text as a vector of articles (entities) from the English version of Wikipedia, so that texts in different languages become comparable in one shared space. The authors conducted an experiment on clustering tweets written in four languages (English, Spanish, Japanese and Arabic). From the experimental results, the authors confirmed that the proposed methods outperformed cross-lingual explicit semantic analysis (CL-ESA), a method for measuring the similarity between texts written in two different languages. Moreover, the proposed methods were competitive with ENB applied to texts that had been translated into English using Google Translate. The authors' methods thus enable similarity measurement for multi-lingual short texts without the cost of machine translation.
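
The idea of representing texts from different languages as vectors of English Wikipedia articles can be illustrated with a minimal sketch. This is not the authors' ENB-based method: the inter-language link table is a hand-made toy, the entities are assumed to have been detected already, every entity gets a uniform weight instead of a probabilistic one, and cosine similarity is an assumed choice of comparison. The names `INTERLANGUAGE_LINKS`, `to_english_vector` and `cosine` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy table of inter-language links:
# (language code, local article title) -> English article title.
# A real system would read these links from Wikipedia itself.
INTERLANGUAGE_LINKS = {
    ("es", "Terremoto"): "Earthquake",
    ("ja", "地震"): "Earthquake",
    ("es", "Tokio"): "Tokyo",
    ("ja", "東京"): "Tokyo",
    ("es", "Fútbol"): "Association football",
}


def to_english_vector(lang, entities):
    """Map entities found in a short text to a vector over English articles.

    Assumption: entity detection has already been done, and each entity
    gets weight 1.0 (the paper uses probabilistic weights from ENB instead).
    """
    vector = Counter()
    for title in entities:
        english = title if lang == "en" else INTERLANGUAGE_LINKS.get((lang, title))
        if english is not None:  # entities without an English counterpart are dropped
            vector[english] += 1.0
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Entities assumed to be extracted from a Spanish and a Japanese tweet.
spanish_tweet = to_english_vector("es", ["Terremoto", "Tokio"])
japanese_tweet = to_english_vector("ja", ["地震", "東京"])
football_tweet = to_english_vector("es", ["Fútbol"])

print(cosine(spanish_tweet, japanese_tweet))
print(cosine(japanese_tweet, football_tweet))
```

Because both tweets are projected onto the same English article space, they can be compared directly without translating the text itself, which is the property the overview attributes to the proposed methods.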