Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities

{{Infobox work
| title = Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities
| date = 2013
| authors = [[Masumi Shirakawa]]<br />[[Kotaro Nakayama]]<br />[[Takahiro Hara]]<br />[[Shojiro Nishio]]
| doi = 10.1145/2505515.2505600
| link = http://dl.acm.org/citation.cfm?id=2505600
}}
 
'''Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities''' - scientific work related to [[Wikipedia quality]] published in 2013, written by [[Masumi Shirakawa]], [[Kotaro Nakayama]], [[Takahiro Hara]] and [[Shojiro Nishio]].
 
== Overview ==
 
This paper describes a novel probabilistic method for measuring [[semantic similarity]] between real-world noisy short texts such as microblog posts. The authors' method adds related [[Wikipedia]] entities to a short text as its semantic representation and uses the resulting vector of entities to compute semantic similarity. Adding related entities to texts is generally a compound problem that involves extracting key terms, finding related entities for each key term, and aggregating the related entities. Explicit Semantic Analysis (ESA), a popular Wikipedia-based method, solves these problems by summing the weighted vectors of related entities. However, this heuristic weighting depends heavily on a majority-decision rule and is not suited to short texts that contain few key terms but many noisy terms. The proposed probabilistic method synthesizes these procedures by extending naive Bayes and achieves robust estimates of related Wikipedia entities for short texts. Experimental results on short text clustering using [[Twitter]] data indicated that the method outperformed ESA for short texts containing noisy terms.
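
The sketch below illustrates the entity-vector idea mentioned above in its ESA-style form: each key term contributes a weighted vector of related Wikipedia entities, the per-term vectors are summed, and two texts are compared by the cosine similarity of their entity vectors. It is only a minimal illustration, not the authors' probabilistic (naive Bayes) method, and the term-to-entity relatedness table is a made-up toy example.

<syntaxhighlight lang="python">
# Illustrative ESA-style aggregation: sum weighted entity vectors per key term,
# then compare two texts by cosine similarity of their entity vectors.
# The relatedness table is hypothetical toy data, not real ESA weights.
from collections import Counter
from math import sqrt

# Hypothetical term -> {Wikipedia entity: relatedness weight} table.
TERM_TO_ENTITIES = {
    "iphone":  {"IPhone": 1.0, "Apple Inc.": 0.7, "Smartphone": 0.5},
    "android": {"Android (operating system)": 1.0, "Google": 0.6, "Smartphone": 0.5},
    "apple":   {"Apple Inc.": 0.8, "Apple": 0.6},
}

def entity_vector(key_terms):
    """Sum the weighted entity vectors of all key terms (ESA-style aggregation)."""
    vector = Counter()
    for term in key_terms:
        for entity, weight in TERM_TO_ENTITIES.get(term, {}).items():
            vector[entity] += weight
    return vector

def cosine_similarity(v1, v2):
    """Cosine similarity between two sparse entity vectors."""
    dot = sum(w * v2.get(e, 0.0) for e, w in v1.items())
    norm1 = sqrt(sum(w * w for w in v1.values()))
    norm2 = sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

text_a = entity_vector(["iphone", "apple"])
text_b = entity_vector(["android"])
# The two texts share no terms but overlap in entities such as "Smartphone".
print(cosine_similarity(text_a, text_b))
</syntaxhighlight>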
 