Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities

{{Infobox work
| title = Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities
| date = 2013
| authors = [[Masumi Shirakawa]]<br />[[Kotaro Nakayama]]<br />[[Takahiro Hara]]<br />[[Shojiro Nishio]]
| doi = 10.1145/2505515.2505600
| link = http://dl.acm.org/citation.cfm?id=2505600
}}
 
'''Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities''' - scientific work related to [[Wikipedia quality]] published in 2013, written by [[Masumi Shirakawa]], [[Kotaro Nakayama]], [[Takahiro Hara]] and [[Shojiro Nishio]].
 
== Overview ==
 
This paper describes a novel probabilistic method for measuring [[semantic similarity]] between real-world noisy short texts such as microblog posts. The authors' method adds related [[Wikipedia]] entities to a short text as its semantic representation and uses the resulting vector of entities to compute semantic similarity. Adding related entities to texts is generally a compound problem that involves extracting key terms, finding related entities for each key term, and aggregating the related entities. Explicit Semantic Analysis (ESA), a popular Wikipedia-based method, solves these problems by summing the weighted vectors of related entities. However, this heuristic weighting depends heavily on a majority-decision rule and is not suited to short texts that contain few key terms but many noisy terms. The proposed probabilistic method synthesizes these procedures by extending naive Bayes and achieves robust estimates of related Wikipedia entities for short texts. Experimental results on short text clustering using [[Twitter]] data indicated that the method outperformed ESA for short texts containing noisy terms.
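
The sketch below illustrates the entity-vector idea mentioned above in its ESA-style form: each key term contributes a weighted vector of related Wikipedia entities, the per-term vectors are summed, and two texts are compared by the cosine similarity of their entity vectors. It is only a minimal illustration, not the authors' probabilistic (naive Bayes) method, and the term-to-entity relatedness table is a made-up toy example.

<syntaxhighlight lang="python">
# Illustrative ESA-style aggregation: sum weighted entity vectors per key term,
# then compare two texts by cosine similarity of their entity vectors.
# The relatedness table is hypothetical toy data, not real ESA weights.
from collections import Counter
from math import sqrt

# Hypothetical term -> {Wikipedia entity: relatedness weight} table.
TERM_TO_ENTITIES = {
    "iphone":  {"IPhone": 1.0, "Apple Inc.": 0.7, "Smartphone": 0.5},
    "android": {"Android (operating system)": 1.0, "Google": 0.6, "Smartphone": 0.5},
    "apple":   {"Apple Inc.": 0.8, "Apple": 0.6},
}

def entity_vector(key_terms):
    """Sum the weighted entity vectors of all key terms (ESA-style aggregation)."""
    vector = Counter()
    for term in key_terms:
        for entity, weight in TERM_TO_ENTITIES.get(term, {}).items():
            vector[entity] += weight
    return vector

def cosine_similarity(v1, v2):
    """Cosine similarity between two sparse entity vectors."""
    dot = sum(w * v2.get(e, 0.0) for e, w in v1.items())
    norm1 = sqrt(sum(w * w for w in v1.values()))
    norm2 = sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

text_a = entity_vector(["iphone", "apple"])
text_b = entity_vector(["android"])
# The two texts share no terms but overlap in entities such as "Smartphone".
print(cosine_similarity(text_a, text_b))
</syntaxhighlight>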
 