Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities

Revision as of 07:26, 30 March 2021 by Magdalene


Authors: Masumi Shirakawa, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
Publication date: 2013
DOI: 10.1145/2505515.2505600

Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities is a scientific work related to Wikipedia quality, published in 2013 and written by Masumi Shirakawa, Kotaro Nakayama, Takahiro Hara and Shojiro Nishio.

Overview

This paper describes a novel probabilistic method for measuring the semantic similarity of real-world noisy short texts such as microblog posts. The authors' method adds related Wikipedia entities to a short text as its semantic representation and uses the resulting vector of entities to compute semantic similarity. Adding related entities to a text is generally a compound problem that involves extracting key terms, finding related entities for each key term, and aggregating the related entities. Explicit Semantic Analysis (ESA), a popular Wikipedia-based method, solves these problems by summing the weighted vectors of related entities. However, this heuristic weighting depends heavily on majority rule and is not suited to short texts that contain few key terms but many noisy terms. The proposed probabilistic method unifies these procedures by extending naive Bayes and achieves robust estimates of related Wikipedia entities for short texts. Experimental results on short-text clustering using Twitter data indicated that the proposed method outperformed ESA for short texts containing noisy terms.
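The contrast between summing entity vectors and combining them probabilistically can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the term-to-entity scores in TERM_ENTITY, the uniform prior, the smoothing constant EPSILON, and all function names are hypothetical, and the second function is textbook naive Bayes rather than the extended model proposed in the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy scores, read as P(entity | term). Not taken from the paper.
TERM_ENTITY = {
    "apple": {"Apple Inc.": 0.6, "Apple (fruit)": 0.4},
    "iphone": {"Apple Inc.": 0.7, "IPhone": 0.3},
    "pie": {"Apple (fruit)": 0.5, "Pie": 0.5},
}
ENTITIES = sorted({e for scores in TERM_ENTITY.values() for e in scores})
EPSILON = 1e-3  # assumed smoothing value for entities a term does not mention


def esa_style_vector(term_weights):
    """ESA-style aggregation: sum each term's entity vector, scaled by the term weight."""
    vec = defaultdict(float)
    for term, weight in term_weights.items():
        for entity, score in TERM_ENTITY.get(term, {}).items():
            vec[entity] += weight * score
    return dict(vec)


def naive_bayes_vector(terms):
    """Plain naive Bayes aggregation with a uniform prior (assumption).

    Uses P(c | t_1..t_K) proportional to prod_k P(c | t_k) / P(c)^(K-1),
    then normalizes the scores so they sum to 1.
    """
    prior = 1.0 / len(ENTITIES)
    scores = {}
    for entity in ENTITIES:
        likelihoods = [TERM_ENTITY.get(t, {}).get(entity, EPSILON) for t in terms]
        scores[entity] = math.prod(likelihoods) / prior ** (len(terms) - 1)
    total = sum(scores.values())
    return {entity: s / total for entity, s in scores.items()}


def cosine(u, v):
    """Cosine similarity between two sparse entity vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    text_a = ["apple", "iphone"]
    text_b = ["apple", "pie"]
    esa_a = esa_style_vector({t: 1.0 for t in text_a})
    esa_b = esa_style_vector({t: 1.0 for t in text_b})
    print("ESA-style similarity:", cosine(esa_a, esa_b))
    print("Naive Bayes similarity:",
          cosine(naive_bayes_vector(text_a), naive_bayes_vector(text_b)))
```

In the summed vector, every term contributes additively, so many noisy terms can outvote a few key terms. In the product form, an entity scores highly only if all terms support it, which is the behaviour a naive Bayes formulation starts from.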

Embed

Wikipedia Quality

Shirakawa, Masumi; Nakayama, Kotaro; Hara, Takahiro; Nishio, Shojiro. (2013). "[[Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities]]". DOI: 10.1145/2505515.2505600.

English Wikipedia

{{cite journal |last1=Shirakawa |first1=Masumi |last2=Nakayama |first2=Kotaro |last3=Hara |first3=Takahiro |last4=Nishio |first4=Shojiro |title=Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities |date=2013 |doi=10.1145/2505515.2505600 |url=https://wikipediaquality.com/wiki/Probabilistic_Semantic_Similarity_Measurements_for_Noisy_Short_Texts_Using_Wikipedia_Entities}}

HTML

Shirakawa, Masumi; Nakayama, Kotaro; Hara, Takahiro; Nishio, Shojiro. (2013). &quot;<a href="https://wikipediaquality.com/wiki/Probabilistic_Semantic_Similarity_Measurements_for_Noisy_Short_Texts_Using_Wikipedia_Entities">Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities</a>&quot;. DOI: 10.1145/2505515.2505600.