== Overview ==
This paper proposes a [[Wikipedia]]-based [[semantic similarity]] measurement method intended for real-world noisy short texts. The authors' method is a variant of explicit semantic analysis (ESA): it attaches a bag of Wikipedia entities (Wikipedia pages) to a text as its semantic representation and uses the resulting entity vector to compute semantic similarity. Attaching related entities to a whole text, rather than to a single word or phrase, is a challenging practical problem because it comprises several subproblems, e.g., key term extraction from texts, related entity finding for each key term, and weight aggregation of related entities. The proposed method solves this aggregation problem using extended naive Bayes, a probabilistic weighting mechanism based on Bayes' theorem. The method is especially effective when short texts are semantically noisy, i.e., when they contain terms that are meaningless or misleading for estimating their main topic. Experimental results on clustering [[Twitter]] messages and Web snippets showed that the method outperformed ESA on noisy short texts. The authors also found that reducing the dimension of the vector to representative Wikipedia entities scarcely affected performance while decreasing the vector size, and hence the storage space and the processing time of computing the cosine similarity.
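As an illustrative sketch (not the authors' implementation), the comparison step described above can be written as follows: each short text is represented as a sparse vector of weighted Wikipedia entities, and similarity is the cosine of the two vectors. The entity names and weights here are hypothetical.

```python
from math import sqrt

def entity_vector(entity_weights):
    """L2-normalize a bag of (Wikipedia entity -> weight) into a unit-length sparse vector."""
    norm = sqrt(sum(w * w for w in entity_weights.values()))
    return {e: w / norm for e, w in entity_weights.items()} if norm else {}

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two unit vectors: dot product over their shared entities."""
    if len(vec_a) > len(vec_b):  # iterate over the smaller vector
        vec_a, vec_b = vec_b, vec_a
    return sum(w * vec_b.get(e, 0.0) for e, w in vec_a.items())

# Hypothetical entity bags attached to two short texts.
text_a = entity_vector({"Apple Inc.": 0.8, "IPhone": 0.5, "Smartphone": 0.3})
text_b = entity_vector({"Apple Inc.": 0.7, "Smartphone": 0.6, "Android (operating system)": 0.2})
print(round(cosine_similarity(text_a, text_b), 3))  # → 0.792
```

Dimension reduction of the kind the paper reports then amounts to keeping only the highest-weight "representative" entities in each sparse vector before comparison.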
== Embed ==
=== Wikipedia Quality ===

<code>
<nowiki>
Shirakawa, Masumi; Nakayama, Kotaro; Hara, Takahiro; Nishio, Shojiro. (2015). "[[Wikipedia-Based Semantic Similarity Measurements for Noisy Short Texts Using Extended Naive Bayes]]". DOI: 10.1109/TETC.2015.2418716.
</nowiki>
</code>

=== English Wikipedia ===
<code>
<nowiki>
{{cite journal |last1=Shirakawa |first1=Masumi |last2=Nakayama |first2=Kotaro |last3=Hara |first3=Takahiro |last4=Nishio |first4=Shojiro |title=Wikipedia-Based Semantic Similarity Measurements for Noisy Short Texts Using Extended Naive Bayes |date=2015 |doi=10.1109/TETC.2015.2418716 |url=https://wikipediaquality.com/wiki/Wikipedia-Based_Semantic_Similarity_Measurements_for_Noisy_Short_Texts_Using_Extended_Naive_Bayes}}
</nowiki>
</code>

=== HTML ===
<code>
<nowiki>
Shirakawa, Masumi; Nakayama, Kotaro; Hara, Takahiro; Nishio, Shojiro. (2015). &amp;quot;<a href="https://wikipediaquality.com/wiki/Wikipedia-Based_Semantic_Similarity_Measurements_for_Noisy_Short_Texts_Using_Extended_Naive_Bayes">Wikipedia-Based Semantic Similarity Measurements for Noisy Short Texts Using Extended Naive Bayes</a>&amp;quot;. DOI: 10.1109/TETC.2015.2418716.
</nowiki>
</code>

Revision as of 07:07, 6 May 2020


'''Wikipedia-Based Semantic Similarity Measurements for Noisy Short Texts Using Extended Naive Bayes'''
* Authors: Masumi Shirakawa, Kotaro Nakayama, Takahiro Hara, Shojiro Nishio
* Publication date: 2015
* DOI: 10.1109/TETC.2015.2418716
* Links: Original
Wikipedia-Based Semantic Similarity Measurements for Noisy Short Texts Using Extended Naive Bayes is a scientific work related to Wikipedia quality, published in 2015 and written by Masumi Shirakawa, Kotaro Nakayama, Takahiro Hara and Shojiro Nishio.
