Difference between revisions of "Using Wikipedia-Based Conceptual Contexts to Calculate Document Similarity"

From Wikipedia Quality
(New study: Using Wikipedia-Based Conceptual Contexts to Calculate Document Similarity)
 
(+ links)
Line 1: Line 1:
'''Using Wikipedia-Based Conceptual Contexts to Calculate Document Similarity''' - scientific work related to Wikipedia quality published in 2009, written by Fabian Kaiser, Holger Schwarz and Mihály Jakob.
+
'''Using Wikipedia-Based Conceptual Contexts to Calculate Document Similarity''' - scientific work related to [[Wikipedia quality]] published in 2009, written by [[Fabian Kaiser]], [[Holger Schwarz]] and [[Mihály Jakob]].
  
 
== Overview ==
 
== Overview ==
Rating the similarity of two or more text documents is an essential task in information retrieval. For example, document similarity can be used to rank search engine results, cluster documents according to topics etc. A major challenge in calculating document similarity originates from the fact that two documents can have the same topic or even mean the same, while they use different wording to describe the content. A sophisticated algorithm therefore will not directly operate on the texts but will have to find a more abstract representation that captures the texts' meaning. In this paper, authors propose a novel approach for calculating the similarity of text documents. It builds on conceptual contexts that are derived from content and structure of the Wikipedia hypertext corpus.
+
Rating the similarity of two or more text documents is an essential task in [[information retrieval]]. For example, document similarity can be used to rank search engine results, cluster documents according to topics etc. A major challenge in calculating document similarity originates from the fact that two documents can have the same topic or even mean the same, while they use different wording to describe the content. A sophisticated algorithm therefore will not directly operate on the texts but will have to find a more abstract representation that captures the texts' meaning. In this paper, authors propose a novel approach for calculating the similarity of text documents. It builds on conceptual contexts that are derived from content and structure of the [[Wikipedia]] hypertext corpus.

Revision as of 12:06, 19 October 2019

Using Wikipedia-Based Conceptual Contexts to Calculate Document Similarity is a scientific work related to Wikipedia quality, published in 2009 and written by Fabian Kaiser, Holger Schwarz and Mihály Jakob.

Overview

Rating the similarity of two or more text documents is an essential task in information retrieval. For example, document similarity can be used to rank search engine results or to cluster documents by topic. A major challenge in calculating document similarity is that two documents can share a topic, or even mean the same thing, while using different wording to describe the content. A sophisticated algorithm therefore will not operate directly on the texts but will instead have to find a more abstract representation that captures their meaning. In this paper, the authors propose a novel approach for calculating the similarity of text documents. It builds on conceptual contexts that are derived from the content and structure of the Wikipedia hypertext corpus.
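
The wording problem described above can be illustrated with a minimal Python sketch. It is not the method from the paper, whose details are not given here. The term-to-concept table TERM_TO_CONCEPTS, the function names and the two sample texts are all hypothetical and hand-made for illustration; a real system would derive its conceptual contexts from the content and link structure of Wikipedia. The sketch only contrasts cosine similarity over raw words with cosine similarity over concept labels.

```python
import math
from collections import Counter

# Hypothetical, hand-made mapping from terms to concept labels.
# Assumption for illustration only: a real system would derive such
# contexts from Wikipedia rather than from a fixed table.
TERM_TO_CONCEPTS = {
    "car": ["Automobile"],
    "automobile": ["Automobile"],
    "engine": ["Engine"],
    "motor": ["Engine"],
    "repair": ["Maintenance"],
    "fix": ["Maintenance"],
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_vector(text):
    """Bag-of-words vector: counts of the lowercased, whitespace-split terms."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Concept vector: counts of the concept labels the terms map to."""
    concepts = Counter()
    for term in text.lower().split():
        concepts.update(TERM_TO_CONCEPTS.get(term, []))
    return concepts


doc_a = "repair car engine"
doc_b = "fix automobile motor"

# The two texts share no words, so the word-level similarity is zero.
word_sim = cosine(word_vector(doc_a), word_vector(doc_b))

# Both texts map to the same concept labels, so the concept-level
# similarity is close to one.
concept_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))

print(f"word-level similarity:    {word_sim:.2f}")
print(f"concept-level similarity: {concept_sim:.2f}")
```

In this toy setup the two texts have no word in common, yet they receive the same concept labels, which is the kind of abstraction the overview argues a similarity measure needs.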