'''Monolingual Text Similarity Measures: a Comparison of Models over Wikipedia Articles Revisions''' - scientific work related to [[Wikipedia quality]] published in 2009, written by [[Andreas Eiselt]] and [[Paolo Rosso]].
  
 
== Overview ==
 
Measuring the similarity of texts is a common task in the detection of co-derivatives, plagiarism, and information flow. In general, the objective is to locate those fragments of a document that are derived from another text. The authors carried out an exhaustive comparison of similarity estimation models to determine which performs best at different levels of granularity and across languages (English, German, Spanish, and Hindi). In connection with this comparison, the authors introduce a publicly available corpus specially suited to the task. Furthermore, they introduce modifications to well-known algorithms to demonstrate their applicability to this task. Among other things, the experiments show the strengths and weaknesses of the different models with respect to the granularity of the processed texts.
 
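As a minimal illustration of the kind of monolingual similarity measure compared in such studies, the sketch below computes cosine similarity over character trigram counts. This is an assumed, simplified example for exposition; it is not necessarily one of the specific models evaluated by the authors.

```python
# Illustrative sketch: cosine similarity over character trigram counts.
# Not a model from the paper; a generic monolingual similarity measure.
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Collect overlapping character n-grams from a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b, n=3):
    """Cosine similarity of two texts in character n-gram space (0.0 to 1.0)."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va)  # Counter returns 0 for absent grams
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

Applied to two revisions of the same sentence, such a measure scores near 1.0 for identical text and drops as the revision diverges, which is the basic behavior exploited when tracing derived fragments between article revisions.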
