{{Infobox work
| title = Monolingual Text Similarity Measures: a Comparison of Models over Wikipedia Articles Revisions
| date = 2009
| authors = [[Andreas Eiselt]]<br />[[Paolo Rosso]]
| link = http://www.uni-weimar.de/medien/webis/publications/papers/eiselt_2009.pdf
}}
 
'''Monolingual Text Similarity Measures: a Comparison of Models over Wikipedia Articles Revisions''' - scientific work related to [[Wikipedia quality]] published in 2009, written by [[Andreas Eiselt]] and [[Paolo Rosso]].
 
== Overview ==
 
Measuring the similarity of texts is a common task in the detection of co-derivatives, plagiarism, and information flow. In general, the objective is to locate those fragments of a document that are derived from another text. The authors carried out an exhaustive comparison of similarity estimation models in order to determine which one performs better at different levels of granularity and across languages (English, German, Spanish, and Hindi). In connection with the comparison, the authors introduce a publicly available corpus specially suited for this task. Furthermore, they introduce some modifications to well-known algorithms in order to demonstrate their applicability to this task. Among other findings, the experiments show the strengths and weaknesses of the different models with respect to the granularity of the processed texts.
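For readers unfamiliar with similarity estimation, the sketch below shows one of the simplest measures in this family: bag-of-words cosine similarity between two text fragments. It is purely illustrative and is not claimed to be one of the specific models the authors compare; the function name and tokenization are assumptions for the example.

```python
# Illustrative only: a minimal bag-of-words cosine similarity,
# a basic instance of the kind of monolingual similarity measure
# discussed in the paper (not the authors' exact models).
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts under a bag-of-words model."""
    # Naive whitespace tokenization, lowercased (an assumption here).
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the shared vocabulary.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    # Product of the two vector lengths.
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Identical fragments score 1.0 and fragments with no shared words score 0.0; derived fragments typically fall in between, which is why such scores can flag co-derivation at a chosen level of granularity (sentence, paragraph, or document).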
 
Revision as of 08:31, 14 October 2019

