Difference between revisions of "Context-Based Spelling Correction for the Dutch Language: Applied on Spelling Errors Extracted from the Dutch Wikipedia Revision History"


Revision as of 09:02, 4 June 2019

Context-Based Spelling Correction for the Dutch Language: Applied on Spelling Errors Extracted from the Dutch Wikipedia Revision History is a scientific work related to Wikipedia quality, published in 2014 and written by L.J. Tijhuis.

Overview

This thesis investigates context-based spellchecking approaches for the Dutch language. Context-based approaches make it possible to detect real-word spelling errors, that is, misspellings that happen to form another valid word, by using the context in which the errors occur. The author also assesses whether context can improve the ranking of replacement candidates. To measure the performance of the different techniques, a dataset of erroneous-corrected sentence pairs was extracted from the Dutch Wikipedia revision history. This dataset contains a wide variety of human-generated spelling errors, consists of over 1.4 million instances, and can serve as a basis for further research. The dataset proved to be a valuable source for building an error model, with which the ranking of candidate replacement words could be improved. This model takes into account the character context in which erroneous edit operations occur, and therefore reflects which kinds of edit operations are more likely. The spellchecking results on the dataset show that the context-based approach works both for detecting errors and for ranking candidate replacements. A comparison with the literature was made to assess whether the technique performs as well for Dutch as for English, and the author concludes that the performance is comparable. For the task of candidate ranking, the error model trained on the dataset was shown to work better than the context-based approach.
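
The idea of an error model that looks at the character context of edit operations can be illustrated with a minimal Python sketch. It is not the implementation from the thesis. It assumes that the context is the single character preceding the edit in the corrected word, that edits are aligned with `difflib.SequenceMatcher`, and that candidates are scored with a crude add-one smoothed relative frequency. The names `edit_operations` and `ErrorModel` and the training pairs are hypothetical and chosen only for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_operations(wrong, right):
    """Yield (previous_char, wrong_part, right_part) for each differing span.

    Assumption: the "character context" is the one character that precedes
    the edit in the corrected word; "^" marks the start of the word.
    """
    matcher = SequenceMatcher(a=wrong, b=right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        prev = right[j1 - 1] if j1 > 0 else "^"
        yield prev, wrong[i1:i2], right[j1:j2]


class ErrorModel:
    """Counts context-dependent edit operations seen in (error, correction) pairs."""

    def __init__(self, pairs):
        self.counts = Counter()
        for wrong, right in pairs:
            self.counts.update(edit_operations(wrong, right))
        self.total = sum(self.counts.values())

    def score(self, wrong, candidate):
        # Product of smoothed relative frequencies of the edits that turn
        # the candidate into the observed error (a score, not a probability).
        result = 1.0
        for op in edit_operations(wrong, candidate):
            result *= (self.counts[op] + 1) / (self.total + 1)
        return result

    def rank(self, wrong, candidates):
        # Candidates whose required edits were seen more often come first.
        return sorted(candidates, key=lambda c: self.score(wrong, c), reverse=True)


# Hypothetical training pairs in the style of Dutch d/t errors.
model = ErrorModel([
    ("wordt", "word"),
    ("antwoordt", "antwoord"),
    ("gebeurt", "gebeurd"),
])
print(model.rank("vindt", ["vink", "vind"]))  # ['vind', 'vink']
```

In this toy example "vind" is ranked above "vink" because dropping a "t" after a "d" was observed twice in the training pairs, whereas the edit needed to reach "vink" was never observed. A model trained on the full set of sentence pairs would have far more edit operations to count and would probably call for a more careful smoothing scheme.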