EXTIRP: Baseline Retrieval from Wikipedia

Authors: Miro Lehtonen, Antoine Doucet
Publication date: 2007
ISSN: 0302-9743
ISBN: 978-3-540-73887-9

EXTIRP: Baseline Retrieval from Wikipedia is a scientific paper about Wikipedia quality, written by Miro Lehtonen and Antoine Doucet and published in 2007.

Overview

The Wikipedia XML documents are an interesting challenge for any XML retrieval system that can index and retrieve XML without prior knowledge of its structure. Although the structure of the Wikipedia XML documents is highly irregular and therefore unpredictable, EXTIRP handles all well-formed XML documents without problems. Whether this flexibility also translates into high information retrieval (IR) quality has so far been an open question. The initial results do not support a positive answer; instead, they lead the authors to define some requirements for the XML documents that EXTIRP is expected to index. The most interesting question arising from these results concerns the line between high-quality XML markup, which aids accurate IR, and noisy "XML spam", which misleads flexible XML search engines.

Citation

Lehtonen, Miro; Doucet, Antoine. (2007). "EXTIRP: Baseline Retrieval from Wikipedia". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 4518 LNCS, pp. 115-120. ISBN: 978-3-540-73887-9. ISSN: 0302-9743.