Harvesting, Searching, and Ranking Knowledge on the Web

Authors
Gerhard Weikum
Publication date
2009
ISBN
978-160558390-7
DOI
10.1145/1498759.1498763

Harvesting, Searching, and Ranking Knowledge on the Web is a scientific work about Wikipedia quality, published in 2009 and written by Gerhard Weikum.

Overview

There are major trends toward advancing the functionality of search engines to a more expressive semantic level (e.g., [2, 4, 6, 7, 8, 9, 13, 14, 18]). This is enabled by large-scale information extraction [1, 11, 20] of entities and relationships from semistructured as well as natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies [22] and reaching into Deep-Web sources [16] can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be searched efficiently and with high precision.

This talk presents ongoing research towards this objective, with emphasis on the YAGO knowledge base [23, 24] and the NAGA search engine [14], but it also covers related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style "semantic" graph. To grow YAGO further from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework [25].

NAGA provides graph-template-based search over this data, with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need to rank approximate matches pose efficiency and scalability challenges, which are addressed by algorithmic and indexing techniques [15, 17].

YAGO is publicly available and has been imported into various other knowledge-management projects, including DBpedia. It shares many of its goals and methodologies with parallel projects along related lines, including Avatar [19], Cimple/DBlife [10, 21], DBpedia [3], KnowItAll/TextRunner [12, 5], Kylin/KOG [26, 27], and the Libra technology [18, 28], among others. Together they form an exciting trend towards comprehensive knowledge bases with semantic search capabilities.
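An RDF-style knowledge graph such as YAGO can be pictured as a set of subject-relation-object facts, and a graph-template query as a small pattern graph whose variables (written with a leading `?`) must bind to the same node across all of its edges. The sketch below is a minimal, hypothetical illustration of that idea only. The `Fact` type, the `match_pattern` and `query` functions, the relation names, and the confidence values are all assumptions made up for this example. Results are ranked by the product of fact confidences, a simple stand-in that is not NAGA's statistical language model for graphs, and the naive backtracking scan ignores the indexing techniques mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Binding = Dict[str, str]
Pattern = Tuple[str, str, str]


@dataclass(frozen=True)
class Fact:
    """One subject-relation-object edge of a YAGO-like graph (hypothetical type)."""
    subject: str
    relation: str
    obj: str
    confidence: float  # assumed extraction confidence in [0, 1]


def is_var(term: str) -> bool:
    # Template variables carry a leading '?', as in SPARQL.
    return term.startswith("?")


def match_pattern(pattern: Pattern, fact: Fact, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `fact`; return None on any conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, (fact.subject, fact.relation, fact.obj)):
        if is_var(term):
            if extended.get(term, value) != value:
                return None  # variable already bound to a different node
            extended[term] = value
        elif term != value:
            return None  # template constant does not match this fact
    return extended


def query(facts: List[Fact], template: List[Pattern]) -> List[Tuple[Binding, float]]:
    """Return every consistent binding of the template, best-scoring first.

    Score = product of matched fact confidences (a simplification,
    NOT NAGA's language-model-based ranking).
    """
    results: List[Tuple[Binding, float]] = []

    def search(i: int, binding: Binding, score: float) -> None:
        if i == len(template):
            results.append((binding, score))
            return
        for fact in facts:  # naive linear scan; a real engine would use indexes
            extended = match_pattern(template[i], fact, binding)
            if extended is not None:
                search(i + 1, extended, score * fact.confidence)

    search(0, {}, 1.0)
    results.sort(key=lambda r: r[1], reverse=True)
    return results


# Toy facts; the confidence values are invented for illustration only.
FACTS = [
    Fact("Albert_Einstein", "type", "physicist", 0.98),
    Fact("Albert_Einstein", "bornIn", "Ulm", 0.95),
    Fact("Max_Planck", "type", "physicist", 0.97),
    Fact("Max_Planck", "bornIn", "Kiel", 0.90),
    Fact("Ulm", "locatedIn", "Germany", 0.99),
    Fact("Kiel", "locatedIn", "Germany", 0.99),
]

# "Which physicists were born in a city located in Germany?"
TEMPLATE = [
    ("?x", "type", "physicist"),
    ("?x", "bornIn", "?c"),
    ("?c", "locatedIn", "Germany"),
]

for binding, score in query(FACTS, TEMPLATE):
    print(binding["?x"], binding["?c"], round(score, 3))
# prints "Albert_Einstein Ulm 0.922", then "Max_Planck Kiel 0.864"
```

The shared variable `?x` ties the first two edges together and `?c` ties the last two, which is what makes this a graph template rather than a list of independent keyword lookups.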

Citation

Weikum, Gerhard. (2009). "[[Harvesting, Searching, and Ranking Knowledge on the Web]]". Journal of Documentation Volume 65, Issue 6, 16 October 2009, pp. 977-996. ISBN: 978-160558390-7. DOI: 10.1145/1498759.1498763.