'''Linking, Searching, and Visualizing Entities in Wikipedia''' - a scientific work related to [[Wikipedia quality]], published in 2018 and written by [[Marcus Klang]] and [[Pierre Nugues]].
  
 
== Overview ==
In this paper, the authors describe a new system to extract, index, search, and visualize entities in [[Wikipedia]]. To carry out the entity extraction, they designed a high-performance, [[multilingual]] entity linker and used a document model to store the resulting linguistic annotations. The entity linker, HEDWIG, extracts mentions from text using a string-matching engine and links them to entities with a combination of statistical rules and PageRank. The document model, Docforia (Klang and Nugues, 2017), consists of layers, where each layer is a sequence of ranges describing a specific annotation, here the entities. The authors evaluated HEDWIG with the TAC 2016 data and protocol (Ji and Nothman, 2016) and reached CEAFm scores of 70.0 on English, 64.4 on Chinese, and 66.5 on Spanish. They applied the entity linker to the whole collection of English and Swedish Wikipedia articles, used Lucene to index the layers, and built a search module to interactively retrieve all the concordances of an entity in Wikipedia. The user can select and visualize the concordances in articles or paragraphs. Contrary to classic text indexing, the system does not use strings to identify the entities but unique identifiers from [[Wikidata]].
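
A central design point is that entity annotations are stored as layers of ranges over the article text and are retrieved by their Wikidata identifiers rather than by surface strings. The sketch below illustrates that idea in Python under simplifying assumptions: a toy layer model and an in-memory concordance index stand in for Docforia and Lucene, and all class, field, and identifier names are illustrative rather than the actual APIs.

<syntaxhighlight lang="python">
from collections import defaultdict

# Toy layered document model in the spirit of Docforia (not its real API):
# each layer is a list of ranges over the text, and an entity range carries
# a Wikidata identifier instead of a surface string.
class Document:
    def __init__(self, text):
        self.text = text
        self.layers = defaultdict(list)          # layer name -> list of ranges

    def annotate(self, layer, start, end, **props):
        self.layers[layer].append({"start": start, "end": end, **props})


def build_concordance_index(documents):
    """Map each Wikidata id to its concordances, i.e. (doc id, mention) pairs."""
    index = defaultdict(list)
    for doc_id, doc in documents.items():
        for span in doc.layers["entity"]:
            mention = doc.text[span["start"]:span["end"]]
            index[span["wikidata"]].append((doc_id, mention))
    return index


# Usage: annotate one sentence, then look an entity up by id, not by string.
doc = Document("Stockholm is the capital of Sweden.")
doc.annotate("entity", 0, 9, wikidata="Q1754")    # Stockholm
doc.annotate("entity", 28, 34, wikidata="Q34")    # Sweden

index = build_concordance_index({"article-1": doc})
print(index["Q34"])                               # [('article-1', 'Sweden')]
</syntaxhighlight>

Keying the index on identifiers rather than strings is what lets the search module gather every mention of an entity regardless of how it is written in the text.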
