Difference between revisions of "Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids"

From Wikipedia Quality
(Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids -- new article)
 
(+ links)
Line 1: Line 1:
'''Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids''' - scientific work related to Wikipedia quality published in 2014, written by Marek Lipczak, Arash Koushkestani and Evangelos E. Milios.
+
'''Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids''' - scientific work related to [[Wikipedia quality]] published in 2014, written by [[Marek Lipczak]], [[Arash Koushkestani]] and [[Evangelos E. Milios]].
  
 
== Overview ==
 
== Overview ==
This article presents Tulip, an ERD system submitted to the ERD 2014: Entity Recognition and Disambiguation Challenge. The objective of the proposed system is to spot mentions of entities in a document and link the mentions to the corresponding Freebase articles. To achieve this, Tulip prunes the set of entity candidates, focusing on a core subset of related entities that captures the context of the document. The relationship strength is measured as the similarity to a topic centroid generated from entity features. Each entity is represented by an accurate and compact feature vector extracted from a category graph built from information in 120 language versions of Wikipedia. Given the core set of accepted entities, Tulip uses the Wikipedia-based feature vectors to extract more related entities from the document text. Tulip received first prize in the long document track with an F1 score of 0.74, which confirms the effectiveness of the system. At the same time, the system was faster than all other submissions, with latency under 0.29 seconds.
+
This article presents Tulip, an ERD system submitted to the ERD 2014: Entity Recognition and Disambiguation Challenge. The objective of the proposed system is to spot mentions of entities in a document and link the mentions to the corresponding Freebase articles. To achieve this, Tulip prunes the set of entity candidates, focusing on a core subset of related entities that captures the context of the document. The relationship strength is measured as the similarity to a topic centroid generated from entity [[features]]. Each entity is represented by an accurate and compact feature vector extracted from a category graph built from information in 120 [[language versions]] of [[Wikipedia]]. Given the core set of accepted entities, Tulip uses the Wikipedia-based feature vectors to extract more related entities from the document text. Tulip received first prize in the long document track with an F1 score of 0.74, which confirms the effectiveness of the system. At the same time, the system was faster than all other submissions, with latency under 0.29 seconds.

Revision as of 05:50, 14 June 2019

Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids - scientific work related to Wikipedia quality published in 2014, written by Marek Lipczak, Arash Koushkestani and Evangelos E. Milios.

Overview

This article presents Tulip, an ERD system submitted to the ERD 2014: Entity Recognition and Disambiguation Challenge. The objective of the proposed system is to spot mentions of entities in a document and link the mentions to the corresponding Freebase articles. To achieve this, Tulip prunes the set of entity candidates, focusing on a core subset of related entities that captures the context of the document. The relationship strength is measured as the similarity to a topic centroid generated from entity features. Each entity is represented by an accurate and compact feature vector extracted from a category graph built from information in 120 language versions of Wikipedia. Given the core set of accepted entities, Tulip uses the Wikipedia-based feature vectors to extract more related entities from the document text. Tulip received first prize in the long document track with an F1 score of 0.74, which confirms the effectiveness of the system. At the same time, the system was faster than all other submissions, with latency under 0.29 seconds.
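
The pruning step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the uniform weights, the use of cosine similarity, and the threshold of 0.5 are all assumptions made for illustration, and the helper names (centroid, cosine, prune_candidates) are hypothetical. The sketch averages the sparse feature vectors of all candidate entities into a topic centroid and keeps only the candidates that are sufficiently similar to it.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average a list of sparse feature vectors (dicts: feature -> weight)."""
    total = defaultdict(float)
    for vec in vectors:
        for feature, weight in vec.items():
            total[feature] += weight
    n = len(vectors)
    return {f: w / n for f, w in total.items()} if n else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def prune_candidates(candidates, threshold=0.5):
    """Keep candidates whose similarity to the topic centroid meets the threshold.

    The threshold value is an assumption, not a figure from the paper.
    """
    topic = centroid(list(candidates.values()))
    scores = {entity: cosine(vec, topic) for entity, vec in candidates.items()}
    return {entity: s for entity, s in scores.items() if s >= threshold}


# Hypothetical candidates with made-up category features.
candidates = {
    "Halifax_Nova_Scotia": {"canada": 1.0, "city": 1.0},
    "Dalhousie_University": {"canada": 1.0, "university": 1.0},
    "Nova_Scotia": {"canada": 1.0, "province": 1.0},
    "Halifax_band": {"music": 1.0, "band": 1.0},
}
kept = prune_candidates(candidates)
```

In this toy input three candidates share the "canada" feature, so the centroid leans toward that topic and the unrelated music candidate falls below the threshold. A real system would also need candidate spotting and a way to pull in further related entities, which this sketch omits.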