Extracting Geospatial Entities from Wikipedia

From Wikipedia Quality

Authors: Jeremy Witmer, Jugal K. Kalita
Publication date: 2009
DOI: 10.1109/ICSC.2009.62
Links: Original Preprint

Extracting Geospatial Entities from Wikipedia - scientific work related to Wikipedia quality published in 2009, written by Jeremy Witmer and Jugal K. Kalita.

== Overview ==

This paper addresses the challenge of extracting geospatial data from the article text of the [[English Wikipedia]]. In the first phase of the work, the authors create a training corpus and select a set of word-based [[features]] to train a Support Vector Machine (SVM) for geospatial [[named entity recognition]]. For testing, they target a corpus of [[Wikipedia]] articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with very high recall, close to 100%, at acceptable precision. The set of geospatial named entities (NEs) is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. Because many place names are ambiguous and do not immediately geocode to a single location, the authors present a data structure and algorithm that resolve the ambiguity from sentence and article context, so that the correct coordinates can be selected. They achieve an F-measure of 82% and create a set of geospatial entities for each article, combining the place name, spatial location, and an assumed point geometry. These entities can enable geospatial search on, and geovisualization of, Wikipedia.
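
As a minimal sketch of the first phase, the snippet below assumes a scikit-learn style setup: hand-picked word-based features for each token feed a linear SVM that labels tokens as place-name words or not. The feature set and the toy labeled sentences are illustrative stand-ins, not the authors' actual features or corpus.

<syntaxhighlight lang="python">
# Sketch of phase one: word-based features feeding an SVM token classifier.
# Feature names and the tiny corpus are illustrative, not the paper's own set.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def word_features(tokens, i):
    """Word-based features for the token at position i (illustrative set)."""
    w = tokens[i]
    return {
        "word": w.lower(),
        "is_capitalized": w[:1].isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
        "suffix3": w[-3:].lower(),
    }

# Toy training data: each token labeled 1 if part of a place name, else 0.
sentences = [
    (["The", "battle", "took", "place", "near", "Gettysburg", "."],
     [0, 0, 0, 0, 0, 1, 0]),
    (["Troops", "marched", "from", "Paris", "to", "Orleans", "."],
     [0, 0, 0, 1, 0, 1, 0]),
]

X = [word_features(toks, i) for toks, labs in sentences for i in range(len(toks))]
y = [lab for _, labs in sentences for lab in labs]

# DictVectorizer one-hot encodes the string features; LinearSVC is the SVM.
model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(X, y)

test = ["Fighting", "continued", "around", "Verdun", "."]
flagged = [t for i, t in enumerate(test)
           if model.predict([word_features(test, i)])[0] == 1]
print(flagged)  # tokens the SVM labels as place-name words
</syntaxhighlight>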
 
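The resolution step can likewise be sketched with a simple geographic-proximity heuristic, assuming a toy gazetteer: unambiguous mentions in the same context anchor the geography, and each ambiguous name takes the candidate closest to those anchors. This is a stand-in for, not a reproduction of, the paper's data structure and context-based resolution algorithm.

<syntaxhighlight lang="python">
# Sketch of phase two: resolving ambiguous place names by geographic context.
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

# Toy gazetteer: place name -> candidate (lat, lon) coordinates.
GAZETTEER = {
    "Paris": [(48.8566, 2.3522), (33.6609, -95.5555)],  # Paris, France vs. Paris, Texas
    "Orleans": [(47.9025, 1.9090)],                     # unambiguous in this toy data
    "Verdun": [(49.1597, 5.3845)],
}

def resolve(names):
    """Unambiguous names anchor the context; every ambiguous name takes the
    gazetteer candidate closest to those anchors."""
    anchors = [GAZETTEER[n][0] for n in names if len(GAZETTEER.get(n, [])) == 1]
    resolved = {}
    for n in names:
        candidates = GAZETTEER.get(n, [])
        if not candidates:
            continue  # name not in the gazetteer; leave unresolved
        if len(candidates) == 1 or not anchors:
            resolved[n] = candidates[0]
        else:
            resolved[n] = min(candidates,
                              key=lambda c: sum(haversine_km(c, a) for a in anchors))
    return resolved

# "Paris" resolves to the French candidate, since Orleans and Verdun anchor
# the context in north-eastern France.
print(resolve(["Paris", "Orleans", "Verdun"]))
</syntaxhighlight>
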
== Embed ==

=== Wikipedia Quality ===

<code><nowiki>Witmer, Jeremy; Kalita, Jugal K. (2009). "[[Extracting Geospatial Entities from Wikipedia]]". DOI: 10.1109/ICSC.2009.62.</nowiki></code>

=== English Wikipedia ===

<code><nowiki>{{cite journal |last1=Witmer |first1=Jeremy |last2=Kalita |first2=Jugal K. |title=Extracting Geospatial Entities from Wikipedia |date=2009 |doi=10.1109/ICSC.2009.62 |url=https://wikipediaquality.com/wiki/Extracting_Geospatial_Entities_from_Wikipedia}}</nowiki></code>

=== HTML ===

<code><nowiki>Witmer, Jeremy; Kalita, Jugal K. (2009). &quot;<a href="https://wikipediaquality.com/wiki/Extracting_Geospatial_Entities_from_Wikipedia">Extracting Geospatial Entities from Wikipedia</a>&quot;. DOI: 10.1109/ICSC.2009.62.</nowiki></code>
