Difference between revisions of "Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia"
Revision as of 12:27, 3 February 2020
Authors | Maha Althobaiti, Udo Kruschwitz, Massimo Poesio |
---|---|
Publication date | 2014 |
DOI | 10.3115/v1/E14-3012 |
Links | Original |
Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia is a scientific work related to Wikipedia quality, published in 2014 and written by Maha Althobaiti, Udo Kruschwitz and Massimo Poesio.
Overview
In this paper, the authors propose a new methodology that exploits Wikipedia features and structure to automatically develop an Arabic named entity (NE) annotated corpus. To produce the NE annotation, each Wikipedia link is replaced by the NE type of its target article. Other Wikipedia features, namely redirects, anchor texts, and inter-language links, are used to tag additional NEs that appear without links in Wikipedia texts. The authors have also developed a filtering algorithm to eliminate ambiguity when tagging candidate NEs, and they introduce a mechanism based on the high coverage of Wikipedia to address two challenges particular to tagging NEs in Arabic text: rich morphology and the absence of capitalisation. The corpus created with the new method (WDC) has been used to train an NE tagger, which has been tested on different domains. Judging by the results, an NE tagger trained on WDC can compete with those trained on manually annotated corpora.
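The link-to-annotation idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup tables `ARTICLE_TYPES` and `ALIASES`, the `PER`/`LOC` tag set, the BIO tagging scheme, and the English sample sentence are all assumptions made for illustration. The paper's filtering algorithm and its handling of Arabic morphology are not reproduced here, because the overview gives no details about them.

```python
import re

# Hypothetical lookup: article title -> NE type (None = not a named entity).
# In the paper this information comes from Wikipedia itself; here it is hard-coded.
ARTICLE_TYPES = {
    "Naguib Mahfouz": "PER",
    "Cairo": "LOC",
    "Nobel Prize": None,
}

# Hypothetical alternative names, standing in for redirects and anchor texts.
ALIASES = {"Mahfouz": "Naguib Mahfouz"}

# Matches [[Target]] and [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def tag_tokens(surface, ne_type):
    """Tag each whitespace token of `surface` with a BIO label."""
    tokens = surface.split()
    if ne_type is None:
        return [(tok, "O") for tok in tokens]
    return [(tok, ("B-" if i == 0 else "I-") + ne_type)
            for i, tok in enumerate(tokens)]


def tag_plain(text):
    """Tag unlinked text: a token gets an NE tag only if it is a known alias."""
    tagged = []
    for tok in text.split():
        target = ALIASES.get(tok)
        ne_type = ARTICLE_TYPES.get(target) if target else None
        tagged.extend(tag_tokens(tok, ne_type))
    return tagged


def annotate(wikitext):
    """Turn wiki-linked text into (token, tag) pairs."""
    tagged, pos = [], 0
    for match in LINK_RE.finditer(wikitext):
        tagged.extend(tag_plain(wikitext[pos:match.start()]))
        target, anchor = match.group(1), match.group(2)
        # The link's surface text inherits the NE type of the target article.
        tagged.extend(tag_tokens(anchor or target, ARTICLE_TYPES.get(target)))
        pos = match.end()
    tagged.extend(tag_plain(wikitext[pos:]))
    return tagged


if __name__ == "__main__":
    sample = ("[[Naguib Mahfouz]] was born in [[Cairo]] "
              "and Mahfouz won the [[Nobel Prize]]")
    for token, tag in annotate(sample):
        print(token, tag)
```

In this sketch the second, unlinked mention "Mahfouz" is tagged through the alias table, which mirrors how redirects and anchor texts are said to recover NEs that appear without links. A real pipeline would also need proper tokenisation and a way to resolve aliases that point to articles of different NE types.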
Embed
Wikipedia Quality
Althobaiti, Maha; Kruschwitz, Udo; Poesio, Massimo. (2014). "[[Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia]]". The Association for Computer Linguistics. DOI: 10.3115/v1/E14-3012.
English Wikipedia
{{cite journal |last1=Althobaiti |first1=Maha |last2=Kruschwitz |first2=Udo |last3=Poesio |first3=Massimo |title=Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia |date=2014 |doi=10.3115/v1/E14-3012 |url=https://wikipediaquality.com/wiki/Automatic_Creation_of_Arabic_Named_Entity_Annotated_Corpus_Using_Wikipedia |journal=The Association for Computer Linguistics}}
HTML
Althobaiti, Maha; Kruschwitz, Udo; Poesio, Massimo. (2014). "<a href="https://wikipediaquality.com/wiki/Automatic_Creation_of_Arabic_Named_Entity_Annotated_Corpus_Using_Wikipedia">Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia</a>". The Association for Computer Linguistics. DOI: 10.3115/v1/E14-3012.