Tweet Segmentation and Its Application to Named Entity Recognition

From Wikipedia Quality
Authors: Chenliang Li, Aixin Sun, Jianshu Weng, Qi He
Publication date: 2015
ISSN: 10414347
DOI: 10.1109/TKDE.2014.2327042

Tweet Segmentation and Its Application to Named Entity Recognition is a scientific work about Wikipedia quality, published in 2015 and written by Chenliang Li, Aixin Sun, Jianshu Weng and Qi He.

Overview

Twitter has attracted millions of users who share and disseminate the most up-to-date information, producing large volumes of data every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, the authors propose HybridSeg, a novel framework for tweet segmentation in batch mode. Splitting tweets into meaningful segments preserves their semantic or context information and makes it easy for downstream applications to extract. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, the authors propose and evaluate two models that derive local context from linguistic features and from term dependency in a batch of tweets, respectively. HybridSeg is also designed to iteratively learn from confident segments as pseudo feedback. Experiments on two tweet data sets show that learning both global and local contexts significantly improves segmentation quality compared with using global context alone. Through analysis and comparison, the authors show that local linguistic features are more reliable than term dependency for learning local context. As an application, the authors show that segment-based part-of-speech (POS) tagging achieves high accuracy in named entity recognition.
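The core optimization described above, choosing the split of a tweet that maximizes the sum of per-segment stickiness scores, can be sketched with dynamic programming. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy probability tables `GLOBAL_PROB` and `LOCAL_PROB`, the linear mixing weight `alpha`, and the segment-length cap `max_len` are all hypothetical stand-ins for the paper's actual global and local context models.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, ...]

# HYPOTHETICAL toy tables. In HybridSeg, global context reflects how likely a
# segment is to be an English phrase and local context how likely it is to be
# a phrase within the tweet batch; these dictionaries merely stand in for both.
GLOBAL_PROB: Dict[str, float] = {
    "i": 0.3, "love": 0.3, "new": 0.3, "york": 0.2, "pizza": 0.3,
    "new york": 0.8,
}
LOCAL_PROB: Dict[str, float] = {
    "i": 0.2, "love": 0.2, "new": 0.1, "york": 0.1, "pizza": 0.4,
    "new york": 0.9, "new york pizza": 0.7,
}


def stickiness(segment: Segment, alpha: float = 0.5) -> float:
    """Hypothetical stickiness: a linear mix of global and local phrase probability.

    Unseen phrases score 0. This is NOT the paper's actual scoring function.
    """
    phrase = " ".join(segment)
    return alpha * GLOBAL_PROB.get(phrase, 0.0) + (1 - alpha) * LOCAL_PROB.get(phrase, 0.0)


def segment_tweet(
    words: List[str],
    score: Callable[[Segment], float] = stickiness,
    max_len: int = 3,  # hypothetical cap on the number of words per segment
) -> Tuple[List[Segment], float]:
    """Return the split of `words` that maximizes the sum of segment scores."""
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: top total score for words[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + score(tuple(words[start:end]))
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start
    # Follow the back-pointers from the end to recover the chosen segments.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(words[start:end]))
        end = start
    segments.reverse()
    return segments, best[n]


if __name__ == "__main__":
    segs, total = segment_tweet("i love new york pizza".split())
    print(segs)  # [('i',), ('love',), ('new', 'york'), ('pizza',)]
```

With these toy numbers, "new york" (score 0.85) beats "new" + "york" (0.2 + 0.15) and is kept whole, while the three-word segment "new york pizza" (0.35) loses to "new york" + "pizza" (1.2). The sketch covers only the "maximize the sum" objective; the iterative pseudo-feedback step, in which HybridSeg re-learns from confident segments, is not modelled.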

Embed

Wikipedia Quality

Li, Chenliang; Sun, Aixin; Weng, Jianshu; He, Qi. (2015). "[[Tweet Segmentation and Its Application to Named Entity Recognition]]". IEEE Transactions on Knowledge and Data Engineering Volume 27, Issue 2, 1 February 2015, Article number 6823714, pp. 558-570. ISSN: 10414347. DOI: 10.1109/TKDE.2014.2327042.

English Wikipedia

{{cite journal |last1=Li |first1=Chenliang |last2=Sun |first2=Aixin |last3=Weng |first3=Jianshu |last4=He |first4=Qi |title=Tweet Segmentation and Its Application to Named Entity Recognition |journal=IEEE Transactions on Knowledge and Data Engineering |volume=27 |issue=2 |pages=558-570 |id=Article number 6823714 |date=1 February 2015 |issn=10414347 |doi=10.1109/TKDE.2014.2327042 |url=https://wikipediaquality.com/wiki/Tweet_Segmentation_and_Its_Application_to_Named_Entity_Recognition}}

HTML

Li, Chenliang; Sun, Aixin; Weng, Jianshu; He, Qi. (2015). &quot;<a href="https://wikipediaquality.com/wiki/Tweet_Segmentation_and_Its_Application_to_Named_Entity_Recognition">Tweet Segmentation and Its Application to Named Entity Recognition</a>&quot;. IEEE Transactions on Knowledge and Data Engineering Volume 27, Issue 2, 1 February 2015, Article number 6823714, pp. 558-570. ISSN: 10414347. DOI: 10.1109/TKDE.2014.2327042.