Latest revision as of 00:31, 17 December 2019
Authors | K. Santosh, Aditya Joshi, Manish Gupta, Vasudeva Varma |
---|---|
Publication date | 2014 |
Links | Original |
Exploiting Wikipedia Categorization for Predicting Age and Gender of Blog Authors is a scientific work related to Wikipedia quality, published in 2014 and written by K. Santosh, Aditya Joshi, Manish Gupta and Vasudeva Varma.
Overview
For privacy reasons, personally identifiable information such as the age and gender of people is not publicly available. However, accurate prediction of such information has important applications in advertising, forensics and business intelligence. Existing methods for this problem have focused on classifier learning using content-based features such as word n-grams and style-based features such as Part of Speech (POS) n-grams. Previous approaches have two major drawbacks: (1) they do not consider the semantic relations between words, and (2) they do not handle polysemy. The authors propose a novel method that addresses these drawbacks by representing the document using Wikipedia concepts and category information. Experimental results show that classifiers learned using these features together with the previously used features achieve significantly better accuracy than state-of-the-art methods. Indeed, feature selection shows that the novel features are more effective than the previously used content-based features.
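The idea of combining word n-grams with Wikipedia concept and category features can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup tables `CONCEPTS` and `CATEGORIES`, the function names and the feature prefixes are all hypothetical, and the toy word-to-concept lookup does no disambiguation, so it does not itself resolve polysemy.

```python
from collections import Counter

# Hypothetical toy lookup tables (assumption for illustration only);
# a real system would derive concepts and categories from Wikipedia.
CONCEPTS = {
    "python": "Python (programming language)",
    "java": "Java (programming language)",
    "guitar": "Guitar",
}
CATEGORIES = {
    "Python (programming language)": ["Programming languages"],
    "Java (programming language)": ["Programming languages"],
    "Guitar": ["String instruments"],
}


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count word n-gram, concept and category features for one document."""
    tokens = text.lower().split()
    features = Counter()

    # Content-based features: word unigrams and bigrams.
    for n in (1, 2):
        for gram in word_ngrams(tokens, n):
            features["ngram=" + gram] += 1

    # Semantic features: concepts and the categories they belong to.
    for token in tokens:
        concept = CONCEPTS.get(token)
        if concept is None:
            continue
        features["concept=" + concept] += 1
        for category in CATEGORIES[concept]:
            features["category=" + category] += 1

    return features
```

In this sketch, "python" and "java" are different words but both contribute to the same `category=Programming languages` feature, which is the kind of semantic relation that plain word n-grams cannot capture. The resulting counts could then be passed to any standard classifier.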
Embed
Wikipedia Quality
Santosh, K.; Joshi, Aditya; Gupta, Manish; Varma, Vasudeva. (2014). "[[Exploiting Wikipedia Categorization for Predicting Age and Gender of Blog Authors]]".
English Wikipedia
{{cite journal |last1=Santosh |first1=K. |last2=Joshi |first2=Aditya |last3=Gupta |first3=Manish |last4=Varma |first4=Vasudeva |title=Exploiting Wikipedia Categorization for Predicting Age and Gender of Blog Authors |date=2014 |url=https://wikipediaquality.com/wiki/Exploiting_Wikipedia_Categorization_for_Predicting_Age_and_Gender_of_Blog_Authors}}
HTML
Santosh, K.; Joshi, Aditya; Gupta, Manish; Varma, Vasudeva. (2014). "<a href="https://wikipediaquality.com/wiki/Exploiting_Wikipedia_Categorization_for_Predicting_Age_and_Gender_of_Blog_Authors">Exploiting Wikipedia Categorization for Predicting Age and Gender of Blog Authors</a>".