Youcat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments Using Wordnet & Wikipedia

From Wikipedia Quality

Youcat: Weakly Supervised Youtube Video Categorization System from Meta Data & User Comments Using Wordnet & Wikipedia is a scientific work related to Wikipedia quality, published in 2012 and written by Subhabrata Mukherjee and Pushpak Bhattacharyya.

Overview

In this paper, the authors propose a weakly supervised system, YouCat, for categorizing YouTube videos into genres such as Comedy, Horror, Romance, Sports and Technology. The system takes a YouTube video URL as input and assigns it a belongingness score for each genre. The key aspects of this work can be summarized as follows: (1) Unlike other genre identification works, which are mostly supervised, this system is mostly unsupervised and requires no labeled data for training. (2) The system can easily incorporate new genres without requiring labeled data for them. (3) YouCat extracts information from the video title, meta description and user comments, which together form the video descriptor. (4) It uses Wikipedia and WordNet for concept expansion. (5) The proposed algorithm has a time complexity of O(|W|), where |W| is the number of words in the video descriptor, which makes it efficient enough to be deployed on the web for real-time video categorization. Experiments were performed on real-world YouTube videos. For single-genre prediction, YouCat achieves an F-score of 80.9% without using any labeled training set, compared to an F-score of 84.36% for a supervised multiclass SVM. YouCat performs better for multi-genre prediction, with an F-score of 90.48%. The weak supervision in the system arises from the use of the manually constructed WordNet and from describing each genre with a few root words.
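The idea of a per-genre belongingness score computed in a single pass over the video descriptor can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the word lists in `GENRE_WORDS`, the function name `genre_scores` and the normalization by total matches are all assumptions made for illustration, and the WordNet and Wikipedia concept expansion step is replaced by small hand-written word sets.

```python
from collections import Counter

# Hypothetical seed words per genre (an assumption for this sketch).
# In the paper these would come from root words expanded with
# WordNet and Wikipedia; that expansion is not reproduced here.
GENRE_WORDS = {
    "Comedy": {"funny", "laugh", "joke", "hilarious"},
    "Horror": {"scary", "ghost", "horror", "creepy"},
    "Sports": {"match", "goal", "football", "team"},
}

# Invert the mapping once so each token needs only one dictionary lookup.
WORD_TO_GENRES = {}
for genre, words in GENRE_WORDS.items():
    for word in words:
        WORD_TO_GENRES.setdefault(word, []).append(genre)


def genre_scores(descriptor: str) -> dict:
    """Return a score in [0, 1] per genre for a video descriptor.

    The descriptor is scanned once, so the cost grows linearly with
    the number of words |W|.
    """
    counts = Counter()
    for token in descriptor.lower().split():
        token = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
        for genre in WORD_TO_GENRES.get(token, ()):
            counts[genre] += 1
    total = sum(counts.values())
    # Normalize by the total number of matches; all zeros if nothing matched.
    return {g: (counts[g] / total if total else 0.0) for g in GENRE_WORDS}


scores = genre_scores("That goal was hilarious, the whole team had a laugh!")
```

Because every genre receives its own score, the same output can serve single-genre prediction (take the highest score) or multi-genre prediction (keep every genre above some threshold); the choice of threshold is left open in this sketch.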