Journal: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Year: 2014
Volume: 9
Issue: 12
Pages: 97-106
DOI: 10.14257/ijmue.2014.9.12.09
Publisher: SERSC
Abstract: Discovering and then effectively retrieving useful user-generated content depends on proper metadata annotation of an object, such as its title and keywords. In this study, a simpler unsupervised, non-graph-based algorithm for extracting keywords is proposed. A novel key-phrase chunking approach is adopted, which uses word sequences as they appear in the original document. The simple but effective term frequency-inverse document frequency (tf-idf) weighting scheme is used to rank the newly created key phrases. Compared with a similar algorithm that uses a three-metric weighting scheme, tf-idf yielded a precision of 89%. Applying tf-idf to keywords derived from YouTube metadata therefore appears to be a useful approach because of its objectivity.
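The abstract describes two steps: chunking candidate key phrases from word sequences in their original order, then ranking them with tf-idf. The sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The abstract does not specify the chunking boundaries, the "forward words pruning" rule, or the exact tf-idf variant, so this example assumes that stopwords and punctuation end a phrase, and scores a phrase p in document d as tf(p, d) · log(N / df(p)), where N is the number of documents. The names `STOPWORDS`, `chunk_phrases`, and `rank_key_phrases` are hypothetical, and each "document" is assumed to be one video's metadata text (for example, its title and description).

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the paper's actual
# boundary and pruning rules are not given in the abstract.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "for",
             "is", "are", "with", "how", "this", "that"}


def chunk_phrases(text):
    """Split text into candidate phrases: maximal runs of non-stopwords,
    kept in the order they appear in the original text."""
    phrases = []
    # Assumption: punctuation is treated as a hard phrase boundary.
    for segment in re.split(r"[^\w\s']+", text.lower()):
        current = []
        for word in segment.split():
            if word in STOPWORDS:
                # A stopword closes the phrase being built, if any.
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(" ".join(current))
    return phrases


def rank_key_phrases(documents, top_k=5):
    """Score each candidate phrase with tf-idf and return the top_k
    phrases per document, highest score first."""
    chunked = [chunk_phrases(doc) for doc in documents]
    n_docs = len(chunked)
    # Document frequency: number of documents containing each phrase.
    df = Counter()
    for phrases in chunked:
        df.update(set(phrases))
    ranked = []
    for phrases in chunked:
        tf = Counter(phrases)
        total = len(phrases) or 1  # avoid division by zero for empty docs
        scores = {
            p: (count / total) * math.log(n_docs / df[p])
            for p, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        best = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked.append([p for p, _ in best[:top_k]])
    return ranked


docs = [
    "Guitar lessons for beginners. Guitar lessons: chords.",
    "Piano lessons for beginners and easy songs.",
    "Cooking pasta with tomato sauce.",
]
print(rank_key_phrases(docs, top_k=3))
```

In this toy corpus, "guitar lessons" ranks first for the first document because it occurs twice there and in no other document, while "beginners" ranks lower because it also appears in the second document, which lowers its idf.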
Keywords: automatic extraction; tf-idf weighting; forward words pruning; objective; user-generated content