Journal: International Journal of Computer and Information Technology
Print ISSN: 2279-0764
Publication year: 2015
Volume: 4
Issue: 4
Page: 748
Publisher: International Journal of Computer and Information Technology
Abstract: The automatic classification of web objects into semantic categories is important for indexing, browsing, searching, and mining these objects. The task is challenging, however, because web objects often lack easily extractable features that carry semantic information, interconnections with one another, and training examples with category labels. Social tags reflect web objects' semantics from the users' point of view, which makes them a well-suited feature for overcoming these difficulties. In this paper we study the impact of social tagging on the performance of text classification techniques applied to web object classification. We have developed an automated system for web object classification that is based on social tag exploration. The system has three phases: data preprocessing, classification, and evaluation. It accepts a training dataset that represents a set of web pages with their URLs, tags, titles, and categories. Using this dataset, the system constructs a predictive model that is later used to assign labels to web objects based on their tags. In the classification phase, the system employs three well-known text classification techniques, namely Support Vector Machine, Naïve Bayes, and Decision Tree, through the WEKA software. Experiments have been conducted to evaluate the effectiveness of using social tags with each of the three techniques. The experimental results indicate that using tags significantly improves classification performance.
Keywords: web objects classification; social tagging; text classification methods; WEKA software; cross validation
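
The tag-based pipeline described in the abstract (train on tagged pages, predict a category from tags, evaluate with cross validation) can be illustrated with a minimal sketch. This is not the authors' system and does not use WEKA: it is a plain-Python multinomial Naïve Bayes with add-one smoothing, written here only for illustration. The function names (`train_nb`, `predict_nb`, `cross_validate`), the interleaved fold assignment, and the toy `pages` data are all assumptions and do not come from the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Fit a multinomial Naive Bayes model on (tags, label) pairs."""
    label_counts = Counter()          # number of pages per category
    tag_counts = defaultdict(Counter) # tag frequencies per category
    vocab = set()                     # all tags seen in training
    for tags, label in samples:
        label_counts[label] += 1
        tag_counts[label].update(tags)
        vocab.update(tags)
    return label_counts, tag_counts, vocab

def predict_nb(model, tags):
    """Return the category with the highest log posterior for the given tags."""
    label_counts, tag_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(tag_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for tag in tags:
            if tag in vocab:  # tags never seen in training are ignored
                score += math.log((tag_counts[label][tag] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def cross_validate(samples, k):
    """k-fold cross validation with interleaved folds; returns accuracy."""
    correct = 0
    for fold in range(k):
        test = samples[fold::k]
        train = [s for i, s in enumerate(samples) if i % k != fold]
        model = train_nb(train)
        correct += sum(predict_nb(model, tags) == label for tags, label in test)
    return correct / len(samples)

# Hypothetical toy data: each page is reduced to its social tags and a category.
pages = [
    (["python", "programming", "code"], "computing"),
    (["recipe", "cooking", "food"], "cooking"),
    (["code", "software", "programming"], "computing"),
    (["food", "baking", "recipe"], "cooking"),
    (["programming", "tutorial", "code"], "computing"),
    (["cooking", "dinner", "recipe"], "cooking"),
]

model = train_nb(pages)
prediction = predict_nb(model, ["code", "tutorial"])
accuracy = cross_validate(pages, k=3)
print(prediction, accuracy)
```

A real experiment would use a much larger tagged dataset and, as in the paper, compare several classifiers under the same cross-validation protocol; the toy data here only shows the shape of the inputs and outputs.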