期刊名称:International Journal of Computer Science and Network Security
印刷版ISSN:1738-7906
出版年度:2011
卷号:11
期号:5
页码:252-257
出版社:International Journal of Computer Science and Network Security
摘要:The boom in the use of Web and its exponential growth are now well known. The amount of textual data available on the Web is estimated to be in the order of one terra byte, in addition to images, audio and video. This has imposed additional challenges to the Web directories which help the user to search the Web by classifying selected Web documents into subject. Manual classification of web pages by human expertise also suffers from the exponential increase in the amount of Web documents. Instead of using the entire web page for classifying it, this article emphasizes the need for automatic web page classification using minimum number of features in it. A method for generating such optimum number of features for web pages is also proposed. Machine learning classifiers are modeled using these optimum features. Experiments on the bench marking data sets with these machine learning classifiers have shown promising improvement in classification accuracy.
关键词:Web page Classification; Web directories; features; machine learning