Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2014
Volume: 5
Issue: 4
Pages: 5614-5619
Publisher: TechScience Publications
Abstract: Automatic web page classification is a complex and slow process. In addition, the increasing demand for web page categorization calls for more accurate and efficient classification methods. The proposed work therefore focuses on web page classification and clustering schemes, with the aim of obtaining an enhanced classification technique. To identify efficient text mining techniques, various machine learning algorithms are studied, and two of them, Bayesian classification and KNN classification, are found to give efficient and accurate results. Using both algorithms, a new hybrid technique is developed that trains on domain-specific data and can successfully classify a web page according to the available domain knowledge.
Keywords: web page classification; categorization; content mining; text analysis