Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2013
Volume: 54
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: The volume of content stored and shared on the Web and in document repositories has grown greatly, making it difficult to locate required information within massive collections. The development of search engine technology, which can collect, store and pre-process information globally and respond to users' needs instantly, has improved the retrieval of required information. Text classification techniques make web page classification possible. At present, semantics form the basis of the content description and query processing techniques required for Information Retrieval (IR). This paper presents an approach to information retrieval from web pages based on the proposed extraction methods. The AdaBoost algorithm is used to obtain and classify features, and a BF tree combined with the proposed feature extraction achieves high classification accuracy.