Journal: International Journal on Computer Science and Engineering
Print ISSN: 2229-5631
Electronic ISSN: 0975-3397
Year of publication: 2011
Volume: 3
Issue: 7
Pages: 2846-2854
Publisher: Engg Journals Publications
Abstract: The Internet has profoundly changed the lives of many enthusiastic innovators and researchers. The information available on the web has opened the door to knowledge discovery, leading to a new information era. Unfortunately, most search engines return web content that is irrelevant to the information the user is looking for. Many text categorization techniques have been developed to recognize the category of a given web document, but they have failed to produce trustworthy results. This paper focuses on web content categorization based on a classic summarization technique that enables classification at the word level. The web document is first preprocessed, which involves filtering its content with classical techniques, and is then converted into organized data. The organized data is then matched against a predefined hierarchical set of categories to identify the exact category.
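The pipeline described in the abstract (filter the document, convert it into organized term data, then match it against a predefined category hierarchy at the word level) can be illustrated with a minimal sketch. This is not the paper's pyramidal model: it swaps in a plain term-frequency keyword matcher, and the stop-word list, the two-level `CATEGORY_TREE`, and the functions `preprocess`, `term_frequencies`, and `categorize` are all hypothetical names introduced only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual filtering rules are not given in the abstract.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for", "with", "which", "are"}

# Hypothetical two-level category hierarchy: parent -> child -> indicative keywords.
CATEGORY_TREE = {
    "Sports": {
        "Cricket": {"cricket", "wicket", "batsman", "bowler"},
        "Football": {"football", "goal", "striker", "midfielder"},
    },
    "Computing": {
        "Databases": {"sql", "query", "index", "database"},
        "Networking": {"router", "packet", "protocol", "tcp"},
    },
}


def preprocess(text):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(tokens):
    """Turn the filtered tokens into 'organized data': a term -> count mapping."""
    return Counter(tokens)


def categorize(text):
    """Score every leaf category by the summed frequency of its keywords.

    Returns the (parent, child) pair with the highest score, or None when
    no keyword from any category occurs in the text.
    """
    tf = term_frequencies(preprocess(text))
    best, best_score = None, 0
    for parent, children in CATEGORY_TREE.items():
        for child, keywords in children.items():
            # Counter returns 0 for absent terms, so unmatched keywords add nothing.
            score = sum(tf[k] for k in keywords)
            if score > best_score:
                best, best_score = (parent, child), score
    return best
```

For example, `categorize("The bowler took a wicket as the batsman edged the ball")` selects `("Sports", "Cricket")`. Walking the hierarchy down to the leaf keeps the parent category available, so a coarser label can still be reported when only the parent level matters.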
Keywords: Text Categorization; Text Mining; Information Extraction; Feature Term Extraction; Information Retrieval; Pyramidal Model; Term Frequency.