Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Year: 2017
Volume: 5
Issue: 5
Page: 9785
DOI: 10.15680/IJIRCCE.2017.0505184
Publisher: S&S Publications
Abstract: The primary purpose of data mining is to extract information from huge amounts of raw data. Extracting the useful data from the large amount of available data is necessary. Web document classification includes the classification of web snippets into different categories based on their content. The classes into which the pages are classified are predefined. Web snippets from the first three pages of Google results are extracted and preprocessed. Preprocessing includes tokenisation and the removal of redundant and irrelevant data. After preprocessing, a Modified Naïve Bayesian approach is used to classify the snippets into the predefined categories. The Modified Naïve Bayes classifier calculates the probability of each word with respect to each class, and each page is assigned to the predefined class with the highest posterior probability. By using snippets as input, we reduced the required classification time by up to 49.04%, obtained an F-measure of 93.79%, and achieved accuracy of up to 96.01%. An analysis of the system reveals that the snippet classification system works well even when the number of snippets is increased.
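
The pipeline described in the abstract (tokenise, drop irrelevant words, compute per-class word probabilities, pick the class with the highest posterior) can be illustrated with a minimal sketch. The abstract does not specify what the "Modified" Naïve Bayes changes, so the code below assumes a standard multinomial Naïve Bayes with add-one (Laplace) smoothing instead. The stop-word list, the class name `NaiveBayesSnippetClassifier`, the category labels, and the training snippets are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; the abstract does not say which words are dropped.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}


def preprocess(snippet):
    """Tokenise a snippet into lowercase words and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", snippet.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayesSnippetClassifier:
    """Standard multinomial Naive Bayes with add-one smoothing.

    This is an assumed stand-in, not the paper's modified classifier.
    """

    def fit(self, snippets, labels):
        self.class_counts = Counter(labels)      # snippets per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()
        for snippet, label in zip(snippets, labels):
            tokens = preprocess(snippet)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_posteriors(self, snippet):
        """Return the unnormalised log posterior of each class for a snippet."""
        # Words never seen in training carry no evidence and are skipped.
        tokens = [t for t in preprocess(snippet) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / self.total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                # Smoothed estimate of P(word | class)
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, snippet):
        """Pick the class with the highest posterior probability."""
        scores = self.log_posteriors(snippet)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
train_snippets = [
    "Python tutorial for programming beginners",
    "Learn Java programming with code examples",
    "Football match results and league scores",
    "Tennis tournament scores and player rankings",
]
train_labels = ["programming", "programming", "sports", "sports"]

classifier = NaiveBayesSnippetClassifier().fit(train_snippets, train_labels)
predicted = classifier.predict("Java code tutorial")
print(predicted)  # prints "programming"
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a word unseen in one class from driving that class's probability to zero.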