Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Publication year: 2013
Volume: 1
Issue: 8
Publisher: S&S Publications
Abstract: Data mining is one of the most important research areas in computer science. Data mining techniques are used to extract hidden knowledge from large databases. Data mining spans several research domains, such as image mining, text mining, sequential pattern mining, and web mining. The purpose of text mining is to process unstructured information, extract meaningful numeric indices from the text, and thus make the information contained in the text accessible to data mining algorithms. Text mining includes various methods, such as information retrieval, document similarity, information extraction, clustering, and classification. Searching for similar documents plays an important role in text mining and document management. Classification is one of the main tasks in document similarity; it is used to assign documents to categories. In this research work, we analyze the performance of three meta classification algorithms, namely Attribute Selected Classifier, Filtered Classifier, and LogitBoost. These algorithms are used to classify computer files based on their extension, for example pdf, txt, doc, ppt, and xls. The performance of the meta algorithms is analyzed using performance factors such as classification accuracy and error rate. The experimental results show that LogitBoost performs better than the other algorithms.
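The abstract compares the classifiers by classification accuracy and error rate. The sketch below shows one common way to compute both from true and predicted file-type labels. It assumes that "error rate" means the fraction of misclassified instances (so error rate = 1 - accuracy); the paper may also report other error measures. The function name and the example labels are hypothetical illustrations, not data or code from the paper.

```python
# Minimal sketch: classification accuracy and error rate for a
# file-type classifier. Assumes error rate = fraction of misclassified
# instances. The labels below are hypothetical, not the paper's data.

def accuracy_and_error_rate(actual, predicted):
    """Return (accuracy, error_rate) as fractions in [0, 1].

    accuracy   = correctly classified instances / total instances
    error_rate = 1 - accuracy
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one instance")
    # Count instances whose predicted class matches the true class.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    accuracy = correct / len(actual)
    return accuracy, 1.0 - accuracy


if __name__ == "__main__":
    # Hypothetical true extensions and classifier outputs for five files.
    actual = ["pdf", "txt", "doc", "ppt", "xls"]
    predicted = ["pdf", "txt", "doc", "xls", "xls"]
    acc, err = accuracy_and_error_rate(actual, predicted)
    print(f"accuracy = {acc:.2f}, error rate = {err:.2f}")
    # prints "accuracy = 0.80, error rate = 0.20"
```

Here four of the five hypothetical files are labeled correctly, which gives 80% accuracy and a 20% error rate. Comparing classifiers such as Attribute Selected Classifier, Filtered Classifier, and LogitBoost on these two figures means computing them for each classifier over the same test set.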
Keywords: Data mining; Text mining; Classification; AttributeSelectedClassifier; Filtered Classifier; LogitBoost