Article Information

Title: Text Mining: Pattern Extraction and Classification (Data management and Distribution)
Authors: B Sankara Babu; Dr. K. Rajasekhara Rao
Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Online ISSN: 2319-8753
Publication year: 2016
Volume: 5
Issue: 6
Page: 10064
DOI: 10.15680/IJIRSET.2015.0506105
Publisher: S&S Publications
Abstract: Data mining refers to the process of extracting knowledge by discovering novel and relevant patterns from large datasets. Clustering and classification are two distinct phases of data mining that work to provide an established, proven structure from a voluminous collection of facts. In this paper, our focus is to analyze clusters of documents obtained via unsupervised clustering techniques and to compare the performance of classification algorithms on the documents. A cluster is a group of objects that belong to the same class; in other words, the k-means algorithm groups similar objects into one cluster and dissimilar objects into other clusters. Classification is the task of assigning instances to predefined classes. We have a training set containing data that have been previously categorized, and based on this training set the algorithm determines the category to which each new data point belongs, using the secure hashing algorithm. The k-means algorithm is used for classification, and the SHA-256 algorithm is used to protect the data securely as a digital hash code.
Keywords: Data mining; Clustering and Classification; K-means algorithm; SHA-256 algorithm.
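The abstract names k-means for grouping documents and SHA-256 for hashing them but gives no implementation details. What follows is a minimal Python sketch under stated assumptions, not the authors' code. Documents are whitespace-tokenized into L2-normalized term-frequency vectors. Centroids are seeded by a deterministic farthest-first rule, since the paper does not specify an initialization. An unseen document is then assigned to its nearest centroid, a nearest-centroid rule assumed here for the "classification" step. The helper names (`tokenize`, `build_vocab`, `vectorize`, `kmeans`, `nearest`, `fingerprint`) are hypothetical. Python's `hashlib.sha256` yields a one-way digest. Such a digest can verify that a stored document has not been altered, but the text cannot be recovered from it.

```python
import hashlib
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def build_vocab(docs):
    """Sorted vocabulary over all training documents."""
    return sorted({w for d in docs for w in tokenize(d)})


def vectorize(doc, vocab):
    """Term-frequency vector, L2-normalized so document length does not dominate.
    Words outside the vocabulary are ignored."""
    counts = Counter(tokenize(doc))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def init_centroids(vectors, k):
    """Assumed deterministic farthest-first seeding: start at vectors[0], then
    repeatedly add the point farthest from every centroid chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means: alternate the assignment and centroid-update steps
    until the cluster assignment stops changing."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def fingerprint(doc):
    """SHA-256 hex digest of the UTF-8 text: a one-way fingerprint, not encryption."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    docs = [
        "cluster kmeans centroid cluster",
        "kmeans centroid cluster distance",
        "hash sha256 digest secure",
        "secure hash digest sha256 hash",
    ]
    vocab = build_vocab(docs)
    vectors = [vectorize(d, vocab) for d in docs]
    labels, centroids = kmeans(vectors, k=2)
    print(labels)  # [0, 0, 1, 1]
    # Assign an unseen document to its nearest cluster centroid.
    print(nearest(vectorize("centroid cluster kmeans", vocab), centroids))  # 0
    for d in docs:
        print(fingerprint(d)[:16], d)
```

Normalizing the vectors makes squared Euclidean distance a monotone function of cosine similarity, which is the usual choice for comparing documents of different lengths. With real data, a library implementation such as scikit-learn's `KMeans` with TF-IDF features would be the more practical choice.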