Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2016
Volume: 7
Issue: 1
Pages: 254-256
Publisher: TechScience Publications
Abstract: Text categorization is the process of assigning input texts (or documents) to one or more target categories based on their contents. This paper introduces an email classification application of text categorization using k-Nearest Neighbor (k-NN) classification [1]. In this work, text categorization involves two processes: a training process and a classification process. First, the training process uses a previously categorized set of documents to train the system to understand what each category looks like [1]. Second, the classifier uses the training 'model' to classify new incoming documents. The k-Nearest Neighbor classification method makes use of training documents, which have known categories, and finds the closest neighbors of the new sample document among all of them [2]. These neighbors are then used to determine the new document's category. The Euclidean distance has been used as a similarity function for measuring the difference or similarity between two instances [3].
Keywords: Text categorization; machine learning; k-NN algorithm; similarity function
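
The two-step procedure described in the abstract (keep a labeled training set, then assign a new document the category of its nearest neighbors under Euclidean distance) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how emails are turned into vectors, so the term-count vectors, the three-word vocabulary, the "spam"/"ham" labels, the choice of k = 3, and the function names `euclidean_distance` and `knn_classify` are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training_set, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    training_set is a list of (vector, label) pairs with known categories.
    """
    # Sort labeled examples by distance to the query and keep the k closest.
    neighbors = sorted(
        training_set,
        key=lambda pair: euclidean_distance(pair[0], query),
    )[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each email is a vector of term counts over the
# made-up vocabulary ("offer", "meeting", "free").
training_set = [
    ([3, 0, 2], "spam"),
    ([2, 0, 3], "spam"),
    ([0, 2, 0], "ham"),
    ([0, 3, 1], "ham"),
    ([1, 2, 0], "ham"),
]

predicted = knn_classify(training_set, [2, 0, 2], k=3)
print(predicted)
```

In this sketch the "training" step amounts to storing the labeled vectors, and all distance computation happens when a new document is classified. With an even k or more than two categories, ties in the vote are possible and would need an explicit tie-breaking rule.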