Journal title: Current Journal of Applied Science and Technology
Print ISSN: 2457-1024
Publication year: 2015
Volume: 12
Issue: 5
Pages: 1-12
Language: English
Publisher: Sciencedomain International
Abstract: With the exponential growth of textual information available on the Internet, there is an emerging need to find relevant and timely knowledge about crimes within this huge volume of information. The sheer volume of such data makes manual retrieval and analysis of texts very difficult. Furthermore, domain-specific document classification is a hard task and suffers from low classification efficiency because of overlap among domain subclasses. This work focuses on finding an appropriate classification model for crime-domain knowledge on the Web. To this end, a two-level classification method for online crime text filtering and classification is used. At each level, three feature selection methods (Gini index, chi-square statistic and information gain) and three learning methods (k-nearest neighbor (KNN), Naive Bayes (NB) and support vector machine (SVM)) are investigated. The experimental results at the first level indicate that information gain performs best for selecting crime terms, and that SVM and NB both give the best performance for crime text filtering. At the second level, the results indicate that the Gini index performs best for selecting crime-type terms, and that the SVM classifier gives the best performance in assigning crime documents to their appropriate crime types.
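The abstract names the three term-scoring criteria but does not give their formulas. The following is a minimal sketch only, assuming a binary (presence/absence) document representation and the standard textbook definitions of information gain and the chi-square statistic. The Gini index shown is the "improved Gini index" variant (sum over classes of P(t|c)^2 P(c|t)^2); this is an assumption, since the abstract does not state which formulation the study used. The function names (`information_gain`, `chi_square`, `gini_index`, `top_k_terms`) and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus (NOT data from the study): each document is a set of
# terms (binary presence) paired with a first-level label.
TOY_DOCS = [
    (frozenset({"robbery", "police", "arrest"}), "crime"),
    (frozenset({"murder", "police", "suspect"}), "crime"),
    (frozenset({"fraud", "bank", "arrest"}), "crime"),
    (frozenset({"football", "match", "goal"}), "other"),
    (frozenset({"election", "vote", "police"}), "other"),
    (frozenset({"market", "bank", "stocks"}), "other"),
]


def _split_counts(docs, term):
    """Per-class document counts with and without `term`."""
    with_t, without_t = Counter(), Counter()
    for terms, label in docs:
        (with_t if term in terms else without_t)[label] += 1
    return with_t, without_t


def _entropy(counts, total):
    """Shannon entropy (bits) of a class distribution given raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, term):
    """H(C) - H(C | term present/absent)."""
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    with_t, without_t = _split_counts(docs, term)
    n_t = sum(with_t.values())
    n_not = n - n_t
    conditional = 0.0
    if n_t:
        conditional += (n_t / n) * _entropy(with_t.values(), n_t)
    if n_not:
        conditional += (n_not / n) * _entropy(without_t.values(), n_not)
    return _entropy(class_counts.values(), n) - conditional


def chi_square(docs, term, cls):
    """Chi-square of term vs. one class (one-vs-rest 2x2 contingency table)."""
    n = len(docs)
    with_t, without_t = _split_counts(docs, term)
    a = with_t[cls]                  # term present, document in cls
    b = sum(with_t.values()) - a     # term present, other classes
    c = without_t[cls]               # term absent, document in cls
    d = sum(without_t.values()) - c  # term absent, other classes
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def gini_index(docs, term):
    """Assumed 'improved Gini index' variant: sum_c P(t|c)^2 * P(c|t)^2."""
    class_counts = Counter(label for _, label in docs)
    with_t, _ = _split_counts(docs, term)
    n_t = sum(with_t.values())
    if n_t == 0:
        return 0.0
    return sum((with_t[c] / class_counts[c]) ** 2 * (with_t[c] / n_t) ** 2
               for c in class_counts)


def top_k_terms(docs, k, score):
    """Rank the vocabulary by score(docs, term), highest first; ties by term."""
    vocab = sorted(set().union(*(terms for terms, _ in docs)))
    return sorted(vocab, key=lambda t: (-score(docs, t), t))[:k]


if __name__ == "__main__":
    print(top_k_terms(TOY_DOCS, 3, information_gain))
    print(top_k_terms(TOY_DOCS, 3, lambda d, t: chi_square(d, t, "crime")))
    print(top_k_terms(TOY_DOCS, 3, gini_index))
```

In a two-level setup like the one described, the same ranking could be run once with crime versus non-crime labels (first level) and again with crime-type labels on the crime documents only (second level), keeping the top-k terms as features for a KNN, NB or SVM classifier. This is an illustrative reading of the abstract, not the authors' implementation.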
Keywords: Crime data mining; web mining; focused crawling; classification