Article Information

Title: Extending CC4 Neural Networks to Classify Real Life Documents
Authors: Enhong Chen; Zhenya Zhang; Hans-Dieter Burkhard et al.
Journal: Informatica
Print ISSN: 1514-8327
Online ISSN: 1854-3871
Year: 2004
Volume: 28
Issue: 2
Pages: 173-180
Publisher: The Slovene Society Informatika, Ljubljana
Abstract: The CC4 neural network is a fast pattern-learning technique that can be used for document classification. In essence, the underlying classification mechanism of CC4 neural networks is equivalent to classification with the Hamming distance measure, in which the radius of generalization r of the CC4 neural network plays an important role in defining the sphere of influence of each training sample. If we rely only on the titles and summaries returned by standard search engines, it can be appropriate to represent Web documents as binary vectors. However, when classifying real life documents, a binary representation may not be effective and may reduce classification precision. The paper presents a method to classify documents by their term frequency (TF) vectors. The method proposes a way to transform the real value of each element into the binary number required by CC4. The dimensionality of the TF vector representation is usually very large. Therefore, before the real value of each element is transformed into a binary number, a dimensionality reduction step is performed, i.e., the construction of indexes of much lower dimensionality, called the k-indexes of documents. Each k-index is then transformed into a 0/1 sequence. These sequences should preserve as much of the original distance information between documents as possible when measured in the Hamming distance space. Experimental results show that CC4 performs better at classifying news documents with the proposed method than with a purely binary representation of documents.
Keywords: CC4 neural network; document classification; dimensionality reduction
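
Note: the abstract states that CC4 classification is equivalent to Hamming-distance classification, with the radius of generalization r defining each training sample's sphere of influence. The following is a minimal sketch of that Hamming-distance view only, not the CC4 network or the paper's k-index method. The function names, the tie-breaking rule (the first closest sample wins), and the example labels are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

BinaryVector = Sequence[int]


def hamming_distance(a: BinaryVector, b: BinaryVector) -> int:
    """Count the positions at which two equal-length 0/1 vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_within_radius(
    query: BinaryVector,
    samples: Sequence[Tuple[BinaryVector, str]],
    r: int,
) -> Optional[str]:
    """Return the label of the closest training sample whose Hamming
    distance to `query` is at most r, or None if no sample is that close.

    Assumption: ties are resolved in favor of the earliest sample.
    """
    best_label: Optional[str] = None
    best_dist = r + 1  # any distance <= r improves on this sentinel
    for vector, label in samples:
        d = hamming_distance(query, vector)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical training samples: (binary document vector, class label).
training = [
    ((1, 1, 0, 0), "sports"),
    ((0, 0, 1, 1), "politics"),
]
```

With r = 1, the query (1, 0, 0, 0) lies inside the sphere of influence of the first sample and is labeled "sports", while (1, 0, 1, 0) is at distance 2 from both samples and is left unclassified.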