期刊名称:International Journal on Computer Science and Engineering
印刷版ISSN:2229-5631
电子版ISSN:0975-3397
出版年度:2010
卷号:2
期号:1
页码:80-83
出版社:Engg Journals Publications
摘要:This paper compares Neural network Approach with N-gram approach, for text categorization, and demonstrates that Neural Network approach is similar to the N-gram approach but with much less judging time. Both methods demonstrated here are aimed at language identification. The presence of particular characters, words and the statistical information of word lengths are used as a feature vector. In an identification experiment with Asian languages the neural network approach achieved 98% correct classification rate with 500 bytes, but it is five times faster than n-gram based approach.
关键词:N-Gram; Neural Network; LanguageIdentification; Text categorization