Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2015
Volume: 6
Issue: 3
Pages: 2691-2697
Publisher: TechScience Publications
Abstract: Most common text-mining techniques rely on statistical analysis of term frequency, which captures the importance of a term only within a single document. An alternative approach is to enhance the mining model to include each term's contribution to the semantics of the text, so that the terms that capture the concepts of a document, and through them the similarity between documents, can be found. The contribution of each term to the semantics at the sentence, document, and corpus levels is determined using sentence-based, document-based, and corpus-based concept analysis, respectively. A concept-based similarity measure is then used to determine the similarity between documents.
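
For orientation, the term-frequency baseline that the abstract contrasts against can be sketched as below. This is a minimal illustration only, not the paper's concept-based measure, whose details are not given in this record. The function names `term_frequencies` and `cosine_similarity`, the simple regex tokenizer, and the two sample sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase word tokens in a text (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    # Counter returns 0 for terms it has not seen, so absent terms add nothing.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, chosen only to exercise the functions.
doc_a = "The cat sat on the mat"
doc_b = "The cat lay on the rug"
score = cosine_similarity(term_frequencies(doc_a), term_frequencies(doc_b))
print(round(score, 2))
```

In this sketch the score depends only on how often the same surface words occur in both texts; it carries no information about each word's role in the meaning of its sentence, which is the limitation the abstract points to.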