Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2014
Volume: 5
Issue: 5
Pages: 6578-6582
Publisher: TechScience Publications
Abstract: Most common text mining techniques are based on statistical analysis of term frequency. Such analysis captures only the importance of a term within a document. An alternative approach is to enhance the mining model to include the contribution of each term to the semantics of the text, so that the terms that capture the concepts of a document, and thereby the similarity between documents, may be found. The contribution of each term to the semantics at the sentence, document and corpus levels is determined using sentence-based, document-based and corpus-based concept analysis respectively. A concept-based similarity measure is used to determine the similarity between the documents. This paper introduces the concept-based similarity measure into the Temporal-Semantic Clustering model for event detection in newspaper articles. The document similarity function is defined in terms of two similarity measures. Initially, a context-based similarity measure that uses the vector of weighted terms is used to determine the similarity between the documents. Later, another partial similarity measure that uses the vector of weighted time entities is used along with the previously determined concept-based measure to determine the combined similarity measure. A hierarchical approach is used to cluster documents based on this similarity measure.
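
Illustrative sketch (not taken from the paper): the abstract describes a document similarity built from two partial measures, one over weighted terms and one over weighted time entities, followed by hierarchical clustering. The abstract does not say how the two measures are combined or which linkage is used, so the Python below makes assumptions throughout: cosine similarity for both partial measures, a weighted sum controlled by a hypothetical parameter `alpha`, average-link agglomerative clustering, and a hypothetical stopping `threshold`. The dictionary keys `concepts` and `times` are likewise invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(doc_a, doc_b, alpha=0.5):
    """Assumed combination: weighted sum of the two partial measures.

    alpha is a hypothetical mixing weight; the paper may combine them differently.
    """
    concept = cosine(doc_a["concepts"], doc_b["concepts"])  # weighted terms
    temporal = cosine(doc_a["times"], doc_b["times"])       # weighted time entities
    return alpha * concept + (1.0 - alpha) * temporal


def cluster(docs, threshold=0.5, alpha=0.5):
    """Average-link agglomerative clustering over document indices.

    Merging stops when the best pair of clusters falls below threshold
    (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        total = sum(
            combined_similarity(docs[i], docs[j], alpha) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = link(clusters[a], clusters[b])
                if score > best:
                    best, pair = score, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

With this sketch, two articles that share weighted terms and the same time entity would be merged into one cluster, while an article with disjoint terms and a different date would stay separate. A larger `alpha` lets the term-based measure dominate; a smaller one gives more weight to temporal proximity.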