Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2015
Volume: 6
Issue: 2
Pages: 1613-1615
Publisher: TechScience Publications
Abstract: Text clustering spans a wide range of applications, from information retrieval systems, pattern recognition, and search engines to social networks and other digital collections. Text data in such applications usually comes with a large amount of associated data that goes unused. This paper focuses on handling this unused data, referred to as supplementary information, to generate effective clusters. The supplementary information may include document provenance information, links in a document, index terms used within a document, or any other data that is not generally used for clustering. In this paper, we perform document clustering using supplementary information along with the content to generate clusters with higher purity. We also identify uses of such supplementary information for clustering in applications involving other file types, such as audio, image, and video. Clustering performance may degrade if the supplementary information associated with the pure content is noisy. Taking this into consideration, we use a partitioning-based clustering algorithm and a probabilistic model. We present experimental results to justify the approach.