
Article Information

  • Title: A Two-Layer Semi-Supervised Clustering Method for Text Retrieval
  • Authors: Mohammad Darvishi Padook; Eghbal Mansoori; Reza Boostani
  • Journal: International Journal of Computer Technology and Applications
  • Electronic ISSN: 2229-6093
  • Year: 2012
  • Volume: 3
  • Issue: 6
  • Pages: 1971-1978
  • Publisher: Technopark Publications
  • Abstract: Accurate text clustering is a challenging problem in the information retrieval community. In some cases, experts possess prior knowledge about the data that can improve clustering performance. This paper proposes a two-layer semi-supervised clustering method to improve text clustering accuracy. The approach uses the Space Level Constraints Clustering (SLCC) method as the first layer to categorize the data, which provides prior knowledge for the second layer. K-means clustering is efficient, but its bottleneck is its sensitivity to the number of clusters and to the initial centers. K-means is employed as the second layer of the proposed structure, and its drawbacks are addressed by incorporating prior knowledge found by SLCC in the first layer, such as the number of partitions and their centers. The Reuters-21578 dataset, along with several standard datasets from the UCI repository, is used as a rich benchmark to evaluate the method; because these datasets come with known class labels, the accuracy of the clustering methods can be determined precisely. Applied to the high-dimensional Reuters-21578 data, the combined scheme achieves higher clustering accuracy than SLCC or K-means alone, and it also yields notable improvements on the other datasets.
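The key idea in the abstract is the hand-off between layers: the first layer supplies K-means with both the number of clusters k and the initial centers, so K-means no longer has to guess them. The sketch below illustrates only that hand-off and is not the authors' implementation. `first_layer_stub` is a hypothetical stand-in for SLCC: it assumes the expert's prior knowledge arrives as must-link groups (lists of point indices known to belong together) and turns each group into one partition whose mean becomes a seed center. The second layer is plain Lloyd's K-means seeded with those centers.

```python
import math
from typing import List, Sequence, Tuple

Point = List[float]


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def first_layer_stub(points: Sequence[Point],
                     must_link_groups: Sequence[Sequence[int]]) -> List[Point]:
    """HYPOTHETICAL stand-in for SLCC (not the real algorithm).

    Each must-link group becomes one partition, so k = len(must_link_groups),
    and the mean of each group is returned as that partition's center.
    """
    return [mean([points[i] for i in group]) for group in must_link_groups]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's K-means seeded with the given centers instead of random ones."""
    centers = [list(c) for c in centers]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        new_labels = [min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = mean(members)
    return labels, centers


def two_layer_cluster(points: Sequence[Point],
                      must_link_groups: Sequence[Sequence[int]]):
    """Layer 1 fixes k and the seeds; layer 2 refines them with K-means."""
    seeds = first_layer_stub(points, must_link_groups)
    return kmeans(points, seeds)
```

For example, with `points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` and must-link groups `[[0, 1], [3, 4]]`, k is fixed at 2 and the seeds start near each true group, so K-means converges to labels `[0, 0, 0, 1, 1, 1]` without any random restarts. In a text-retrieval setting the points would be document vectors (for instance TF-IDF), which this sketch does not construct.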
  • Keywords: space level constraints clustering; K-means; text retrieval; semi-supervised clustering