Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year of publication: 2013
Volume: 50
Issue: 3
Publisher: Journal of Theoretical and Applied
Abstract: To improve the clustering of semi-structured texts, both the dimensionality and the sparsity of their representation must be reduced. To reduce the dimensionality, we build metadata feature vectors from the metadata of the semi-structured texts. Using a domain concept model, we build domain vectors from the domain concept tree (set). With the help of WordNet, we compute the semantic similarity between each metadata feature vector and each domain vector. Finally, we design a clustering algorithm that groups semi-structured texts according to this semantic similarity. The analysis shows that the clustering algorithm is feasible and achieves a higher clustering accuracy rate. It can ease the problem of a missing domain ontology and can improve clustering quality.
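The pipeline outlined in the abstract (metadata feature vectors, domain vectors, a word-level semantic similarity, then assignment of each text to a domain) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the small `TOY_SIMILARITY` table is a hypothetical stand-in for a WordNet-based measure, and the best-match averaging rule, the `threshold` parameter, and all function names are invented for this example.

```python
from typing import Dict, List

# Hypothetical stand-in for a WordNet-based word similarity.
# The pairs and scores are illustrative only, not taken from the paper.
TOY_SIMILARITY: Dict[frozenset, float] = {
    frozenset({"author", "writer"}): 0.9,
    frozenset({"price", "cost"}): 0.9,
    frozenset({"title", "name"}): 0.7,
}


def word_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table score if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def vector_similarity(metadata_terms: List[str], domain_terms: List[str]) -> float:
    """Average, over metadata terms, of each term's best match in the domain vector.

    This aggregation rule is an assumption made for the sketch.
    """
    if not metadata_terms or not domain_terms:
        return 0.0
    total = sum(
        max(word_similarity(m, d) for d in domain_terms) for m in metadata_terms
    )
    return total / len(metadata_terms)


def cluster_by_domain(
    documents: Dict[str, List[str]],
    domains: Dict[str, List[str]],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Assign each document to its most similar domain, or to 'unassigned'."""
    clusters: Dict[str, List[str]] = {name: [] for name in domains}
    clusters["unassigned"] = []
    for doc_id, terms in documents.items():
        scores = {
            name: vector_similarity(terms, d_terms)
            for name, d_terms in domains.items()
        }
        best = max(scores, key=scores.get)
        # Documents that match no domain well enough are kept apart.
        target = best if scores[best] >= threshold else "unassigned"
        clusters[target].append(doc_id)
    return clusters


# Metadata feature vectors, here simply lists of metadata element names.
documents = {"d1": ["writer", "name"], "d2": ["cost"], "d3": ["colour"]}
# Domain vectors, here simply lists of domain concept names.
domains = {"books": ["author", "title"], "commerce": ["price", "product"]}

result = cluster_by_domain(documents, domains)
```

Because each text is compared against a handful of domain vectors instead of a full term space, the representation stays small, which is the dimensionality reduction the abstract is after. In practice the toy table would be replaced by a real WordNet-based measure.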