Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2014
Volume: 60
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: Clustering techniques are often used to group text documents. The document clustering process can be modeled and represented as a graph using the Document Index Graph (DIG) algorithm. This study implements the DIG algorithm to design the digraph structure used for graphical representation of the web document clustering process. The data used are documents from the REUTERS-21578 collection. Testing is done by setting parameter values for the number of document groups to be processed and for the word-occurrence frequency limit. The analysis focuses on the stage that determines the frequency limit for occurrences of relevant words (inter-cluster) and for occurrences of non-relevant words (intra-cluster) in the document clustering process. The digraph structure that best represents the document clustering process is obtained with an inter-cluster frequency value of 5 and an intra-cluster frequency value of 3, for 25 documents.
Keywords: Algorithm; Clustering; Digraph; Document Index Graph; Reuters Document
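
The abstract describes representing documents as a digraph with the DIG algorithm. As a rough illustration only, the sketch below builds a simplified index graph in which nodes are unique words and directed edges link adjacent words within a sentence, then lists the edges shared by several documents. The function names `build_dig` and `shared_edges`, the `min_docs` parameter, and the toy documents are hypothetical and are not taken from the paper. The paper's inter-cluster and intra-cluster frequency limits are not modeled here.

```python
from collections import defaultdict


def build_dig(docs):
    """Build a simplified Document Index Graph (illustrative assumption).

    docs: dict mapping doc_id -> list of sentences, each a list of word tokens.
    Returns (nodes, edges): nodes maps word -> set of doc_ids containing it;
    edges maps (word_a, word_b) -> set of (doc_id, sentence_idx, position).
    """
    nodes = defaultdict(set)
    edges = defaultdict(set)
    for doc_id, sentences in docs.items():
        for s_idx, sentence in enumerate(sentences):
            for pos, word in enumerate(sentence):
                nodes[word].add(doc_id)
                # A directed edge joins each word to the word that follows it.
                if pos + 1 < len(sentence):
                    edges[(word, sentence[pos + 1])].add((doc_id, s_idx, pos))
    return nodes, edges


def shared_edges(edges, min_docs=2):
    """Return edges that occur in at least min_docs distinct documents."""
    result = {}
    for edge, occurrences in edges.items():
        doc_ids = {doc_id for doc_id, _, _ in occurrences}
        if len(doc_ids) >= min_docs:
            result[edge] = doc_ids
    return result


# Toy data, not drawn from REUTERS-21578.
docs = {
    "d1": [["river", "rafting", "trips"]],
    "d2": [["wild", "river", "rafting"]],
    "d3": [["fishing", "trips"]],
}
nodes, edges = build_dig(docs)
common = shared_edges(edges, min_docs=2)
```

In this toy input, the only adjacent word pair that appears in two documents is ("river", "rafting"), which is the kind of shared phrase a graph-based index can expose as evidence that documents belong together.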