Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2009
Volume: 9
Issue: 9
Pages: 168-175
Publisher: International Journal of Computer Science and Network Security
Abstract: Information on the WWW is growing at an exponential rate, so search engines need to index downloaded Web documents more efficiently. Web mining techniques such as clustering can be used for this purpose. This paper proposes a novel indexing technique that not only indexes documents more efficiently but also uses hierarchical clustering, based on a similarity measure and fuzzy string matching, to organize them. The technique keeps related documents in the same cluster, which makes document search more efficient in terms of time complexity.
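
The abstract does not specify the similarity measure, the fuzzy matching method, or the clustering procedure, so the following is only an illustrative sketch of the general idea and not the authors' algorithm. It assumes documents are short strings, uses Python's standard `difflib.SequenceMatcher` ratio as a stand-in fuzzy similarity, and merges clusters bottom-up with average linkage until no pair reaches a similarity threshold. The function names and the `threshold` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1]; a stand-in for the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_similarity(c1, c2, docs):
    """Average-link similarity between two clusters of document indices."""
    total = sum(similarity(docs[i], docs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(docs, threshold=0.6):
    """Repeatedly merge the most similar pair of clusters.

    Stops when the best pair falls below `threshold` (an assumed cutoff).
    Returns a list of clusters, each a list of indices into `docs`.
    """
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = cluster_similarity(clusters[x], clusters[y], docs)
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_score < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


docs = [
    "web mining clustering",
    "web mining clusterin",
    "cooking pasta recipes",
    "cooking pasta recipe",
]
clusters = agglomerative_clusters(docs)
```

Once documents are grouped this way, a query only needs to be compared against one representative per cluster before searching inside the best-matching cluster, which is the kind of search-time saving the abstract refers to. The pairwise merge loop here is deliberately naive and is meant for illustration rather than for large collections.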