Basic article information

Title: Domain Specific Automatic Clustering of Web Pages for Search Engine
Authors: Manika Goel; Ankur Kumar Goel; Lavita Kathuria et al.
Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Publication year: 2013
Volume: 4
Issue: 4
Pages: 204-206
Language: English
Publisher: Ayushmaan Technologies
Abstract: Users of Web search engines are often forced to sift through the long ordered list of document "snippets" returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on the major search engines. The proposed work suggests a new and efficient methodology for automatic clustering of web documents. Grouping documents into clusters makes searching easier and more efficient, and search engines can use this technique to return results relevant to the user's query. The proposed work maintains a cluster keyword file that contains the keywords or terms related to the documents of the cluster. The term frequency of each term in the cluster keyword file is calculated in the new document, and the cosine similarity between the new document and the documents of the cluster is then measured; if the similarity lies in the range 0.75 to 0.82, the new document is assigned to that cluster. This technique reduces the time consumed in finding the appropriate cluster for a document. The clustering algorithm works both online and offline. The resulting clusters can be further used by a multi-document summarization system, which produces a summary of the documents related to each other.
Keywords: Cluster; Clustering; Word Matching; Cluster Keyword File; Cosine Similarity
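
The assignment step described in the abstract (term frequencies over a cluster keyword file, followed by a cosine similarity check) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the simple word tokenizer, the representation of a cluster as the concatenated text of its documents, and the inclusive treatment of the 0.75 to 0.82 range are all assumptions made for illustration; only the range itself comes from the abstract.

```python
import math
import re
from collections import Counter

# Similarity range quoted in the abstract; treating both bounds as
# inclusive is an assumption of this sketch.
SIM_LOW, SIM_HIGH = 0.75, 0.82


def term_frequencies(text, keywords):
    """Count how often each cluster keyword occurs in text.

    Tokenization (lowercase alphanumeric words) is a simplifying assumption.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [counts[k] for k in keywords]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_cluster(new_doc, clusters):
    """Return the first cluster whose similarity falls in the range, else None.

    clusters maps a cluster name to (keywords, cluster_text), where
    cluster_text is the concatenated text of the cluster's documents
    (a hypothetical representation, not taken from the paper).
    """
    for name, (keywords, cluster_text) in clusters.items():
        doc_vec = term_frequencies(new_doc, keywords)
        cluster_vec = term_frequencies(cluster_text, keywords)
        if SIM_LOW <= cosine_similarity(doc_vec, cluster_vec) <= SIM_HIGH:
            return name
    return None
```

Because the vectors are built only over the cluster's keywords, each comparison costs time proportional to the size of the keyword file rather than the full vocabulary, which is consistent with the abstract's claim of reduced assignment time. Note that, read literally, the stated range also excludes similarities above 0.82; the abstract does not say how such documents are handled, so this sketch simply returns None for them.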