Article Information

  • Title: Domain Specific Automatic Clustering of Web Pages for Search Engine
  • Authors: Manika Goel; Ankur Kumar Goel; Lavita Kathuria
  • Journal: International Journal of Computer Science & Technology
  • Print ISSN: 2229-4333
  • Online ISSN: 0976-8491
  • Year: 2013
  • Volume: 4
  • Issue: 4
  • Pages: 204-206
  • Language: English
  • Publisher: Ayushmaan Technologies
  • Abstract: Users of Web search engines are often forced to sift through long, ordered lists of document “snippets” returned by the engines. The IR community has explored document clustering as an alternative way of organizing retrieval results, but clustering has yet to be deployed on the major search engines. The proposed work presents a new and efficient methodology for the automatic clustering of web documents. Grouping documents into clusters makes searching easier and more efficient, and search engines can use this technique to return results relevant to the user's query. The proposed method maintains a cluster keyword file containing the keywords (terms) related to the documents of each cluster. For a new document, the term frequency of each term in the cluster keyword file is calculated, and the cosine similarity between the new document and the documents of the cluster is then measured; if the similarity lies in the range 0.75 to 0.82, the new document is assigned to that cluster. This technique reduces the time taken to find the appropriate cluster for a document. The clustering algorithm works both online and offline. The resulting clusters will further be used by a multi-document summarization system, which produces a summary of related documents.
  • Keywords: Cluster; Clustering; Word Matching; Cluster Keyword File; Cosine Similarity
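The assignment step described in the abstract can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. The function names `tokenize`, `tf_vector`, `cosine`, and `assign_cluster`, the in-memory dictionary standing in for the cluster keyword file, the lowercase word tokenizer, the inclusive bounds, and scoring a cluster by the mean cosine similarity cos(u, v) = (u · v) / (‖u‖ ‖v‖) between the new document and each member document are all assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Similarity range quoted in the abstract; treating both ends as inclusive is an assumption.
LOWER, UPPER = 0.75, 0.82


def tokenize(text):
    """Lowercase alphanumeric tokenizer (assumption: no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_vector(text, keywords):
    """Term frequency of each cluster keyword in `text`, in the order of `keywords`."""
    counts = Counter(tokenize(text))
    return [counts[k] for k in keywords]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_cluster(new_doc, clusters):
    """Return the first cluster whose mean similarity to `new_doc` is in [LOWER, UPPER].

    `clusters` is a hypothetical mapping: name -> (keyword list, member document texts).
    Returns None when no cluster qualifies.
    """
    for name, (keywords, members) in clusters.items():
        # Vectors use only the terms from this cluster's keyword list.
        new_vec = tf_vector(new_doc, keywords)
        sims = [cosine(new_vec, tf_vector(doc, keywords)) for doc in members]
        if sims:
            score = sum(sims) / len(sims)  # mean similarity over member documents
            if LOWER <= score <= UPPER:
                return name
    return None
```

For example, with the keywords `cluster` and `search`, a member document whose term-frequency vector is (4, 3) and a new document with vector (1, 0) have cosine similarity 4/5 = 0.8, so the new document is assigned to that cluster. Read literally, the quoted range also rejects documents whose similarity exceeds 0.82, such as exact duplicates; the abstract does not say how those are handled.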