
Article Information

  • Title: A Novel Page Links Prediction Technique for Web Search Sources
  • Authors: Aleem Ansari; Dr. Hemlata Vasishtha
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Publication year: 2015
  • Volume: 6
  • Issue: 2
  • Pages: 1865-1868
  • Publisher: TechScience Publications
  • Abstract: Search results (data records) retrieved from web sources such as search engines or dynamic websites (e.g. online shopping) are usually spread across several web pages. Each of these response pages displays a fixed number of records ordered by certain search criteria, and usually contains one or more hyperlinks that allow the user to navigate to other response pages. Certain applications, such as web data extraction, sometimes need to access only the response pages that belong to particular search criteria. However, current web crawlers cannot distinguish related response pages from other pages of the same web source. In this paper we propose a simple and effective approach for identifying the URLs of the subsequent response pages from a web search source. Our approach takes the URLs of the second and third response pages as input and generates the URLs of the remaining pages as output. We employ Myers' diff algorithm [1] to determine the differences between parameters in the input URLs. After identifying the key parameters and their differences, we construct URLs for the remaining pages by assigning proper weights to the key parameters.
  • Keywords: Web Search Source; Crawler; Data Record Detection; Information Extraction; Myers' diff algorithm; Web Content Mining.
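
The idea in the abstract, taking the URLs of the second and third response pages and extrapolating the rest, can be illustrated with a minimal Python sketch. This is not the authors' implementation: instead of Myers' diff algorithm it compares query parameters by name, and it assumes the key parameter is an integer in the query string that changes by a constant step from page to page. The function name `predict_page_urls` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def predict_page_urls(second_url, third_url, count):
    """Extrapolate `count` URLs for pages 4, 5, ... from pages 2 and 3.

    Simplification (assumption): parameters are compared by name rather
    than with Myers' diff, and only integer-valued parameters are extrapolated.
    """
    second = urlsplit(second_url)
    third = urlsplit(third_url)
    params2 = parse_qsl(second.query, keep_blank_values=True)
    params3 = dict(parse_qsl(third.query, keep_blank_values=True))

    # Key parameters: present in both URLs, integer-valued, and different.
    # For each one, store (value on page 2, step between consecutive pages).
    steps = {}
    for name, value in params2:
        other = params3.get(name)
        if other is not None and other != value:
            try:
                steps[name] = (int(value), int(other) - int(value))
            except ValueError:
                pass  # non-numeric difference: left unchanged in this sketch

    urls = []
    for k in range(2, 2 + count):  # k = distance from page 2 (page 4 is k=2)
        query = [
            (name, str(steps[name][0] + k * steps[name][1]) if name in steps else value)
            for name, value in params2
        ]
        urls.append(urlunsplit(second._replace(query=urlencode(query))))
    return urls


if __name__ == "__main__":
    for url in predict_page_urls(
        "http://example.com/search?q=shoes&start=10",
        "http://example.com/search?q=shoes&start=20",
        3,
    ):
        print(url)
```

Pages 2 and 3 are a natural input because the first response page often omits the paging parameter altogether, so it gives no usable difference. A real web source may also encode the page position in the URL path or in a non-numeric token, which this sketch does not handle.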