Article Information

  • Title: An Improved Extraction Algorithm from Domain Specific Hidden Web
  • Authors: Juhi Sharma; Mukesh Rawat
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year: 2014
  • Volume: 5
  • Issue: 6
  • Pages: 8239-8242
  • Publisher: TechScience Publications
  • Abstract: The web contains a large amount of information, and that amount grows every day. The World Wide Web consists of the Surface Web (the publicly indexed web) and the Deep Web, which holds hidden data and is also referred to as the Hidden Web, Deepnet, or the Invisible Web. A user can access the Surface Web directly through a search engine, but to reach hidden data the user must manually enter a set of keywords into a search interface to retrieve the hidden web pages from the source web sites. The problem we address is devising efficient mechanisms to extract this information automatically in advance, since crawlers cannot otherwise access it. In this paper we present a mechanism that extracts search forms from HTML pages spread over the web, then automatically fills and submits those forms at their source sites to download the Hidden Web pages into a repository for further use by web crawlers.
  • Keywords: Hidden Web; Query Interface; Data Mining
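
The abstract's first step, extracting search forms from HTML pages, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the record does not describe in detail. It uses only Python's standard `html.parser`; the names `FormExtractor`, `extract_search_forms`, and `SAMPLE_PAGE` are hypothetical, and the rule "a search form has at least one free-text field" is an assumed heuristic chosen for illustration.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect each <form> with its action, method, and named fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # completed forms, in document order
        self._current = None  # form being parsed, or None outside a <form>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif self._current is not None and tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            if name:  # unnamed controls are not submitted, so skip them
                if tag == "input":
                    kind = (attrs.get("type") or "text").lower()
                else:
                    kind = tag
                self._current["fields"].append((name, kind))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def extract_search_forms(html):
    """Return forms that look like search interfaces.

    Assumed heuristic: keep a form only if it has a free-text field.
    """
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    return [
        form for form in parser.forms
        if any(kind in ("text", "search") for _, kind in form["fields"])
    ]


# Hypothetical page with one login form and one search form.
SAMPLE_PAGE = """
<html><body>
<form action="/login" method="post">
  <input type="password" name="pw">
</form>
<form action="/books/search">
  <input type="text" name="title">
  <select name="genre"><option>Any</option></select>
  <input type="submit" value="Go">
</form>
</body></html>
"""

forms = extract_search_forms(SAMPLE_PAGE)
for form in forms:
    print(form["method"], form["action"], form["fields"])
```

On the sample page the login form is dropped because it has no free-text field, and only the `/books/search` form is kept. Filling and submitting the extracted forms, the later steps named in the abstract, would build on the `action`, `method`, and `fields` values collected here.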