Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2014
Volume: 5
Issue: 6
Pages: 8239-8242
Publisher: TechScience Publications
Abstract: The web contains a large amount of information, and that amount grows substantially every day. The World Wide Web consists of the Surface Web (the publicly indexed Web) and the Deep Web, which consists of hidden data and is also referred to by names such as the Hidden Web, Deepnet, or the Invisible Web. A user can access the Surface Web directly through a search engine, but to reach hidden data the user must manually enter a set of keywords into a typical search interface, which then retrieves the hidden web pages from their source web sites. The problem we address is devising efficient mechanisms to extract this information automatically in advance, since crawlers cannot otherwise access it. In this paper we present a mechanism that extracts search forms from HTML pages spread over the web, then automatically fills and submits those forms at their source sites to download the Hidden Web pages into a repository for further use by web crawlers.
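The pipeline the abstract outlines (find search forms, fill them, submit them) could look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's method: the names `FormExtractor`, `is_search_form`, `extract_search_forms`, and `build_submission` are hypothetical, the "no password field, at least one text field" rule is an assumed heuristic for recognizing a search form, and only `<input>` elements are considered. It uses only the Python standard library and prepares the request without sending it.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Input types treated as free-text keyword boxes (an assumption of this sketch).
TEXT_TYPES = {"text", "search"}


class FormExtractor(HTMLParser):
    """Collect each <form> with its action, method, and named <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being parsed, or None when outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag == "input" and self._current is not None:
            name = attrs.get("name")
            if name:  # unnamed inputs are never submitted
                self._current["fields"].append({
                    "name": name,
                    "type": (attrs.get("type") or "text").lower(),
                    "value": attrs.get("value") or "",
                })

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def is_search_form(form):
    """Assumed heuristic: has a text box and no password field."""
    types = {field["type"] for field in form["fields"]}
    return bool(types & TEXT_TYPES) and "password" not in types


def extract_search_forms(html):
    """Return the forms on a page that look like search interfaces."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    return [form for form in parser.forms if is_search_form(form)]


def build_submission(page_url, form, keyword):
    """Fill the first text box with keyword; return (method, url, body)."""
    params = []
    filled = False
    for field in form["fields"]:
        if field["type"] in TEXT_TYPES and not filled:
            params.append((field["name"], keyword))
            filled = True
        elif field["type"] == "hidden":
            # Hidden fields keep the default values supplied by the site.
            params.append((field["name"], field["value"]))
    target = urljoin(page_url, form["action"])  # resolve a relative action
    query = urlencode(params)
    if form["method"] == "post":
        return "post", target, query.encode("ascii")
    separator = "&" if "?" in target else "?"
    return "get", target + separator + query, None
```

A crawler front end could call `extract_search_forms` on each fetched page, call `build_submission` once per candidate keyword, and pass the resulting method, URL, and body to whatever HTTP client it uses, storing the responses in its repository. How keywords are chosen and how result pages are stored are outside this sketch.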