
Article Information

  • Title: Two Stage Intelligent Focus Crawler Using JavaScript Parser
  • Authors: Revati Rajane; Prof. Pradnya Kasture
  • Journal: International Journal of Innovative Research in Science, Engineering and Technology
  • Print ISSN: 2347-6710
  • Online ISSN: 2319-8753
  • Publication year: 2017
  • Volume: 6
  • Issue: 9
  • Page: 18711
  • DOI: 10.15680/IJIRSET.2017.0609176
  • Publisher: S&S Publications
  • Abstract: The World Wide Web is a massive collection of billions of web pages containing terabytes of data, arranged across many servers using HTML. General-purpose crawlers face significant scaling challenges because of the fast-paced evolution of the internet. A web crawler is an automated tool that traverses the web and extracts web pages to gather information. An intelligent focus web crawler starts with a specific defined topic and crawls the relevant web pages based on the defined search criteria. In this project, a new intelligent focus crawler is proposed. The goal of the focused crawler is to identify and report the most relevant pages, limiting the search scope to pages that meet the predecided relevance factors. This reduces the use of network and hardware resources, which in turn leads to cost savings and improves the efficiency and accuracy of the stored crawl data. For this purpose, it uses a "Reverse Searching Strategy". With this aim in mind, a two-level framework is used for efficient searching and gathering of deep and hidden web interfaces. In the first stage, it uses search engines to identify main pages, which avoids visiting irrelevant pages. After identifying the pages, the intelligent focus web crawler prioritizes them, ranking each page by its relevance to the search topic. In the second stage, the crawler searches inside the websites for relevant information based on the defined search criteria. An HTML and JavaScript parser is developed to deal with dynamic pages. Moreover, a report on crawled URLs is published after crawling, listing all crawled URLs and any errors found.
  • Keywords: Intelligent Crawler; focused crawler; weight table; World-Wide Web; Search Engine; links ranking; HTML Parser; JavaScript Parser.
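
The abstract mentions prioritizing pages by relevance, and the keywords name a "weight table" and "links ranking", but the record does not say how scores are computed. The following is only a minimal sketch of one plausible reading, in which each link is scored by summing the weights of topic terms found in its URL and anchor text. The table contents and the names WEIGHT_TABLE, relevance, and rank_links are illustrative assumptions, not the authors' method.

```python
import re

# Hypothetical weight table: topic term -> weight.
# The terms and values are illustrative assumptions, not taken from the paper.
WEIGHT_TABLE = {"crawler": 3.0, "search": 2.0, "parser": 1.5}


def relevance(text, weights=WEIGHT_TABLE):
    """Sum the weights of topic terms occurring in the text (case-insensitive).

    A term that occurs several times is counted each time it occurs.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def rank_links(links, weights=WEIGHT_TABLE):
    """Return (url, score) pairs ordered from most to least relevant.

    `links` is an iterable of (url, anchor_text) pairs.
    """
    scored = [
        (url, relevance(url + " " + anchor, weights)) for url, anchor in links
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = [
        ("http://example.com/a", "weather news"),
        ("http://example.com/crawler", "focused crawler search"),
    ]
    for url, score in rank_links(candidates):
        print(f"{score:5.1f}  {url}")
```

A crawler built along these lines would visit the highest-scoring links first. A real system would likely also weigh page content and site structure, which this sketch leaves out.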