Journal: International Journal of Advanced Research In Computer Science and Software Engineering
Print ISSN: 2277-6451
Electronic ISSN: 2277-128X
Year of publication: 2013
Volume: 3
Issue: 4
Publisher: S.S. Mishra
Abstract: This paper explores the problem identification, objectives, and architecture of prefetching hidden web pages using DSIM, and proposes a novel framework for interface matching. A large amount of online information resides on the hidden web (also called the deep or invisible web) and is not accessible to every user. These web pages are generated dynamically from databases in response to user queries. Such pages are not indexed under a static URL; they are generated only when a user submits a query and the results are displayed. Many different matching solutions have been proposed so far. The rapid growth in the amount of information and the number of users has made it difficult to provide effective search services to web users and has increased web latency, resulting in decreased web performance. Most search engines deal only with the surface Web, the set of Web pages directly accessible through hyperlinks, and mostly ignore the vast amount of information hidden behind forms, which constitutes the hidden Web. Compared to the surface Web, the hidden Web contains a much larger amount of high-quality information stored in databases. The framework extracts hidden web pages by drawing on three features: 1) automatic downloading of matching keywords and prefetching of related data from hidden web databases, 2) identification of fuzzy mappings between search interface elements using a novel approach called DSIM (Domain-specific Interface Mapper), and 3) the capability to fill in search interfaces automatically. The effectiveness of the proposed framework has been evaluated through experiments on real web sites, and encouraging preliminary results were obtained.
Keywords: DSIM; Hidden web; Search engine; Fuzzy matching; Spiders
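
The abstract mentions fuzzy mappings between search interface elements but does not say how DSIM computes them. As a rough illustration of the general idea only, the sketch below pairs field labels from two search forms by string similarity, using Python's standard `difflib`. The function names, the example labels, and the 0.6 threshold are assumptions made for this sketch and are not taken from the paper.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two field labels (case-insensitive)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def map_fields(source_labels, target_labels, threshold=0.6):
    """Map each source label to its most similar target label.

    A pair is kept only if its score reaches `threshold` (an assumed value,
    not one reported in the paper). Labels without a good match are omitted.
    """
    mapping = {}
    for s in source_labels:
        # Pick the target label with the highest similarity to this source label.
        best = max(target_labels, key=lambda t: similarity(s, t), default=None)
        if best is not None and similarity(s, best) >= threshold:
            mapping[s] = best
    return mapping


if __name__ == "__main__":
    # Hypothetical labels from two book-search forms.
    form_a = ["Author", "Title", "Price"]
    form_b = ["author name", "book title", "isbn"]
    print(map_fields(form_a, form_b))
```

A real interface mapper would likely also use domain vocabularies, synonyms, and field types; plain string similarity is only the simplest stand-in for a fuzzy match.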