Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2015
Volume: 38
Issue: 3
Publisher: IEEE Computer Society
Abstract: A significant portion of data on the web resides in private or hidden databases that lie behind form-like query interfaces, which allow users to browse these databases in a controlled manner. In this paper, we describe System HYDRA, which enables fast sampling and data analytics over a hidden web database with a form-like web search interface. Broadly, it consists of three major components: (1) SAMPLE-GEN, which produces samples according to a given sampling distribution; (2) SAMPLE-EVAL, which evaluates samples produced by SAMPLE-GEN and also generates estimations for a given aggregate query; and (3) TIMBR, which enables fast and easy construction of a wrapper that models both the input and output interfaces of the web database, thereby translating supported search queries to HTTP requests and retrieving top-k query answers from HTTP responses.