Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2012
Volume: 35
Issue: 3
Publisher: IEEE Computer Society
Abstract: Relational databases remain underused in the long tail of science, despite a number of significant success stories and a natural correspondence between scientific inquiry and ad hoc database query. Barriers to adoption have been articulated in the past, but spreadsheets and other file-oriented approaches still dominate. At the University of Washington eScience Institute, we are exploring a new "delivery vector" for selected database features targeting researchers in the long tail: a web-based query-as-a-service system called SQLShare that eschews conventional database design, instead emphasizing a simple Upload-Query-Share workflow and exposing a direct, full-SQL query interface over "raw" tabular data. We augment the basic query interface with services for cleaning and integrating data, recommending and authoring queries, and automatically generating visualizations. We find that even non-programmers are able to create and share SQL views for a variety of tasks, including quality control, integration, basic analysis, and access control. Researchers in oceanography, molecular biology, and ecology report migrating data to our system from spreadsheets, from conventional databases, and from ASCII files. In this paper, we will provide some examples of how the platform has enabled science in other domains, describe our SQLShare system, and propose some emerging research directions in this space for the database community.