Basic article information

  • Title: Repeatable Research Infrastructure Enabling Administrative Data Analysis
  • Authors: Daniel Thayer; Muhammad Elmessary; Daniel Mallory
  • Journal: International Journal of Population Data Science
  • Electronic ISSN: 2399-4908
  • Publication year: 2019
  • Volume: 4
  • Issue: 3
  • Pages: 1-1
  • DOI: 10.23889/ijpds.v4i3.1268
  • Publisher: Swansea University
  • Abstract:
    Background/Rationale: Linked administrative datasets offer great potential for research, but also present major challenges, including the preparation of operational data into a form suitable for efficient research, complex and computationally demanding analysis, and the need to capture and share information about dataset contents and research methods.
    Main Aim: The analytical services team in the Secure Anonymised Information Linkage (SAIL) Databank is creating interconnected tools and systems to automate the preparation and analysis of research data and to curate information about datasets and research methods. Our underlying goal is to make linked data research orders of magnitude faster and cheaper, as well as to improve its consistency and quality.
    Methods: Several key developments are ongoing: automation of data quality checking; management of dataset metadata; processing of raw source datasets into cleaned, research-ready data assets; the Concept Library, an application for creating, using, and sharing knowledge about research definitions and methods; and a suite of R packages for analysis. Web Application Programming Interfaces will allow these pieces to work together as an integrated system enabling efficient research.
    Results: Initial versions of dataset quality checking, cleaned datasets, and R code to implement common tasks are already in day-to-day use by researchers within SAIL. For example, shared library code that flags conditions within health data has been used across multiple projects, and a cleaned dataset measuring follow-up within primary care has been used by more than 100 projects. An advisory group has been convened to help guide the work.
    Conclusion: Our proof-of-concept work demonstrates the ability of shared code and cleaned data to meet needs across multiple projects, saving effort and standardizing results. Ongoing work to develop and integrate these tools should further streamline the research process, increasing the output and public benefit of SAIL and other data sources.