
Article Information

  • Title: Leveraging High Performance Computing for Managing Large and Evolving Data Collections
  • Authors: Arora, Ritu; Esteva, Maria; Trelogan, Jessica
  • Journal: International Journal of Digital Curation
  • Print ISSN: 1746-8256
  • Publication year: 2014
  • Volume: 9
  • Issue: 2
  • Pages: 17-27
  • DOI: 10.2218/ijdc.v9i2.331
  • Language: English
  • Publisher: University of Edinburgh
  • Abstract: The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle curators organize newly generated data while cleaning and integrating legacy data when it exists, and deciding what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers contemporaneously. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well-informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time for accomplishing critical data management tasks, and enable a dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding usage of the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.