
Article Information

  • Title: Lake Data Warehouse Architecture for Big Data Solutions
  • Authors: Emad Saddad; Ali El-Bastawissy; Hoda M. O. Mokhtar
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Online ISSN: 2156-5570
  • Publication year: 2020
  • Volume: 11
  • Issue: 8
  • DOI: 10.14569/IJACSA.2020.0110854
  • Publisher: Science and Information Society (SAI)
  • Abstract: A traditional Data Warehouse is a multidimensional repository of nonvolatile, subject-oriented, integrated, time-variant, non-operational data gathered from multiple heterogeneous data sources. Traditional Data Warehouse architecture needs to be adapted to deal with the new challenges imposed by the abundance of data and by current big data characteristics, including volume, value, variety, validity, volatility, visualization, variability, and venue. The new architecture also needs to address existing drawbacks in availability, scalability, and consequently query performance. This paper introduces a novel Data Warehouse architecture, named Lake Data Warehouse Architecture, that gives the traditional Data Warehouse the capabilities to overcome these challenges. Lake Data Warehouse Architecture merges the traditional Data Warehouse architecture with big data technologies such as the Hadoop framework and Apache Spark, providing a hybrid solution in which the two complement each other. Its main advantage is that it combines the existing features of traditional Data Warehouses with the big data features acquired by integrating the traditional Data Warehouse with the Hadoop and Spark ecosystems. Furthermore, it is tailored to handle a tremendous volume of data while maintaining availability, reliability, and scalability.
  • Keywords: Traditional data warehouse; big data; semi-structured data; unstructured data; novel data warehouse architecture; Hadoop; Spark