Article information

  • Title: Enhanced Data Lake Clustering Design based on K-means Algorithm
  • Authors: Jabrane Kachaoui; Abdessamad Belangour
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Online ISSN: 2156-5570
  • Publication year: 2020
  • Volume: 11
  • Issue: 4
  • DOI: 10.14569/IJACSA.2020.0110472
  • Publisher: Science and Information Society (SAI)
  • Abstract: In recent years, Big Data requirements have evolved. Organizations are trying more than ever to focus their efforts on the industrial development of all the data at their disposal and to move further away from the underpinning technologies. Having invested in the Data Lake concept, organizations must now overhaul their data architecture to face the expansion of IoT (Internet of Things) and AI (Artificial Intelligence). Efficient and effective data mapping treatments can help in understanding the importance of the data being transformed and used to support decision-making. As current relational databases are not able to manage large amounts of data, organizations have turned to NoSQL (Not only Structured Query Language) databases. One well-known NoSQL database is MongoDB, which is highly scalable. This article puts forward a new data model able to extract, classify, and then map data in order to generate new, more structured data that meets organizational needs. This is carried out by calculating the weights of various metadata attributes, which are considered important information. The article also clusters the data stored in MongoDB. This categorization is based on a data mining clustering algorithm named K-Means.
  • Keywords: Big data; Data Lake; NoSQL; MongoDB; K-means; metadata