Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2019
Volume: 97
Issue: 12
Pages: 3488-3500
Publisher: Journal of Theoretical and Applied
Abstract: Today's technologies and advancements have led to an eruption and flood of daily generated data. Raw data has no value unless it is analyzed to extract hidden insight for the business organization. Big data is heterogeneous, unstructured, and enormous. Collecting, storing, manipulating, interpreting, analyzing, and visualizing big data shape the dimensions of the big data life cycle. Big data deals most of the time with unstructured data that requires both real-time and batch processing. The goal of any big data platform is to extract correlations, hidden sentiments, patterns, values, and insights from this raw data. However, the big data analytics pipeline is challenging end to end. The paper's objectives are threefold. First, it revisits the big data concept, its dimensions, and its characteristics. Second, it introduces the Hadoop open-source big data platform and its supporting utilities. Third, it studies the underlying challenges that surround the big data pipeline end to end.
Keywords: Big Data; Big Data Pipeline; Big Data V's; Hadoop Platform; Challenges