Journal: IOP Conference Series: Earth and Environmental Science
Print ISSN: 1755-1307
Electronic ISSN: 1755-1315
Publication year: 2019
Volume: 242
Issue: 5
Pages: 1-7
DOI: 10.1088/1755-1315/242/5/052038
Publisher: IOP Publishing
Abstract: This paper designs and implements an e-commerce big data analysis platform that provides intelligent analysis based mainly on commodity and sales data from e-commerce sales platforms. The platform offers an interface for users to operate, efficiently obtains the business information that e-commerce users need, and provides decision support. The system is divided into four parts: a crawler system, a storage system, an offline data analysis system, and a user interaction system. The storage system adopts the Hadoop ecosystem. Hadoop's HDFS is highly fault-tolerant, is suitable for processing GB-, TB-, or even PB-level data, can be scaled horizontally, and can be deployed on multiple inexpensive machines, so the platform is built on Hadoop. HBase and Hive are used to handle and use the data more efficiently. The offline analysis system adopts the Spark framework. Spark processes data based on RDDs (Resilient Distributed Datasets) and can connect to data sources such as HBase and Hive. Because the data are loaded into the memory of the cluster hosts and can be iterated over quickly, Spark is suitable for multi-round computing tasks such as machine learning. Spark SQL provides SQL-like operations on structured or semi-structured data, which can greatly improve the efficiency of offline data analysis.
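
To make the idea of SQL-like offline analysis over sales data concrete, the following is a minimal sketch only, not the platform's implementation. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs without a cluster. The table name `sales`, its columns, the sample records, and the function `revenue_by_category` are all hypothetical and do not come from the paper.

```python
import sqlite3

# Hypothetical commodity sales records (not from the paper):
# (commodity, category, unit_price, quantity)
SALES = [
    ("phone case", "accessories", 9.5, 120),
    ("usb cable", "accessories", 4.0, 300),
    ("laptop", "computers", 850.0, 12),
    ("monitor", "computers", 180.0, 25),
]


def revenue_by_category(rows):
    """Return (category, total_revenue) pairs, highest revenue first.

    sqlite3 is used here only as a local stand-in for a SQL engine;
    the paper's platform uses Spark SQL over HBase and Hive data.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales ("
            "commodity TEXT, category TEXT, unit_price REAL, quantity INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT category, SUM(unit_price * quantity) AS revenue "
            "FROM sales GROUP BY category ORDER BY revenue DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for category, revenue in revenue_by_category(SALES):
        print(category, revenue)
```

In a Spark deployment, a query of the same shape could plausibly be run by registering the data as a temporary view and passing the SQL text to `SparkSession.sql`; the abstract does not describe the platform's actual queries or schema.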