出版社:Academy of Economic Studies - Bucharest, Romania
摘要:This paper is an extension to the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey on current distributed and parallel data processing technologies and, based on them, will propose an architecture that can be used to solve the analyzed problem. In this version there is more emphasis put on distributed files systems and the ETL processes involved in a distributed environment.