Journal: International Journal of Distributed and Parallel Systems
Print ISSN: 2229-3957
Online ISSN: 0976-9757
Year: 2017
Volume: 8
Issue: 3
Page: 1
DOI: 10.5121/ijdps.2017.8301
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: The paper proposes a solution for designing and developing seamless automation and integration of machine learning capabilities for Big Data that meets the following requirements: 1) the ability to seamlessly handle and scale very large amounts of unstructured and structured data from diverse and heterogeneous sources; 2) the ability to systematically determine the steps and procedures needed to analyze Big Data datasets, based on data characteristics, domain expert input, and a data pre-processing component; 3) the ability to automatically select the most appropriate libraries and tools to compute and accelerate the machine learning computations; and 4) the ability to perform Big Data analytics with high learning performance but with minimal human intervention and supervision. The overall aim is to provide a seamless, automated, and integrated solution that can be used effectively to analyze Big Data with high-frequency and high-dimensional features, across different data characteristics and different application domains, with high accuracy, robustness, and scalability. This paper highlights the research methodologies and research activities that we propose Big Data researchers and practitioners conduct in order to develop and support seamless automation and integration of machine learning capabilities for Big Data analytics.
Keywords: Big Data; Machine Learning for Big Data; High-frequency Machine Learning; High-Dimension Machine Learning