Journal: IEEE Transactions on Emerging Topics in Computing
Print ISSN: 2168-6750
Publication year: 2017
Volume: 5
Issue: 4
Pages: 450-465
DOI: 10.1109/TETC.2015.2497143
Publisher: IEEE Publishing
Abstract: As big data systems spread rapidly, performance problems in them have become a common concern. As the first line of defense against these problems, performance diagnosis plays an essential role in big data systems, yet it is notoriously difficult to conduct in large distributed systems. Previous work either pinpoints root causes by instrumenting the applications or runtime systems in a white-box way, which incurs considerable overhead, or merely provides hints about the hidden root causes in a black-box way. Very few works focus on pinpointing the real root causes in a black-box way. To address this problem, this paper proposes a black-box, invariant-based diagnosis approach and implements a proof-of-concept system named InvarNet-X. In this paper, performance diagnosis is formalized as a pattern recognition problem, meaning that each performance problem is identified by a specific pattern. The rationale of InvarNet-X is that the unobservable root causes of performance problems always expose themselves through violations of the associations among directly observable performance metrics. Such observable associations, called likely invariants, are calculated by the maximal information criterion, and each performance problem is represented by a sparse distributed representation that serves as its signature. A problem signature database is constructed in advance by training on multiple real performance problems. Once a performance anomaly is detected, the diagnosis procedure is triggered, and the root cause is pinpointed by retrieving similar signatures from the signature database. Experimental evaluations in a controlled big data system show that InvarNet-X achieves high accuracy in diagnosing real performance problems reported in software bug repositories and outperforms several state-of-the-art approaches. Moreover, its lightweight design makes InvarNet-X easy to deploy in large-scale big data systems in real time.
Keywords: Big data; Hadoop; invariant; maximal information criterion; performance diagnosis
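
Illustrative sketch: the pipeline outlined in the abstract (learn associations between metrics, record which associations are violated, retrieve the closest known signature) might be sketched roughly as below. This is not the InvarNet-X implementation. It assumes absolute Pearson correlation as a simple stand-in for the maximal information criterion, a set of violated metric pairs as the sparse signature, and Jaccard similarity for retrieval. All function names, the `tolerance` parameter, and the problem labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def association_scores(metrics):
    """Score every metric pair (stand-in for the paper's association measure).

    metrics: dict mapping metric name -> list of samples of equal length.
    """
    return {
        (a, b): abs(pearson(metrics[a], metrics[b]))
        for a, b in combinations(sorted(metrics), 2)
    }


def violation_signature(baseline, observed, tolerance=0.3):
    """Set of metric pairs whose association drifted beyond `tolerance`."""
    return frozenset(
        pair
        for pair, score in baseline.items()
        if abs(score - observed[pair]) > tolerance
    )


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diagnose(signature, database):
    """Return (label, similarity) of the stored signature closest to `signature`."""
    return max(
        ((label, jaccard(signature, known)) for label, known in database.items()),
        key=lambda item: item[1],
    )


# Hypothetical data: three metrics sampled during normal and faulty runs.
normal = {"cpu": [1, 2, 3, 4, 5], "disk": [2, 4, 6, 8, 10], "net": [5, 4, 3, 2, 1]}
faulty = {"cpu": [1, 2, 3, 4, 5], "disk": [5, 5, 5, 5, 5], "net": [5, 4, 3, 2, 1]}

baseline = association_scores(normal)
signature = violation_signature(baseline, association_scores(faulty))

# Hypothetical signature database built from previously labeled problems.
database = {
    "disk_hog": frozenset({("cpu", "disk"), ("disk", "net")}),
    "net_congestion": frozenset({("cpu", "net"), ("disk", "net")}),
}
print(diagnose(signature, database))
```

Representing a signature as a set of violated pairs keeps it sparse and makes retrieval a cheap set comparison, in keeping with the lightweight, black-box spirit of the approach described in the abstract.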