首页    期刊浏览 2024年10月06日 星期日
登录注册

文章基本信息

  • 标题:Echo-CGC: A Communication-Efficient Byzantine-Tolerant Distributed Machine Learning Algorithm in Single-Hop Radio Network
  • 本地全文:下载
  • 作者:Qinzi Zhang ; Lewis Tseng
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2021
  • 卷号:184
  • 页码:7:1-7:16
  • DOI:10.4230/LIPIcs.OPODIS.2020.7
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:In the past few years, many Byzantine-tolerant distributed machine learning (DML) algorithms have been proposed in the point-to-point communication model. In this paper, we focus on a popular DML framework - the parameter server computation paradigm and iterative learning algorithms that proceed in rounds, e.g., [Gupta and Vaidya, 2020; El-Mhamdi et al., 2020; Chen et al., 2017]. One limitation of prior algorithms in this domain is the high communication complexity. All the Byzantine-tolerant DML algorithms that we are aware of need to send n d-dimensional vectors from worker nodes to the parameter server in each round, where n is the number of workers and d is the number of dimensions of the feature space (which may be in the order of millions). In a wireless network, power consumption is proportional to the number of bits transmitted. Consequently, it is extremely difficult, if not impossible, to deploy these algorithms in power-limited wireless devices. Motivated by this observation, we aim to reduce the communication complexity of Byzantine-tolerant DML algorithms in the single-hop radio network [Alistarh et al., 2010; Bhandari and Vaidya, 2005; Koo, 2004]. Inspired by the CGC filter developed by Gupta and Vaidya, PODC 2020 [Gupta and Vaidya, 2020], we propose a gradient descent-based algorithm, Echo-CGC. Our main novelty is a mechanism to utilize the broadcast properties of the radio network to avoid transmitting the raw gradients (full d-dimensional vectors). In the radio network, each worker is able to overhear previous gradients that were transmitted to the parameter server. Roughly speaking, in Echo-CGC, if a worker "agrees" with a combination of prior gradients, it will broadcast the "echo message" instead of the its raw local gradient. The echo message contains a vector of coefficients (of size at most n) and the ratio of the magnitude between two gradients (a float). In comparison, the traditional approaches need to send n local gradients in each round, where each gradient is typically a vector in a ultra-high dimensional space (d ≫ n). The improvement on communication complexity of our algorithm depends on multiple factors, including number of nodes, number of faulty workers in an execution, and the cost function. We numerically analyze the improvement, and show that with a large number of nodes, Echo-CGC reduces 80% of the communication under standard assumptions.
  • 关键词:Distributed Machine Learning; Single-hop Radio Network; Byzantine Fault; Communication Complexity; Wireless Communication; Parameter Server
国家哲学社会科学文献中心版权所有