Summary

We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to current XAI methods, which generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e., a dialogue between the machine and the human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialogue by mediating the differences between the minds of the machine and the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. Moreover, most state-of-the-art XAI frameworks provide attention-based (or heat-map-based) explanations. In our work, we show that these attention-based explanations are not sufficient for increasing human trust in the underlying CNN model. In CX-ToM, we instead use counterfactual explanations called fault-lines, which we define as follows: given an input image $I$ for which a CNN classification model $M$ predicts class $c_{\text{pred}}$, a fault-line identifies the minimal semantic-level features (e.g., stripes on a zebra), referred to as explainable concepts, that need to be added to or deleted from $I$ to change the class that $M$ assigns to $I$ to another specified class $c_{\text{alt}}$. Extensive experiments verify our hypotheses, demonstrating that CX-ToM significantly outperforms state-of-the-art XAI models.

Highlights
• Attention is not a good explanation
• Explanation is an interactive communication process
• We introduce a new XAI framework based on Theory-of-Mind and counterfactual explanations

Subject areas: Computer science; Artificial intelligence; Human-computer interaction
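To make the fault-line definition concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the CX-ToM method. It replaces the CNN $M$ with a hypothetical linear surrogate over binary explainable-concept indicators, and it finds a fault-line by brute force: the smallest set of concept additions or deletions that flips the prediction from $c_{\text{pred}}$ to $c_{\text{alt}}$. The names `predict`, `find_fault_line`, the concept vocabulary, and the weights are all illustrative inventions. The dialogue and Theory-of-Mind components are not modeled here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical surrogate for M: class -> concept -> weight.
# Each class score is the sum of the weights of the concepts present.
# This is an illustrative assumption, NOT the CNN used in CX-ToM.
Weights = Dict[str, Dict[str, float]]


def predict(weights: Weights, concepts: FrozenSet[str]) -> str:
    """Return the class whose summed concept weights are highest."""
    return max(weights, key=lambda c: sum(weights[c].get(x, 0.0) for x in concepts))


def find_fault_line(
    weights: Weights,
    concepts: FrozenSet[str],
    vocabulary: List[str],
    c_alt: str,
    max_edits: int = 3,
) -> Optional[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Brute-force the smallest concept edit that makes the surrogate predict c_alt.

    Each edit toggles one concept: it is added if absent and deleted if present.
    Returns (added, deleted), or None if no edit of size <= max_edits works.
    Exhaustive search is exponential in max_edits; it is only meant to show the
    definition, not an efficient or faithful search procedure.
    """
    for k in range(1, max_edits + 1):  # smallest edits first => minimal fault-line
        for edit in combinations(vocabulary, k):
            toggled = set(edit)
            edited = frozenset(concepts ^ toggled)  # apply additions and deletions
            if predict(weights, edited) == c_alt:
                added = frozenset(toggled - concepts)
                deleted = frozenset(toggled & concepts)
                return added, deleted
    return None


# Toy usage with made-up weights (hypothetical values).
weights: Weights = {
    "zebra": {"stripes": 3.0, "mane": 1.0, "hooves": 1.0, "savanna": 1.0},
    "horse": {"stripes": -2.0, "mane": 1.5, "hooves": 1.5, "savanna": 0.0},
}
vocab = ["stripes", "mane", "hooves", "savanna"]
image_concepts = frozenset({"stripes", "mane", "hooves"})

print(predict(weights, image_concepts))  # zebra
print(find_fault_line(weights, image_concepts, vocab, "horse"))
# (frozenset(), frozenset({'stripes'}))
```

In this toy setting, deleting the single concept "stripes" lowers the zebra score from 5.0 to 2.0 and raises the horse score from 1.0 to 3.0, so the minimal fault-line from zebra to horse is "delete stripes", mirroring the stripes-on-zebra example above. Searching edit sizes in increasing order is what guarantees minimality in this sketch.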