This article addresses the estimation of a listener's engagement level from listener behaviors such as backchannels, laughing, head nodding, and eye gaze. Engagement is defined as how much a user is interested in and willing to continue the current interaction. When the engagement level is evaluated by multiple annotators, the annotation criteria depend on each annotator. We assume that each annotator has their own character, which affects how they perceive the engagement level. We propose a latent character model that estimates the engagement level together with the character of each annotator as a latent variable. The experimental results show that the latent character model predicts each annotator's engagement labels with higher accuracy than other models that do not take the character into account.
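
The role of the latent character can be illustrated with a minimal sketch. This is one plausible reading, not the article's actual formulation. It assumes a small discrete set of characters, a per-annotator distribution over those characters, and a per-character probability of labeling a given behavior pattern as engaged. The function name, the two characters, and all numeric values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names and numbers are hypothetical,
# not taken from the article.

def engagement_probability(char_weights, engaged_given_char):
    """Marginalize the latent character k out of the prediction:

    P(engaged | behavior, annotator)
        = sum_k P(k | annotator) * P(engaged | behavior, k)
    """
    if len(char_weights) != len(engaged_given_char):
        raise ValueError("one probability is needed per character")
    if abs(sum(char_weights) - 1.0) > 1e-9:
        raise ValueError("character weights must sum to 1")
    return sum(w * p for w, p in zip(char_weights, engaged_given_char))


# Assumed P(engaged | behavior, character) for one observed behavior
# pattern. Character 0 labels engagement generously, character 1 strictly.
engaged_given_char = [0.9, 0.3]

# Assumed P(character | annotator) for two annotators.
annotator_a = [0.8, 0.2]  # mostly the generous character
annotator_b = [0.1, 0.9]  # mostly the strict character

p_a = engagement_probability(annotator_a, engaged_given_char)
p_b = engagement_probability(annotator_b, engaged_given_char)
print(f"annotator A: {p_a:.2f}, annotator B: {p_b:.2f}")
```

Under these assumed values, the same listener behavior yields different predicted labels for the two annotators, which is the effect an annotator-independent model cannot represent.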