Abstract: This study examined a historical mixture model approach to evaluating ratings made in “gold standard” and two-rater 2×2 contingency tables. Peirce's 𝑖 and the derived 𝑖 average (𝑖ave) were discussed in relation to Cohen's 𝜅, a widely used index of reliability in the behavioral sciences. Sample size, population base rate of occurrence, the true “science of the method”, and guessing rates were manipulated across simulations. In “gold standard” situations, Peirce's 𝑖 tended to recover the true reliability of ratings as well as or better than 𝜅. In two-rater situations, 𝑖ave tended to recover the true reliability as well as or better than 𝜅 in most cases. The empirical utility and potential theoretical benefits of mixture model methods in estimating reliability are discussed, as are the associations between the 𝑖 statistics and other modern mixture model approaches.