摘要:Verbal autopsy (VA) is an important method for obtaining cause of death information in settings without vital registration and medical certification of causes of death. An array of methods, including physician review and computer-automated methods, have been proposed and used. Choosing the best method for VA requires the appropriate metrics for assessing performance. Currently used metrics such as sensitivity, specificity, and cause-specific mortality fraction (CSMF) errors do not provide a robust basis for comparison. We use simple simulations of populations with three causes of death to demonstrate that most metrics used in VA validation studies are extremely sensitive to the CSMF composition of the test dataset. Simulations also demonstrate that an inferior method can appear to have better performance than an alternative due strictly to the CSMF composition of the test set. VA methods need to be evaluated across a set of test datasets with widely varying CSMF compositions. We propose two metrics for assessing the performance of a proposed VA method. For assessing how well a method does at individual cause of death assignment, we recommend the average chance-corrected concordance across causes. This metric is insensitive to the CSMF composition of the test sets and corrects for the degree to which a method will get the cause correct due strictly to chance. For the evaluation of CSMF estimation, we propose CSMF accuracy. CSMF accuracy is defined as one minus the sum of all absolute CSMF errors across causes divided by the maximum total error. It is scaled from zero to one and can generalize a method's CSMF estimation capability regardless of the number of causes. Performance of a VA method for CSMF estimation by cause can be assessed by examining the relationship across test datasets between the estimated CSMF and the true CSMF. With an increasing range of VA methods available, it will be critical to objectively assess their performance in assigning cause of death. Chance-corrected concordance and CSMF accuracy assessed across a large number of test datasets with widely varying CSMF composition provide a robust strategy for this assessment.