Title: “Clinical Stability” and Propensity Score Matching in Cardiac Surgery: Is the clinical evaluation of treatment efficacy algorithm-dependent in small sample size settings?
Journal: Epidemiology, Biostatistics and Public Health
Print ISSN: 2282-0930
Year: 2019
Volume: 16
Issue: 1
Pages: 1-11
DOI: 10.2427/13001
Publisher: PREX
Abstract: Background: Propensity score matching is one of the most popular techniques for dealing with treatment allocation
bias in observational studies. However, when the number of enrolled patients is very low, the creation of matched sets
of subjects may depend heavily on the model used to estimate individual propensity scores, undermining the stability of
the resulting clinical findings. In this study, we investigate potential issues related to the stability of the matched sets
created by different propensity score models, and we propose some diagnostic tools to evaluate them.
Methods: Matched groups of patients were created using five different methods: Logistic Regression, Classification
and Regression Trees, Bagging, Random Forest and Generalized Boosted Model. Differences between subjects in
the matched sets were evaluated by comparing both pre-treatment covariates and propensity score distributions.
Other abstract: Background: Propensity score matching is one of the most popular techniques for dealing with treatment allocation bias in observational studies. However, when the number of enrolled patients is very low, the creation of matched sets of subjects may depend heavily on the model used to estimate individual propensity scores, undermining the stability of the resulting clinical findings. In this study, we investigate potential issues related to the stability of the matched sets created by different propensity score models, and we propose some diagnostic tools to evaluate them. Methods: Matched groups of patients were created using five different methods: Logistic Regression, Classification and Regression Trees, Bagging, Random Forest and Generalized Boosted Model. Differences between subjects in the matched sets were evaluated by comparing both pre-treatment covariates and propensity score distributions. We applied our proposal to a cardio-surgical observational study that aims to compare two different procedures of cardiac valve replacement. Results: Both baseline characteristics and propensity score distributions differed systematically across the matched samples of patients created with the different models used to estimate the propensity score. The most relevant differences were observed for the matched set created by estimating individual propensity scores with the Classification and Regression Trees algorithm. Conclusion: The clinical stability of matched samples created with different statistical methods should always be evaluated to ensure the reliability of final estimates. This work opens the door to future investigations that fully assess the implications of this finding.
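Illustrative sketch (not part of the record): the kind of stability check the abstract describes could look roughly like the Python below. It assumes propensity scores have already been estimated by two models and compares the matched sets they produce. The abstract does not state the matching algorithm or the exact diagnostics used, so the greedy 1:1 nearest-neighbor matching, the caliper value, the Jaccard overlap, and all subject ids, scores, and ages here are hypothetical choices made only for illustration.

```python
from statistics import mean, variance


def greedy_match(treated_ps, control_ps, caliper=0.2):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement (an assumed scheme, not the paper's).

    treated_ps / control_ps map subject id -> estimated propensity score.
    Returns {treated id: matched control id}.
    """
    available = dict(control_ps)
    pairs = {}
    for t_id, t_score in sorted(treated_ps.items()):  # deterministic order by id
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs[t_id] = c_id
            del available[c_id]  # each control is used at most once
    return pairs


def jaccard(a, b):
    """Share of matched controls that two matched sets have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def standardized_mean_difference(x_treated, x_control):
    """Covariate balance: mean difference divided by the pooled standard deviation."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd if pooled_sd else 0.0


# Hypothetical toy data: scores from a smooth model (A) and a coarse, tree-like model (B).
treated_a = {"t1": 0.60, "t2": 0.40}
control_a = {"c1": 0.58, "c2": 0.41, "c3": 0.10}
treated_b = {"t1": 0.75, "t2": 0.25}
control_b = {"c1": 0.25, "c2": 0.25, "c3": 0.75}
age = {"t1": 70, "t2": 60, "c1": 68, "c2": 62, "c3": 80}  # one pre-treatment covariate

pairs_a = greedy_match(treated_a, control_a)
pairs_b = greedy_match(treated_b, control_b)

overlap = jaccard(pairs_a.values(), pairs_b.values())
smd_a = standardized_mean_difference(
    [age[t] for t in pairs_a], [age[c] for c in pairs_a.values()]
)
smd_b = standardized_mean_difference(
    [age[t] for t in pairs_b], [age[c] for c in pairs_b.values()]
)
print("matched-control overlap:", overlap)
print("age SMD, model A:", smd_a, "| model B:", smd_b)
```

In this toy example the two models select different controls for the same treated patients, and the covariate balance of the matched samples differs accordingly. With real data, the same comparison would be repeated for each pair of propensity score models and each pre-treatment covariate.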