Journal: International Journal of Population Data Science
Electronic ISSN: 2399-4908
Publication year: 2020
Volume: 5
Issue: 1
Pages: 1-9
DOI: 10.23889/ijpds.v5i1.1346
Publisher: Swansea University
Abstract: Introduction: The challenges in identifying a cohort of people with a rare condition can be addressed by routinely collected, population-scale electronic health record (eHR) data, which provide large volumes of data at a national level. This paper describes the challenges of accurately identifying a cohort of children with Cystic Fibrosis (CF) using eHR data and the validation of that cohort against the UK CF Registry. Objectives: To establish a proof of principle and provide insight into the merits of linked data in CF research and the benefits of access to multiple data sources, in particular the UK CF Registry data, and to demonstrate the opportunity it represents as a resource for future CF research. Method: Three eHR data sources were used to identify children with CF born in Wales between 1st January 1998 and 31st August 2015 within the Secure Anonymised Information Linkage (SAIL) Databank. The UK CF Registry was later acquired by SAIL and linked to the eHR cohort to validate the cases and explore the reasons for misclassifications. Results: We identified 352 children with CF in the three eHR data sources. This was greater than expected based on historical incidence rates in Wales. Subsequent validation using the UK CF Registry found that 257 (73%) of these were true cases. Over 98% of individuals identified as CF cases in all three eHR data sources were confirmed as true cases, but this was so for only 19.8% of those identified in a single data source. Conclusion: Identifying health conditions in eHR data can be challenging, so data quality assurance and validation are important; without them, the merit of the research is undermined. This retrospective review identifies some of the challenges in identifying CF cases and demonstrates the benefits of linking cases across multiple data sources to improve quality.
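The validation step described in the abstract can be sketched as a small calculation: count how many eHR sources flag each individual, then check what share of each group is confirmed by a reference registry. The following is a minimal illustration only, not the authors' method. The function name `ppv_by_source_count`, the variable names, and all identifiers in the toy data are hypothetical; only the totals 257 and 352 come from the abstract.

```python
from collections import defaultdict


def ppv_by_source_count(source_sets, registry_ids):
    """Group flagged individuals by how many sources flag them and
    return {n_sources: (confirmed, flagged, share_confirmed)}.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Count the number of sources in which each individual appears.
    counts = defaultdict(int)
    for ids in source_sets:
        for pid in ids:
            counts[pid] += 1

    # Tally confirmed and flagged individuals per source count.
    tally = defaultdict(lambda: [0, 0])  # n_sources -> [confirmed, flagged]
    for pid, n in counts.items():
        tally[n][1] += 1
        if pid in registry_ids:
            tally[n][0] += 1

    return {n: (c, f, c / f) for n, (c, f) in sorted(tally.items())}


# Synthetic toy identifiers (assumed for illustration, not study data).
source_a = {"p1", "p2", "p3", "p4"}
source_b = {"p1", "p2", "p5"}
source_c = {"p1", "p3", "p6"}
registry = {"p1", "p2", "p3", "p7"}

result = ppv_by_source_count([source_a, source_b, source_c], registry)
for n, (confirmed, flagged, share) in result.items():
    print(f"{n} source(s): {confirmed}/{flagged} confirmed ({share:.0%})")

# Overall share of true cases reported in the abstract: 257 of 352.
overall_ppv = 257 / 352
print(f"Overall: {overall_ppv:.0%}")
```

In this toy data, individuals flagged by only one source are never confirmed, which mirrors the direction (though not the magnitude) of the pattern the abstract reports for single-source cases.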