Abstract: Objectives. We determined whether statistical text mining (STM) can identify fall-related injuries in electronic health record (EHR) documents and how training on documents from a single facility versus multiple facilities affects STM models. Methods. We obtained fiscal year 2007 records for Veterans Health Administration (VHA) ambulatory care clinics in the southeastern United States and Puerto Rico, resulting in a total of 26 010 documents for 1652 veterans treated for fall-related injury and 1341 matched controls. We used the results of an STM model to predict fall-related injuries at the visit and patient levels and compared them with a reference standard based on chart review. Results. STM models based on training data from a single facility resulted in accuracy of 87.5% and 87.1%, F-measure of 87.0% and 90.9%, sensitivity of 92.1% and 94.1%, and specificity of 83.6% and 77.8% at the visit and patient levels, respectively. Results from training data from multiple facilities were almost identical. Conclusions. STM has the potential to improve identification of fall-related injuries in the VHA, providing a model for wider application in the evolving national EHR system.

Approximately one third of all adults older than 65 years fall each year. 1 Fall injury is a leading cause of death and disability among older adults. 2 Adults aged 65 years and older had more than 2.1 million emergency department (ED) visits from injurious falls in 2006, accounting for 1 in 10 of all ED visits nationally. Direct-care costs of fall injuries in the United States for people aged 65 years and older are estimated to be approximately $20 billion annually. 3 A fall in the previous year is the strongest clinical predictor of subsequent falls, and patients with such a history should be targeted for fall prevention programs. 
4 Although most estimates of treatment of fall-related injuries have come from hospital ED data, a recent national survey estimated that treatment of more than 50% of 76 million nonfatal acute injuries (most of which were fall injuries) occurred in ambulatory care settings outside of hospital EDs. 5 With the evolution of the electronic health record (EHR), new opportunities to measure the impact of this important public health issue will be available.

In this article, we describe results of using statistical text mining (STM) of clinical documents from an integrated EHR to improve the identification of fall-related injuries in ambulatory care.

The Veterans Health Administration’s (VHA’s) EHR supports both ambulatory and inpatient care nationally. The EHR connects VHA facilities’ workstations and PCs through the Computerized Patient Record System, a graphical user interface that allows full management of the health record. 6 Services or encounters with patients are documented in the EHR in 2 ways: as structured or coded data using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) 7 codes, and as written notes, which are entered directly by the provider and saved as separate files. The VHA EHR is the largest integrated EHR in the United States, 8 and it contains tremendous amounts of administrative data and approximately 2.5 billion text-based documents (e.g., progress notes, lab reports). 9 Previously, we described patterns of fall-related ambulatory care encounters in the VHA administrative data from the EHR. 
10 That study was based on the analysis of fall-related E-codes (ICD-9-CM codes E880–E889), part of the ICD-9-CM coding system that permits the “supplementary classification of external causes of injury and poisoning.” 11 (p 81) For example, the primary diagnosis code for an encounter might be “fracture of the neck of the femur” (ICD-9-CM 820) with the fall-related E-code for “fall from a ladder” (ICD-9-CM E881.0). Although nearly half of the encounters occurred in the emergency or urgent care setting, fall-related injuries led to services across a spectrum of medical and surgical providers and departments. 10 A single-institution study demonstrated that STM could be used to identify fall-related injuries in VHA ambulatory care documents when no E-code was present. 12

We conducted a multi-institutional study that used STM to identify fall-related injuries at both the outpatient visit level (≥ 1 outpatient encounter in a given day) and patient level (across 1 year of data) and extended previous work to determine the best STM model for identifying fall-related injuries at the document level. 13 We explored whether STM can be successfully applied to documents generated across a large health care system. There is considerable evidence that practice patterns vary across facilities in large systems, and if documentation practices also vary, training on a sample from 1 facility could affect the results. Because STM analysis requires a large reference set of documents that have been classified by expert human reviewers, which can be very costly to produce, we also investigated how selecting training documents from a single institution versus multiple institutions affects STM results.
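
To make the visit-level and patient-level evaluation concrete, the following is a minimal sketch in Python. It assumes each document already carries a binary STM prediction, and it flags a visit or patient when any of its documents is flagged. That any-positive rule, the field names (`patient_id`, `visit_date`, `predicted_fall`), and the toy records are all hypothetical illustrations, not the study's method or data. The metrics are the four reported in the abstract: accuracy, F-measure, sensitivity, and specificity.

```python
from collections import defaultdict


def roll_up(documents, key):
    """Flag a group (visit or patient) if ANY of its documents is flagged.

    The any-positive rule is an assumption made for illustration.
    """
    flagged = defaultdict(bool)
    for doc in documents:
        flagged[key(doc)] |= doc["predicted_fall"]
    return dict(flagged)


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(predicted, reference):
    """Compare predictions with a chart-review reference standard."""
    tp = sum(1 for k, truth in reference.items() if truth and predicted[k])
    fn = sum(1 for k, truth in reference.items() if truth and not predicted[k])
    fp = sum(1 for k, truth in reference.items() if not truth and predicted[k])
    tn = sum(1 for k, truth in reference.items() if not truth and not predicted[k])
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "f_measure": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical document-level STM output (toy data, not from the study).
documents = [
    {"patient_id": "A", "visit_date": "2007-01-05", "predicted_fall": True},
    {"patient_id": "A", "visit_date": "2007-01-05", "predicted_fall": False},
    {"patient_id": "A", "visit_date": "2007-03-10", "predicted_fall": False},
    {"patient_id": "B", "visit_date": "2007-02-01", "predicted_fall": False},
    {"patient_id": "B", "visit_date": "2007-02-01", "predicted_fall": False},
    {"patient_id": "C", "visit_date": "2007-04-12", "predicted_fall": True},
    {"patient_id": "D", "visit_date": "2007-05-20", "predicted_fall": False},
]

# A visit is all encounters for one patient on one day; a patient spans the year.
visit_pred = roll_up(documents, key=lambda d: (d["patient_id"], d["visit_date"]))
patient_pred = roll_up(documents, key=lambda d: d["patient_id"])

# Hypothetical chart-review reference standard at each level.
visit_ref = {
    ("A", "2007-01-05"): True,
    ("A", "2007-03-10"): False,
    ("B", "2007-02-01"): True,
    ("C", "2007-04-12"): False,
    ("D", "2007-05-20"): False,
}
patient_ref = {"A": True, "B": True, "C": False, "D": False}

visit_metrics = evaluate(visit_pred, visit_ref)
patient_metrics = evaluate(patient_pred, patient_ref)
print("visit level:", visit_metrics)
print("patient level:", patient_metrics)
```

Because the same `roll_up` function serves both levels and only the grouping key changes, a different aggregation rule (for example, a threshold on the share of flagged documents) could be substituted without touching the evaluation step.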