Abstract: This paper describes the development of a SNOMED CT subset derived from clinical notes. A corpus of 44
million words of patient progress notes was drawn from the clinical information system of the Intensive Care
Service (ICS) at the Royal Prince Alfred Hospital, Sydney, Australia. This corpus was processed by a
variety of natural language processing procedures, including the computation of all SNOMED CT candidate
codes. About 13 million concept instances, comprising about 30,000 unique concept types, were detected
in the corpus. These instances were processed by a tool that computes the closure of the minimal
sub-tree of concept types in the SNOMED CT hierarchy, thereby inferring the complete subset of SNOMED CT that
would be necessary for an intensive care unit. A subset of about 2,700 concepts covers 96% of
the corpus, and the transitive closure uses less than 1% of SNOMED CT concepts and relationships. Use of this
subset will enable clinical information systems to efficiently deliver SNOMED CT terminology to the
presentation interface.
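The two computations summarised above, closing the detected concepts upward through the hierarchy and choosing a high-coverage subset, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: it assumes the hierarchy is available as a plain mapping from each concept to its "is-a" parents (SNOMED CT is a poly-hierarchy, so a concept may have several), it treats the "closure of the minimal sub-tree" as the set of detected concepts plus all of their ancestors, and it assumes the coverage subset is formed by taking the most frequent concept types first. The function names and the toy concepts below are hypothetical and are not real SNOMED CT identifiers or relationships.

```python
from collections import Counter, deque


def ancestor_closure(seeds, parents):
    """Return the seed concepts plus every concept reachable by following
    is-a edges upward (assumption: 'parents' maps concept -> list of parents)."""
    closure = set()
    queue = deque(seeds)
    while queue:
        concept = queue.popleft()
        if concept in closure:
            continue  # already visited via another path in the poly-hierarchy
        closure.add(concept)
        queue.extend(parents.get(concept, ()))
    return closure


def closure_edges(closure, parents):
    """Is-a relationships whose child and parent both lie inside the closure."""
    return {(c, p) for c in closure for p in parents.get(c, ()) if p in closure}


def coverage_subset(counts, target):
    """Take concept types in descending frequency until their instances
    reach the target fraction of all instances (a greedy, hypothetical rule)."""
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for concept, n in counts.most_common():
        if covered / total >= target:
            break
        chosen.append(concept)
        covered += n
    return chosen, covered / total


# Toy hierarchy and instance counts (hypothetical, for illustration only).
parents = {
    "Pneumonia": ["Lung disorder", "Infectious disease"],
    "Lung disorder": ["Disorder"],
    "Infectious disease": ["Disorder"],
    "Sepsis": ["Infectious disease"],
    "Disorder": ["Root"],
    "Fracture": ["Injury"],
    "Injury": ["Root"],
}
counts = Counter({"Pneumonia": 60, "Sepsis": 30, "Fracture": 10})

full = ancestor_closure(counts, parents)
print(len(full), len(closure_edges(full, parents)))  # 8 8

chosen, fraction = coverage_subset(counts, 0.9)
print(chosen, fraction)  # ['Pneumonia', 'Sepsis'] 0.9
print(sorted(ancestor_closure(chosen, parents)))
```

A breadth-first walk with a visited set is used because a concept reachable along several parent paths should be added only once; the size of the resulting closure, divided by the size of the full terminology, gives the kind of "fraction of concepts and relationships" figure reported in the abstract.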