摘要:We analyzed the COVID-19 Open Research Dataset (CORD-19) to understand leading research institutions, collaborations among institutions, major publication venues, key research concepts, and topics covered by pandemic-related research. Methods: We conducted a descriptive analysis of authors' institutions and relationships, automatic content extraction of key words and phrases from titles and abstracts, and topic modeling and evolution. Data visualization techniques were applied to present the results of the analysis. Results: We found that leading research institutions on COVID-19 included the Chinese Academy of Sciences, the US National Institutes of Health, and the University of California. Research studies mostly involved collaboration among different institutions at national and international levels. In addition to bioRxiv, major publication venues included journals such as The BMJ, PLOS One, Journal of Virology, and The Lancet. Key research concepts included the coronavirus, acute respiratory impairments, health care, and social distancing. The ten most popular topics were identified through topic modeling and included human metapneumovirus and livestock, clinical outcomes of severe patients, and risk factors for higher mortality rate. Conclusion: Data analytics is a powerful approach for quickly processing and understanding large-scale datasets like CORD-19. This approach could help medical librarians, researchers, and the public understand important characteristics of COVID-19 research and could be applied to the analysis of other large datasets. .
关键词:COVID-19 pandemic;CORD-19 dataset;global research roadmap;data analytics;topic modeling