Abstract: Diabetes is a chronic disease that affects millions of people worldwide, so it is unsurprising that it attracts a high volume of public discussion, resources, and research. This study describes a new method for identifying areas of public interest in issues such as diabetes and comparing them with the topics discussed in research. We tested the method on posts from a popular diabetes discussion forum (DiabeticConnect), pages (articles) about diabetes published on Wikipedia, and the titles and abstracts of research articles about diabetes from the Scopus database. The tags assigned to each forum post were used, together with the post text, to compute a Labeled Latent Dirichlet Allocation (LLDA) model, which was then used to classify the Wikipedia pages and research articles. The resulting classifications were used to compare the prevalence of the topics found in the discussion forum with their prevalence in the other two sources. The results show that public interest in diabetes is not necessarily addressed by researchers. More importantly, the alignment and misalignment in the changes in relative interest across topics are evidence that LLDA modeling can be useful for comparing a public corpus, such as a diabetes forum, with an academic one, such as research article titles and abstracts. The success of using LLDA to classify research articles based on the tags assigned to posts in a public discussion forum shows that this is a promising method for better understanding how the scientific community responds to public interests and needs.
Keywords: topic modeling; public interests; public understanding of science; diabetes; Wikipedia