Article Information

Title: Optimal Policy Learning for Disease Prevention Using Reinforcement Learning
Authors: Zahid Alam Khan; Zhengyong Feng; M. Irfan Uddin; et al.
Journal: Scientific Programming
Print ISSN: 1058-9244
Year: 2020
Volume: 2020
Pages: 1-13
DOI: 10.1155/2020/7627290
Publisher: Hindawi Publishing Corporation
Abstract: Diseases can have a huge impact on the quality of life of the human population. Humans have always sought strategies to avoid diseases that are life-threatening or that degrade quality of life, and making effective use of the resources available to control different diseases has always been critical. Owing to the overwhelming popularity of deep learning, researchers have recently become more interested in AI-based solutions for protecting the human population from diseases. Many supervised techniques have been applied to disease diagnosis. However, the main problem with supervised solutions is data availability: the required data are not always obtainable, or not always complete. For instance, we do not have enough data showing the different states of humans and of their environments, and how the various actions taken by humans or viruses ultimately resulted in a disease that eventually takes human lives. Therefore, there is a need for unsupervised solutions, or for techniques that do not depend on an underlying dataset. In this paper, we explore the reinforcement learning approach. We have tried different reinforcement learning algorithms to investigate solutions for disease prevention in a simulated human population, exploring different techniques for controlling the transmission of diseases and their effects on health in a human population simulated in an environment. Our algorithms have found the policies that are best for the human population to protect itself from the transmission and infection of malaria. The paper concludes that deep learning-based algorithms such as Deep Deterministic Policy Gradient (DDPG) outperform traditional algorithms such as Q-Learning or SARSA.