Article Information

  • Title: A review of approximate dynamic programming applications within military operations research
  • Authors: M. Rempel; J. Cai
  • Journal: Operations Research Perspectives
  • Print ISSN: 2214-7160
  • Electronic ISSN: 2214-7160
  • Year: 2021
  • Volume: 8
  • Pages: 1-15
  • DOI: 10.1016/j.orp.2021.100204
  • Language: English
  • Publisher: Elsevier
  • Abstract: Sequences of decisions that occur under uncertainty arise in a variety of settings, including transportation, communication networks, finance, and defence. The classic approach to find an optimal decision policy for a sequential decision problem is dynamic programming; however, its usefulness is limited due to the curse of dimensionality and the curse of modelling, and thus many real-world applications require an alternative approach. Within operations research, over the last 25 years the use of Approximate Dynamic Programming (ADP), known as reinforcement learning in many disciplines, to solve these types of problems has increased in popularity. These efforts have resulted in the successful deployment of ADP-generated decision policies for driver scheduling in the trucking industry, locomotive planning and management, and managing high-value spare parts in manufacturing. In this article we present the first review of applications of ADP within a defence context, specifically focusing on those which provide decision support to military or civilian leadership. This article's main contributions are twofold. First, we review 18 decision support applications, spanning the spectrum of force development, generation, and employment, that use an ADP-based strategy, and for each we highlight how its ADP algorithm was designed, how it was evaluated, and the results achieved. Second, based on the trends and gaps identified, we discuss five topics relevant to applying ADP to decision support problems within defence: the classes of problems studied; best practices to evaluate ADP-generated policies; advantages of designing policies that are incremental adjustments versus complete overhauls of currently practiced policies; the robustness of policies as scenarios change, such as a shift from high- to low-intensity conflict; and sequential decision problems not yet studied within defence that may benefit from ADP.
  • Keywords: Sequential decision problem; Markov decision process; Approximate dynamic programming; Reinforcement learning; Military
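The abstract contrasts exact dynamic programming, which enumerates the full state space, with ADP, known elsewhere as reinforcement learning, which learns a decision policy from sampled transitions. As an illustration only — the toy chain problem, action set, and hyperparameters below are invented for this sketch and do not come from the article — tabular Q-learning shows the core idea:

```python
import random

# Toy sequential decision problem: an agent on a 5-state chain earns
# reward 1 for reaching the goal state. Exact dynamic programming would
# sweep over every state; the ADP / reinforcement-learning view instead
# updates value estimates along simulated trajectories.
N_STATES = 5
ACTIONS = (-1, +1)        # move left / move right
GOAL = N_STATES - 1

def step(state, action):
    """Deterministic transition; reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Learn state-action values from sampled transitions only."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r, done = step(s, a)
            # One-step bootstrapped target, the ADP analogue of the
            # dynamic-programming backup restricted to visited states.
            target = r + (0.0 if done else gamma * max(q[(nxt, b)] for b in ACTIONS))
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = nxt
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)}
```

The learned policy moves right from every non-goal state, matching what exact value iteration would return here; the difference is that no full sweep over the state space was ever required, which is what lets ADP sidestep the curse of dimensionality in large problems.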