Abstract

Sequences of decisions that occur under uncertainty arise in a variety of settings, including transportation, communication networks, finance, and defence. The classic approach to finding an optimal decision policy for a sequential decision problem is dynamic programming; however, its usefulness is limited by the curse of dimensionality and the curse of modelling, and thus many real-world applications require an alternative approach. Within operations research, the use of Approximate Dynamic Programming (ADP), known as reinforcement learning in many disciplines, to solve these types of problems has grown in popularity over the last 25 years. These efforts have resulted in the successful deployment of ADP-generated decision policies for driver scheduling in the trucking industry, locomotive planning and management, and managing high-value spare parts in manufacturing. In this article we present the first review of applications of ADP within a defence context, specifically focusing on those that provide decision support to military or civilian leadership. This article’s main contributions are twofold. First, we review 18 decision support applications, spanning the spectrum of force development, generation, and employment, that use an ADP-based strategy, and for each we highlight how its ADP algorithm was designed and evaluated and what results were achieved. Second, based on the trends and gaps identified, we discuss five topics relevant to applying ADP to decision support problems within defence: the classes of problems studied; best practices for evaluating ADP-generated policies; the advantages of policies that incrementally adjust current practice versus complete overhauls; the robustness of policies as scenarios change, such as a shift from high- to low-intensity conflict; and sequential decision problems not yet studied within defence that may benefit from ADP.