Abstract

Fabien Gaud, Simon Fraser University; Baptiste Lepers, CNRS; Justin Funston, Simon Fraser University; Mohammad Dashti, Simon Fraser University; Alexandra Fedorova, University of British Columbia; Vivien Quéma, Grenoble INP; Renaud Lachaize, UJF; Mark Roth, Simon Fraser University

Modern server-class systems are typically built as several multicore chips put together in a single system. Each chip has a local DRAM (dynamic random-access memory) module; together they are referred to as a node. Nodes are connected via a high-speed interconnect, and the system is fully coherent. This means that, transparently to the programmer, a core can issue requests to its node's local memory as well as to the memories of other nodes. The key distinction is that remote requests take longer, because they are subject to longer wire delays and may have to traverse several hops across the interconnect. Memory-access latency is therefore non-uniform: it depends on where the request originates and where it is destined. Such systems are referred to as NUMA (non-uniform memory access). Systems with NUMA characteristics were built as early as the 1980s, and operating-system support for NUMA has evolved along with the hardware. Modern NUMA systems are quite different from the old ones, so we must revisit our assumptions about them and rethink how to build NUMA-aware operating systems. This article evaluates the performance characteristics of a representative modern NUMA system, describes NUMA-specific features in Linux, and presents a memory-management algorithm that delivers substantially reduced memory-access times and better performance.
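As a minimal sketch of the node-aware memory placement discussed above (not code from the article), the following C program uses Linux's libnuma library to allocate one buffer on the calling core's local node and one on a remote node; accesses to the latter must cross the interconnect and thus incur the higher latencies the article measures. The buffer size and node-selection logic are illustrative assumptions.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <numa.h>       /* link with -lnuma */

int main(void)
{
    /* Bail out if the kernel or hardware exposes no NUMA topology. */
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA API not available on this system\n");
        return EXIT_FAILURE;
    }

    int max_node   = numa_max_node();                     /* highest node id */
    int local_node = numa_node_of_cpu(sched_getcpu());    /* node of this CPU */
    int remote_node = (local_node + 1) % (max_node + 1);  /* some other node, if any */

    size_t size = 64UL * 1024 * 1024;  /* 64 MB test buffers (arbitrary choice) */

    /* Local allocation: accesses stay within the node's DRAM. */
    char *local_buf  = numa_alloc_onnode(size, local_node);
    /* Remote allocation: accesses traverse the interconnect. */
    char *remote_buf = numa_alloc_onnode(size, remote_node);

    if (local_buf)  { memset(local_buf, 0, size);  }
    if (remote_buf) { memset(remote_buf, 0, size); }

    printf("local node %d, remote node %d (nodes 0..%d)\n",
           local_node, remote_node, max_node);

    if (local_buf)  numa_free(local_buf, size);
    if (remote_buf) numa_free(remote_buf, size);
    return EXIT_SUCCESS;
}

Timing the memset over each buffer (for example with clock_gettime) is one simple way to observe the local-versus-remote latency gap that motivates the NUMA-aware memory management presented in the article.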