Allocation Policies Matter for Hybrid Memory Systems
June 19, 2023
Hybrid memory systems that combine fast DRAM with slower CXL-attached memory are becoming common in data centers. The standard approach is simple: allocate from DRAM until it fills up, then spill to CXL memory. This DRAM-preferred policy seems obviously correct — but it turns out to be suboptimal for many workloads.
Beyond DRAM-Preferred
Existing tiered memory systems use DRAM-preferred allocation: pages go to high-performance DRAM until it is full, after which allocations fall through to the lower-performing tier (persistent memory or CXL-attached memory). The implicit assumption is that all pages benefit equally from fast memory. In practice, application access patterns mean some pages are hot (frequently accessed) and others are cold (rarely touched). Blindly filling DRAM with the first pages allocated, regardless of their access frequency, wastes fast memory on cold data.
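A toy model makes the waste concrete. The latencies and page counts below are illustrative assumptions, not measurements from the paper: if the first pages an application allocates happen to be cold, DRAM-preferred pins them in DRAM while the hot pages that arrive later spill to the slow tier.

```python
# Toy model of page placement in a two-tier memory system.
# Latencies and access rates are illustrative assumptions, not measured values.
DRAM_NS, CXL_NS = 100, 300   # assumed per-access latencies (ns)
DRAM_CAPACITY = 4            # DRAM holds 4 pages in this toy example

# Pages in allocation order: (page_id, accesses_per_second).
# The first pages allocated happen to be cold in this scenario.
pages = [("a", 1), ("b", 1), ("c", 1), ("d", 1),
         ("e", 100), ("f", 100), ("g", 100), ("h", 100)]

def avg_latency(in_dram):
    """Access-weighted average latency for a given set of DRAM-resident pages."""
    total = sum(rate for _, rate in pages)
    cost = sum(rate * (DRAM_NS if pid in in_dram else CXL_NS)
               for pid, rate in pages)
    return cost / total

# DRAM-preferred: the first DRAM_CAPACITY allocations land in DRAM.
dram_preferred = {pid for pid, _ in pages[:DRAM_CAPACITY]}

# Hotness-aware: the hottest pages land in DRAM instead.
by_heat = sorted(pages, key=lambda p: p[1], reverse=True)
hotness_aware = {pid for pid, _ in by_heat[:DRAM_CAPACITY]}

print(avg_latency(dram_preferred))  # ~298 ns: cold pages occupy DRAM
print(avg_latency(hotness_aware))   # ~102 ns: hot pages get DRAM
```

In this (deliberately adversarial) scenario, allocation order alone nearly triples the access-weighted latency; a dynamic tiering system would eventually migrate the hot pages up, but only after paying for the bad initial placement.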
Three Allocation Policies
We designed, implemented, and evaluated three page allocation policies within a real deployment of a state-of-the-art dynamic tiering system. The right allocation policy for a workload can lower access latencies for newly allocated pages by placing them in the appropriate memory tier from the start, rather than relying on slow migration to correct initial placement mistakes.
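To make "allocation policy" concrete, here is a minimal sketch of what a pluggable per-allocation placement hook might look like. The policy names and signatures below are hypothetical illustrations, not the three policies evaluated in the paper:

```python
# Sketch of a pluggable allocation-policy hook for a two-tier system.
# Policy names and the `hint` mechanism are hypothetical, for illustration only.
from itertools import count

DRAM, CXL = "dram", "cxl"

def dram_preferred(dram_free, hint, counter):
    # Baseline: fill DRAM first, spill to the slow tier when it is exhausted.
    return DRAM if dram_free > 0 else CXL

def interleaved(dram_free, hint, counter):
    # Alternate tiers to spread bandwidth (hypothetical alternative).
    return DRAM if next(counter) % 2 == 0 and dram_free > 0 else CXL

def hint_based(dram_free, hint, counter):
    # Let the application tag allocations it expects to be hot (hypothetical).
    return DRAM if hint == "hot" and dram_free > 0 else CXL

def allocate(policy, n_pages, dram_free, hints=None):
    """Place n_pages one at a time using `policy`; return per-page tier choices."""
    counter, placements = count(), []
    for i in range(n_pages):
        hint = hints[i] if hints else None
        tier = policy(dram_free, hint, counter)
        if tier == DRAM:
            dram_free -= 1
        placements.append(tier)
    return placements
```

The point of the abstraction is that the placement decision happens at allocation time, before any access history exists, so a policy can only use static signals such as allocation order, interleaving state, or application-provided hints.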
Practical Implications
For AI training workloads in particular — where memory access patterns during forward pass, backward pass, and optimizer steps differ significantly — the choice of allocation policy can meaningfully impact throughput. Systems that use DeepSpeed’s ZeRO optimizer with memory offloading are especially sensitive, since the optimizer states and gradients have distinct access patterns that benefit from intelligent initial placement.
Published at HPDC 2023 (32nd International Symposium on High-Performance Parallel and Distributed Computing). DOI: 10.1145/3588195.3595946