Hybrid Memory for LLM Training
Investigation of LLM training and inference optimization on hybrid memory systems, evaluating how different allocation policies affect performance under DeepSpeed ZeRO optimizer offload mechanisms.
Status: archived
Tags: ai-systems, pytorch, deepspeed, python
This project explored optimization opportunities for Large Language Model (LLM) training and inference workloads on hybrid memory systems combining DRAM with CXL-attached memory. We evaluated how different memory allocation policies interact with the offload mechanisms of the DeepSpeed ZeRO optimizer.
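To make the offload setting concrete, here is a minimal sketch of a DeepSpeed ZeRO stage-3 configuration with optimizer-state and parameter offload to host memory (which, on the systems studied, may be backed by DRAM or CXL-attached memory). The field names follow DeepSpeed's public config schema; the specific values are illustrative, not the project's actual settings.

```python
# Sketch: ZeRO stage-3 config dict with CPU-side offload. On a hybrid-memory
# machine, the offloaded optimizer states and parameters land in whatever
# host memory the OS allocation policy selects (DRAM or CXL).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # illustrative batch size
    "zero_optimization": {
        "stage": 3,  # partition optimizer states, gradients, and parameters
        "offload_optimizer": {
            "device": "cpu",     # keep optimizer states in host memory
            "pin_memory": True,  # pinned buffers speed up host<->GPU copies
        },
        "offload_param": {
            "device": "cpu",     # stage 3 also allows parameter offload
            "pin_memory": True,
        },
    },
}
```

This dict can be passed to `deepspeed.initialize(config=ds_config, ...)` in place of a JSON file.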
Key Results
- Demonstrated that the choice of memory allocation policy significantly impacts LLM training throughput on hybrid memory
- Evaluated performance across different DeepSpeed ZeRO optimizer stages and offload configurations
- Published at HPDC 2023 (32nd International Symposium on High-Performance Parallel and Distributed Computing)
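One way to evaluate allocation policies like those above is to run the same training job under different NUMA placement policies, since CXL-attached memory typically appears as a CPU-less NUMA node. The sketch below builds `numactl`-prefixed launch commands for such a sweep; the node IDs (0 = DRAM, 1 = CXL) and the training command are assumptions for illustration, not the project's actual benchmark harness.

```python
import subprocess

# Hypothetical policy sweep: node 0 is assumed to be local DRAM and node 1
# a CXL-attached memory node. Each entry is a numactl prefix enforcing one
# allocation policy for the whole training process.
POLICIES = {
    "dram_only":  ["numactl", "--membind=0"],
    "cxl_only":   ["numactl", "--membind=1"],
    "interleave": ["numactl", "--interleave=0,1"],
    "preferred":  ["numactl", "--preferred=0"],
}

def build_command(policy: str, train_cmd: list) -> list:
    """Prefix the training command with the chosen numactl policy."""
    return POLICIES[policy] + list(train_cmd)

# Illustrative training command (file names are placeholders).
train = ["deepspeed", "train.py", "--deepspeed_config", "ds_config.json"]
for name in POLICIES:
    cmd = build_command(name, train)
    # subprocess.run(cmd, check=True)  # enable on a machine with numactl
```

Comparing throughput across these runs isolates the effect of the allocation policy while holding the model, ZeRO stage, and offload settings fixed.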