Hybrid Memory for LLM Training

Investigation of LLM training and inference optimization on hybrid memory systems, evaluating how different allocation policies affect performance with DeepSpeed ZeRO optimizer offload mechanisms.

Tags: ai-systems (archived) · pytorch · deepspeed · python


This project explored optimization opportunities for Large Language Model training and inference workloads on hybrid memory systems combining DRAM with CXL-attached memory. We evaluated how different memory allocation policies interact with DeepSpeed ZeRO optimizer offload mechanisms.
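As a concrete illustration of the offload mechanism involved, the sketch below shows a DeepSpeed configuration enabling ZeRO stage 3 with optimizer states (and parameters) offloaded to host memory. On a hybrid system, the OS allocation policy then decides whether those buffers land in DRAM or CXL-attached memory. The field values are illustrative, not the settings used in this study:

```python
import json

# Illustrative DeepSpeed config: ZeRO stage 3 with optimizer states
# offloaded to host ("cpu") memory. On a hybrid DRAM + CXL system,
# the memory allocation policy determines which tier backs these buffers.
ds_config = {
    "train_batch_size": 16,              # illustrative value
    "zero_optimization": {
        "stage": 3,                      # partition params, grads, and optimizer states
        "offload_optimizer": {
            "device": "cpu",             # optimizer states live in host memory
            "pin_memory": True,          # pinned buffers for faster host<->GPU transfers
        },
        "offload_param": {
            "device": "cpu",             # parameters offloaded as well
        },
    },
    "fp16": {"enabled": True},
}

print(json.dumps(ds_config, indent=2))
```

Stage 1 offloads only optimizer states, stage 2 adds gradient partitioning, and stage 3 additionally partitions the parameters themselves, so the stages place progressively more pressure on the host-side memory tiers.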

Key Results

  • Demonstrated that allocation policy choice significantly impacts LLM training throughput on hybrid memory
  • Evaluated performance across different DeepSpeed ZeRO optimizer stages and offload configurations
  • Published at HPDC 2023 (32nd International Symposium on High-Performance Parallel and Distributed Computing)
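For readers reproducing this kind of evaluation, one common way to pin a run to a particular allocation policy on Linux is to wrap the launcher with `numactl`. The helper below is a hypothetical sketch that only builds such command lines; the policy names and the assumption that the CXL memory appears as NUMA node 1 are illustrative, not the exact setup from this work:

```python
def numactl_prefix(policy: str, cxl_node: int = 1) -> list:
    """Build a numactl prefix implementing a simple allocation policy.

    Assumed topology: DRAM on NUMA node 0, CXL memory exposed as node 1.
    Policies (illustrative):
      - "local":      bind all allocations to the DRAM node
      - "interleave": interleave pages across DRAM and the CXL node
      - "preferred":  prefer DRAM, spill to other nodes under pressure
    """
    if policy == "local":
        return ["numactl", "--membind=0"]
    if policy == "interleave":
        return ["numactl", f"--interleave=0,{cxl_node}"]
    if policy == "preferred":
        return ["numactl", "--preferred=0"]
    raise ValueError(f"unknown policy: {policy!r}")

# Example: launch a DeepSpeed run with pages interleaved across both tiers
cmd = numactl_prefix("interleave") + [
    "deepspeed", "train.py", "--deepspeed_config", "ds_config.json",
]
print(" ".join(cmd))
```

Sweeping this policy argument across runs, while holding the ZeRO stage and offload configuration fixed, is one straightforward way to isolate the throughput impact of the allocation policy itself.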