Hybrid Memory for LLM Training
Investigation of LLM training and inference optimization on hybrid memory systems, evaluating how different allocation policies affect performance under DeepSpeed ZeRO optimizer offload mechanisms.
Status: archived
Tags: ai-systems, pytorch, deepspeed, python
This project explored optimization opportunities for Large Language Model (LLM) training and inference workloads on hybrid memory systems combining DRAM with CXL-attached memory. We evaluated how different memory allocation policies interact with the offload mechanisms of the DeepSpeed ZeRO optimizer.
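To make the offload setting concrete, here is a minimal sketch of a DeepSpeed ZeRO stage-3 configuration with optimizer-state and parameter offload to host memory (which, on the systems studied, may be backed by DRAM or CXL-attached memory). The field names follow DeepSpeed's public config schema; the specific values are illustrative, not the project's actual settings.

```python
# Sketch: ZeRO stage-3 config dict with CPU-side offload. On a hybrid-memory
# machine, the offloaded optimizer states and parameters land in whatever
# host memory the OS allocation policy selects (DRAM or CXL).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,  # illustrative batch size
    "zero_optimization": {
        "stage": 3,  # partition optimizer states, gradients, and parameters
        "offload_optimizer": {
            "device": "cpu",     # keep optimizer states in host memory
            "pin_memory": True,  # pinned buffers speed up host<->GPU copies
        },
        "offload_param": {
            "device": "cpu",     # stage 3 also allows parameter offload
            "pin_memory": True,
        },
    },
}
```

This dict can be passed to `deepspeed.initialize(config=ds_config, ...)` in place of a JSON file.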
Key Results
- Demonstrated that the choice of memory allocation policy significantly impacts LLM training throughput on hybrid memory
- Evaluated performance across different DeepSpeed ZeRO optimizer stages and offload configurations
- Published at HPDC 2023 (32nd International Symposium on High-Performance Parallel and Distributed Computing)
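One way to evaluate allocation policies like those above is to run the same training job under different NUMA placement policies, since CXL-attached memory typically appears as a CPU-less NUMA node. The sketch below builds `numactl`-prefixed launch commands for such a sweep; the node IDs (0 = DRAM, 1 = CXL) and the training command are assumptions for illustration, not the project's actual benchmark harness.

```python
import subprocess

# Hypothetical policy sweep: node 0 is assumed to be local DRAM and node 1
# a CXL-attached memory node. Each entry is a numactl prefix enforcing one
# allocation policy for the whole training process.
POLICIES = {
    "dram_only":  ["numactl", "--membind=0"],
    "cxl_only":   ["numactl", "--membind=1"],
    "interleave": ["numactl", "--interleave=0,1"],
    "preferred":  ["numactl", "--preferred=0"],
}

def build_command(policy: str, train_cmd: list) -> list:
    """Prefix the training command with the chosen numactl policy."""
    return POLICIES[policy] + list(train_cmd)

# Illustrative training command (file names are placeholders).
train = ["deepspeed", "train.py", "--deepspeed_config", "ds_config.json"]
for name in POLICIES:
    cmd = build_command(name, train)
    # subprocess.run(cmd, check=True)  # enable on a machine with numactl
```

Comparing throughput across these runs isolates the effect of the allocation policy while holding the model, ZeRO stage, and offload settings fixed.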