Distributed DL Training Analysis
Performance and consistency analysis of distributed deep learning training methods across parameter server architectures, providing deployment guidelines for accuracy-throughput tradeoffs.
Tags: ai-systems, archived, mxnet, python
This project investigated how different distributed deep neural network (DNN) training methods built on parameter server architectures affect both training accuracy and throughput. We studied the tradeoffs among parameter distribution strategies and provided guidelines for deploying applications under various configurations.
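To make the core tradeoff concrete, here is a minimal, self-contained sketch (plain Python, not the project's MXNet code) contrasting the two classic parameter-server update modes: synchronous (gradients from all workers are averaged before one update, better consistency) versus asynchronous (each worker pushes immediately, higher throughput but stale reads). All names (`ParameterServer`, `sync_step`, `async_step`) are illustrative assumptions, not from the paper.

```python
class ParameterServer:
    """Holds a shared scalar parameter and applies pushed gradients."""

    def __init__(self, init=0.0, lr=0.1):
        self.param = init
        self.lr = lr

    def push(self, grad):
        # Apply one SGD update with the pushed gradient.
        self.param -= self.lr * grad

    def pull(self):
        return self.param


def worker_grad(param, target):
    # Gradient of the toy loss 0.5 * (param - target)^2 w.r.t. param.
    return param - target


def sync_step(ps, targets):
    # Synchronous mode: every worker pulls the same snapshot,
    # and their gradients are averaged into a single update.
    snapshot = ps.pull()
    grads = [worker_grad(snapshot, t) for t in targets]
    ps.push(sum(grads) / len(grads))


def async_step(ps, targets):
    # Asynchronous mode: each worker pulls and pushes independently,
    # so later workers observe earlier workers' (stale) updates.
    for t in targets:
        ps.push(worker_grad(ps.pull(), t))


targets = [1.0, 2.0, 3.0]  # each "worker's" data pulls the parameter toward its target
sync_ps, async_ps = ParameterServer(), ParameterServer()
for _ in range(50):
    sync_step(sync_ps, targets)
    async_step(async_ps, targets)
# Sync settles near the mean target (2.0); async oscillates around it,
# illustrating the consistency cost paid for skipping synchronization.
print(sync_ps.param, async_ps.param)
```

In a real deployment the asynchronous mode removes the per-step barrier, so its throughput scales better with worker count, while the gradient staleness shown above is what degrades final accuracy; this is the axis along which the deployment guidelines trade off.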
Key Results
- Systematic comparison of distributed DNN training methods using parameter server architectures
- Quantified accuracy-throughput tradeoffs across different parameter distribution strategies
- Provided practical deployment guidelines for choosing training configurations
- Published at IEEE IPCCC 2020 (39th International Performance Computing and Communications Conference)