Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model. SBP enables much easier programming of data parallelism and model parallelism than existing frameworks, and the actor model provides a succinct runtime mechanism to manage the complex dependencies imposed by resource constraints, data movement and computation in distributed deep learning.

2021: J. Yuan, Xinqi Li, Cheng Cheng, Juncheng Liu, Ran Guo, Shenghang Cai, Chi Yao, Fei Yang, Xiaodong Yi, Chuan Wu, Haoran Zhang, Jie Zhao
https://arxiv.org/pdf/2110.15032v6.pdf
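
To make the three SBP states concrete, here is a minimal sketch in plain Python of how a matrix product Y = X · W could be laid out across two devices. It does not use the OneFlow API: the helper names (`split_rows`, `split_cols`, `broadcast`, `add_partials`, and so on) are hypothetical, lists of shards stand in for per-device tensors, and the mapping of each layout to data or model parallelism is an illustration, not the framework's implementation.

```python
# Illustrative only: nested lists stand in for tensors, and each entry of a
# "shards" list stands in for what one hypothetical device would hold.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_rows(m, parts):
    """'split' along axis 0: each device holds a block of rows."""
    step = len(m) // parts
    return [m[i * step:(i + 1) * step] for i in range(parts)]

def split_cols(m, parts):
    """'split' along axis 1: each device holds a block of columns."""
    step = len(m[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(parts)]

def broadcast(m, parts):
    """'broadcast': every device holds a full copy."""
    return [m for _ in range(parts)]

def concat_rows(shards):
    """Recover the full tensor from row shards."""
    return [row for shard in shards for row in shard]

def concat_cols(shards):
    """Recover the full tensor from column shards."""
    return [sum((shard[r] for shard in shards), []) for r in range(len(shards[0]))]

def add_partials(shards):
    """'partial-value' (sum): the full tensor is the elementwise sum of the shards."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*shards)]

X = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]
Y = matmul(X, W)
devices = 2

# Data parallelism: X is split by rows, W is broadcast, Y comes out split by rows.
y_data = [matmul(x, w) for x, w in zip(split_rows(X, devices), broadcast(W, devices))]

# Model parallelism: X is broadcast, W is split by columns, Y comes out split by columns.
y_model = [matmul(x, w) for x, w in zip(broadcast(X, devices), split_cols(W, devices))]

# Splitting the contracted dimension leaves each device with a partial sum of Y.
y_partial = [matmul(x, w) for x, w in zip(split_cols(X, devices), split_rows(W, devices))]

print(concat_rows(y_data) == Y, concat_cols(y_model) == Y, add_partials(y_partial) == Y)
```

In each case the per-device computation is the same local `matmul`; only the layout of the inputs changes, and that layout determines how the output must be reassembled (concatenation for split results, a sum for partial values).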
Version: 20240320