
8.2K Downloads · 243 Episodes
Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. We select papers by comparative results, citations, and influence to educate you on the latest research. Consider supporting us on Patreon.com/PapersRead with feedback and ideas.
Episodes

Friday Dec 02, 2022
TorchScale: Transformers at Scale
Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries for scaling Transformers focus on improving training or inference with better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale implements several modeling techniques that can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears.
2022: Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
https://arxiv.org/pdf/2211.13184v1.pdf
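For listeners who want to try the toolkit, below is a minimal sketch of building a Transformer encoder with TorchScale, following the usage pattern in the project's README. The module paths, config fields (EncoderConfig, vocab_size, deepnorm), and the installed package are assumptions and may differ across versions.

# A minimal sketch, assuming the microsoft/torchscale package is installed
# (pip install torchscale). Names follow the project's README and should be
# treated as assumptions, not guaranteed API.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# Configure a Transformer encoder; deepnorm is one of the stability
# techniques the paper highlights for training deeper models.
config = EncoderConfig(vocab_size=64000, deepnorm=True)
model = Encoder(config)
print(model)

Swapping in the corresponding Decoder or EncoderDecoder classes (with their matching configs) follows the same pattern, which is what lets the toolkit scale different architectures with the same few lines of setup.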