
8.2K Downloads · 243 Episodes
Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science. We select papers by comparative results, citations, and influence to keep you educated on the latest research. Consider supporting us on Patreon.com/PapersRead, where you can also share feedback and ideas.
Episodes

Monday Jan 02, 2023
Cramming: Training a Language Model on a Single GPU in One Day
Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings.

2022: Jonas Geiping, Tom Goldstein
https://arxiv.org/pdf/2212.14034v1.pdf
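For listeners who want a concrete picture of the setting the paper studies, below is a minimal sketch of single-GPU masked-language-model pretraining in PyTorch with Hugging Face transformers. This is not the authors' modified pipeline; the model configuration, dataset (wikitext-2), masking rate, batch size, and learning rate are illustrative assumptions only.

# Minimal sketch of masked-language-model pretraining on one GPU.
# NOT the paper's modified pipeline; all hyperparameters here are
# illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
)
from datasets import load_dataset

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Small config chosen (as an assumption) so the model fits comfortably
# on a single consumer GPU.
config = BertConfig(
    vocab_size=tokenizer.vocab_size,
    hidden_size=512,
    num_hidden_layers=8,
    num_attention_heads=8,
    intermediate_size=2048,
)
model = BertForMaskedLM(config).to(device)

# Any raw-text corpus works; wikitext-2 is used purely for illustration.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(
        batch["text"], truncation=True, max_length=128, padding="max_length"
    )

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized.set_format("torch")

# The collator randomly masks 15% of tokens and builds the MLM labels,
# setting labels for unmasked positions to -100 so they are ignored.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)
loader = DataLoader(tokenized, batch_size=32, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for step, batch in enumerate(loader):
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss  # cross-entropy over the masked positions
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.3f}")

In the paper's actual setup, a loop like this runs against a fixed wall-clock budget of one day on one consumer GPU, and the interesting question is which pipeline modifications improve downstream performance under that constraint.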