BLOOM: A 176B-parameter open-access multilingual language model T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | 1750 | 2023
ZeRO: Memory optimizations toward training trillion parameter models S Rajbhandari, J Rasley, O Ruwase, Y He SC20: International Conference for High Performance Computing, Networking …, 2020 | 1325 | 2020 |
DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters J Rasley, S Rajbhandari, O Ruwase, Y He Proceedings of the 26th ACM SIGKDD International Conference on Knowledge …, 2020 | 1234 | 2020 |
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022 | 686 | 2022 |
ZeRO-Offload: Democratizing billion-scale model training J Ren, S Rajbhandari, RY Aminabadi, O Ruwase, S Yang, M Zhang, D Li, ... 2021 USENIX Annual Technical Conference (USENIX ATC 21), 551-564, 2021 | 405 | 2021 |
ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning S Rajbhandari, O Ruwase, J Rasley, S Smith, Y He Proceedings of the International Conference for High Performance Computing …, 2021 | 348 | 2021 |
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ... SC22: International Conference for High Performance Computing, Networking …, 2022 | 307 | 2022 |
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ... International Conference on Machine Learning, 18332-18346, 2022 | 256 | 2022 |
Learning intrinsic sparse structures within long short-term memory W Wen, Y He, S Rajbhandari, M Zhang, W Wang, F Liu, B Hu, Y Chen, ... arXiv preprint arXiv:1709.05027, 2017 | 159 | 2017 |
DeepCPU: Serving RNN-based deep learning models 10x faster M Zhang, S Rajbhandari, W Wang, Y He 2018 USENIX Annual Technical Conference (USENIX ATC 18), 951-965, 2018 | 126 | 2018 |
1-bit Adam: Communication efficient large-scale training with Adam’s convergence speed H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ... International Conference on Machine Learning, 10118-10129, 2021 | 96 | 2021 |
Scalable and efficient MoE training for multitask multilingual models YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ... arXiv preprint arXiv:2109.10465, 2021 | 82 | 2021 |
DeepSpeed Ulysses: System optimizations for enabling training of extreme long sequence transformer models SA Jacobs, M Tanaka, C Zhang, M Zhang, SL Song, S Rajbhandari, Y He arXiv preprint arXiv:2309.14509, 2023 | 60 | 2023 |
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ... arXiv preprint arXiv:2308.01320, 2023 | 55 | 2023 |
Neural network training performance optimization framework TA Chilimbi, O Ruwase, S Rajbhandari, M Carbin, Y He US Patent App. 14/986,186, 2017 | 53 | 2017 |
DeepSpeed-FastGen: High-throughput text generation for LLMs via MII and DeepSpeed-Inference C Holmes, M Tanaka, M Wyatt, AA Awan, J Rasley, S Rajbhandari, ... arXiv preprint arXiv:2401.08671, 2024 | 42 | 2024 |
ZeRO++: Extremely efficient collective communication for giant model training G Wang, H Qin, SA Jacobs, C Holmes, S Rajbhandari, O Ruwase, F Yan, ... arXiv preprint arXiv:2306.10209, 2023 | 40 | 2023 |
Optimizing CNNs on multicores for scalability, performance and goodput S Rajbhandari, Y He, O Ruwase, M Carbin, T Chilimbi ACM SIGARCH Computer Architecture News 45 (1), 267-280, 2017 | 37 | 2017 |
1-bit LAMB: Communication efficient large-scale large-batch training with LAMB’s convergence speed C Li, AA Awan, H Tang, S Rajbhandari, Y He 2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022 | 36 | 2022 |