Samyam Rajbhandari
Microsoft Artificial Intelligence and Research, Ohio State University
Title
Cited by
Year
BLOOM: A 176B-parameter open-access multilingual language model
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...
Cited by: 1264, Year: 2023
ZeRO: Memory optimizations toward training trillion parameter models
S Rajbhandari, J Rasley, O Ruwase, Y He
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by: 850, Year: 2020
DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters
J Rasley, S Rajbhandari, O Ruwase, Y He
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge …, 2020
Cited by: 755, Year: 2020
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model
S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ...
arXiv preprint arXiv:2201.11990, 2022
Cited by: 523, Year: 2022
ZeRO-Offload: Democratizing billion-scale model training
J Ren, S Rajbhandari, RY Aminabadi, O Ruwase, S Yang, M Zhang, D Li, ...
2021 USENIX Annual Technical Conference (USENIX ATC 21), 551-564, 2021
Cited by: 263, Year: 2021
ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning
S Rajbhandari, O Ruwase, J Rasley, S Smith, Y He
Proceedings of the international conference for high performance computing …, 2021
Cited by: 234, Year: 2021
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale
S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ...
International conference on machine learning, 18332-18346, 2022
Cited by: 158, Year: 2022
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale
RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ...
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by: 153, Year: 2022
Learning intrinsic sparse structures within long short-term memory
W Wen, Y He, S Rajbhandari, M Zhang, W Wang, F Liu, B Hu, Y Chen, ...
arXiv preprint arXiv:1709.05027, 2017
Cited by: 151, Year: 2017
Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, and Bryan Catanzaro
S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ...
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large …, 2022
Cited by: 123, Year: 2022
DeepCPU: Serving RNN-based Deep Learning Models 10x Faster
M Zhang, S Rajbhandari, W Wang, Y He
2018 USENIX Annual Technical Conference (USENIX ATC 18), 951-965, 2018
Cited by: 115, Year: 2018
1-bit Adam: Communication efficient large-scale training with Adam’s convergence speed
H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ...
International Conference on Machine Learning, 10118-10129, 2021
Cited by: 71, Year: 2021
Scalable and efficient MoE training for multitask multilingual models
YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ...
arXiv preprint arXiv:2109.10465, 2021
Cited by: 64, Year: 2021
Neural network training performance optimization framework
TA Chilimbi, O Ruwase, S Rajbhandari, M Carbin, Y He
US Patent App. 14/986,186, 2017
Cited by: 42, Year: 2017
A communication-optimal framework for contracting distributed tensors
S Rajbhandari, A Nikam, PW Lai, K Stock, S Krishnamoorthy, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by: 36, Year: 2014
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales
Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ...
arXiv preprint arXiv:2308.01320, 2023
Cited by: 33, Year: 2023
Optimizing CNNs on multicores for scalability, performance and goodput
S Rajbhandari, Y He, O Ruwase, M Carbin, T Chilimbi
ACM SIGARCH Computer Architecture News 45 (1), 267-280, 2017
Cited by: 30, Year: 2017
A framework for load balancing of tensor contraction expressions via dynamic task partitioning
PW Lai, K Stock, S Rajbhandari, S Krishnamoorthy, P Sadayappan
Proceedings of the International Conference on High Performance Computing …, 2013
Cited by: 30, Year: 2013
On fusing recursive traversals of Kd trees
S Rajbhandari, J Kim, S Krishnamoorthy, LN Pouchet, F Rastello, ...
Proceedings of the 25th International Conference on Compiler Construction …, 2016
Cited by: 28, Year: 2016
1-bit LAMB: communication efficient large-scale large-batch training with LAMB’s convergence speed
C Li, AA Awan, H Tang, S Rajbhandari, Y He
2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022
Cited by: 26, Year: 2022