Samyam Rajbhandari

Cited by

	All	Since 2019
Citations	5538	5423
h-index	21	19
i10-index	26	23

Citations per year (2014 to 2024): 14, 19, 9, 22, 33, 78, 129, 302, 722, 2233, 1946

Public access

5 articles

3 articles

available

not available

Based on funding mandates

Co-authors

He Yuxiong - Microsoft Research - Verified email at microsoft.com
Minjia Zhang - University of Illinois at Urbana-Champaign - Verified email at illinois.edu
P Sadayappan - Professor, Kahlert School of Computing, University of Utah - Verified email at cs.utah.edu
Sriram Krishnamoorthy - Google - Verified email at google.com
Pai-Wei Lai - Meta - Verified email at meta.com
Michael Carbin - Massachusetts Institute of Technology - Verified email at csail.mit.edu
Wei Wen - Research Scientist, AI at Meta - Verified email at fb.com
Robert J. Harrison - Stony Brook University - Verified email at stonybrook.edu
Louis-Noel Pouchet - Colorado State University - Verified email at colostate.edu
Jinsung Kim - Chung-Ang University - Verified email at cau.ac.kr
Karol Kowalski - Pacific Northwest National Laboratory - Verified email at pnnl.gov
Edward Valeev - Virginia Tech - Verified email at vt.edu

Samyam Rajbhandari

Microsoft Artificial Intelligence and Research, Ohio State University

No verified email - Homepage

Deep Learning, High Performance Computing, Systems


Title	Cited by	Year
BLOOM: A 176B-parameter open-access multilingual language model - T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...	1350	2023
ZeRO: Memory optimizations toward training trillion parameter models - S Rajbhandari, J Rasley, O Ruwase, Y He - SC20: International Conference for High Performance Computing, Networking …, 2020	939	2020
DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters - J Rasley, S Rajbhandari, O Ruwase, Y He - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge …, 2020	857	2020
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model - S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... - arXiv preprint arXiv:2201.11990, 2022	556	2022
ZeRO-Offload: Democratizing billion-scale model training - J Ren, S Rajbhandari, RY Aminabadi, O Ruwase, S Yang, M Zhang, D Li, ... - 2021 USENIX Annual Technical Conference (USENIX ATC 21), 551-564, 2021	292	2021
ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning - S Rajbhandari, O Ruwase, J Rasley, S Smith, Y He - Proceedings of the International Conference for High Performance Computing …, 2021	256	2021
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale - RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ... - SC22: International Conference for High Performance Computing, Networking …, 2022	189	2022
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale - S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ... - International Conference on Machine Learning, 18332-18346, 2022	175	2022
Learning intrinsic sparse structures within long short-term memory - W Wen, Y He, S Rajbhandari, M Zhang, W Wang, F Liu, B Hu, Y Chen, ... - arXiv preprint arXiv:1709.05027, 2017	151	2017
Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, and Bryan Catanzaro - S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... - Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large …, 2022	126	2022
DeepCPU: Serving RNN-based deep learning models 10x faster - M Zhang, S Rajbhandari, W Wang, Y He - 2018 USENIX Annual Technical Conference (USENIX ATC 18), 951-965, 2018	116	2018
1-bit Adam: Communication efficient large-scale training with Adam’s convergence speed - H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ... - International Conference on Machine Learning, 10118-10129, 2021	75	2021
Scalable and efficient MoE training for multitask multilingual models - YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ... - arXiv preprint arXiv:2109.10465, 2021	66	2021
Neural network training performance optimization framework - TA Chilimbi, O Ruwase, S Rajbhandari, M Carbin, Y He - US Patent App. 14/986,186, 2017	44	2017
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales - Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ... - arXiv preprint arXiv:2308.01320, 2023	39	2023
A communication-optimal framework for contracting distributed tensors - S Rajbhandari, A Nikam, PW Lai, K Stock, S Krishnamoorthy, ... - SC'14: Proceedings of the International Conference for High Performance …, 2014	35	2014
Optimizing CNNs on multicores for scalability, performance and goodput - S Rajbhandari, Y He, O Ruwase, M Carbin, T Chilimbi - ACM SIGARCH Computer Architecture News 45 (1), 267-280, 2017	30	2017
A framework for load balancing of tensor contraction expressions via dynamic task partitioning - PW Lai, K Stock, S Rajbhandari, S Krishnamoorthy, P Sadayappan - Proceedings of the International Conference on High Performance Computing …, 2013	30	2013
1-bit LAMB: Communication efficient large-scale large-batch training with LAMB’s convergence speed - C Li, AA Awan, H Tang, S Rajbhandari, Y He - 2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022	29	2022
On fusing recursive traversals of Kd trees - S Rajbhandari, J Kim, S Krishnamoorthy, LN Pouchet, F Rastello, ... - Proceedings of the 25th International Conference on Compiler Construction …, 2016	29	2016
