Ammar Ahmad Awan

Cited by

	All	Since 2019
Citations	1621	1446
h-index	21	19
i10-index	28	25

Citations per year

Year	Citations
2013	5
2014	13
2015	19
2016	20
2017	26
2018	84
2019	105
2020	135
2021	189
2022	195
2023	366
2024	453

Public access

18 articles available
5 articles not available

Based on funding mandates

Co-authors

Dhabaleswar K. Panda, Professor of Computer Science, The Ohio State University (verified email at cse.ohio-state.edu)
Hari Subramoni, The Ohio State University (verified email at cse.ohio-state.edu)
He Yuxiong, Microsoft Research (verified email at microsoft.com)
Ching-Hsiang Chu, Research Scientist, Meta/Facebook (verified email at meta.com)
Khaled Hamidouche, AMD Research (verified email at amd.com)
Jeff Rasley, Microsoft (verified email at microsoft.com)
Reza Yazdani Aminabadi, Microsoft Research (verified email at microsoft.com)
Minjia Zhang, University of Illinois at Urbana-Champaign (verified email at illinois.edu)
Arpan Jain, The Ohio State University (verified email at osu.edu)
Olatunji Ruwase, Microsoft Research (verified email at microsoft.com)
Conglong Li, Senior Researcher at Microsoft, CMU Ph.D. (verified email at microsoft.com)
Akshay Venkatesh, NVIDIA; Ohio State University (verified email at nvidia.com)
Quentin Anthony, PhD Student, Ohio State University (verified email at osu.edu)
Jahanzeb Hashmi, Senior Architect, NVIDIA (verified email at nvidia.com)
Zhewei Yao, Snowflake (verified email at snowflake.com)
Xiaoyi Lu, Associate Professor, University of California, Merced (verified email at ucmerced.edu)
Kawthar Shafie Khorassani, AMD (verified email at amd.com)
(Altamont) Bracy Hamilton Elton, Penguin Computing (verified email at bracyelton.com)
Raghu Machiraju, Professor of Computer Science and Engineering, Bioinformatics and Pathology (verified email at osu.edu)
Anil Parwani, Professor of Pathology and Biomedical Informatics (verified email at osumc.edu)

Ammar Ahmad Awan

Microsoft

Verified email at osu.edu

Deep Learning, HPC, Parallel I/O, MPI, Cloud Computing


Title	Cited by	Year
DeepSpeed-inference: enabling efficient inference of transformer models at unprecedented scale. RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ... SC22: International Conference for High Performance Computing, Networking …, 2022	189	2022
S-Caffe: Co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters. AA Awan, K Hamidouche, JM Hashmi, DK Panda. ACM PPoPP '17 52 (8), 193-205, 2017	178	2017
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale. S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ... International Conference on Machine Learning, 18332-18346, 2022	175	2022
Phi-3 technical report: A highly capable language model locally on your phone. M Abdin, SA Jacobs, AA Awan, J Aneja, A Awadallah, H Awadalla, ... arXiv preprint arXiv:2404.14219, 2024	121	2024
An in-depth performance characterization of CPU- and GPU-based DNN training on modern architectures. AA Awan, H Subramoni, DK Panda. Proceedings of the Machine Learning on HPC Environments, 1-8, 2017	82	2017
1-bit Adam: Communication efficient large-scale training with Adam’s convergence speed. H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ... International Conference on Machine Learning, 10118-10129, 2021	75	2021
Scalable and efficient MoE training for multitask multilingual models. YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ... arXiv preprint arXiv:2109.10465, 2021	66	2021
Scalable distributed DNN training using TensorFlow and CUDA-aware MPI: Characterization, designs, and performance evaluation. AA Awan, J Bédorf, CH Chu, H Subramoni, DK Panda. 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2019	57	2019
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning. AA Awan, K Hamidouche, A Venkatesh, DK Panda. Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016	56	2016
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL? AA Awan, CH Chu, H Subramoni, DK Panda. Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018	54	2018
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. A Jain, AA Awan, AM Aljuhani, JM Hashmi, QG Anthony, H Subramoni, ... SC20: International Conference for High Performance Computing, Networking …, 2020	49	2020
Privacy-aware searching with oblivious term matching for cloud storage. Z Pervez, AA Awan, AM Khattak, S Lee, EN Huh. The Journal of Supercomputing 63, 538-560, 2013	47	2013
NV-Group: link-efficient reduction for distributed deep learning on modern dense GPU systems. CH Chu, P Kousha, AA Awan, KS Khorassani, H Subramoni, DK Panda. Proceedings of the 34th ACM International Conference on Supercomputing, 1-12, 2020	42	2020
Performance characterization of DNN training using TensorFlow and PyTorch on modern clusters. A Jain, AA Awan, Q Anthony, H Subramoni, DK Panda. 2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-11, 2019	41	2019
DeepSpeed-chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales. Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ... arXiv preprint arXiv:2308.01320, 2023	39	2023
OC-DNN: Exploiting advanced unified memory capabilities in CUDA 9 and Volta GPUs for out-of-core DNN training. AA Awan, CH Chu, H Subramoni, X Lu, DK Panda. 2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018	37	2018
1-bit LAMB: communication efficient large-scale large-batch training with LAMB’s convergence speed. C Li, AA Awan, H Tang, S Rajbhandari, Y He. 2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022	29	2022
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for high-performance deep learning on Frontera. A Jain, AA Awan, H Subramoni, DK Panda. 2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS), 76-83, 2019	28	2019
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters. K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ... 2015 IEEE International Conference on Cluster Computing, 78-87, 2015	28	2015
CUDA kernel based collective reduction operations on large-scale GPU clusters. CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda. 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016	27	2016
