Ammar Ahmad Awan
Title | Cited by | Year
S-caffe: Co-designing mpi runtimes and caffe for scalable deep learning on modern gpu clusters
AA Awan, K Hamidouche, JM Hashmi, DK Panda
ACM PPoPP '17 52 (8), 193-205, 2017
Cited by 72, 2017
Privacy-aware searching with oblivious term matching for cloud storage
Z Pervez, AA Awan, AM Khattak, S Lee, EN Huh
The Journal of Supercomputing 63 (2), 538-560, 2013
Cited by 37, 2013
Optimized broadcast for deep learning workloads on dense-GPU infiniband clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 2, 2018
Cited by 21, 2018
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning
AA Awan, K Hamidouche, A Venkatesh, DK Panda
Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016
Cited by 18, 2016
An in-depth performance characterization of CPU-and GPU-based DNN training on modern architectures
AA Awan, H Subramoni, DK Panda
Proceedings of the Machine Learning on HPC Environments, 8, 2017
Cited by 17, 2017
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
Cited by 9, 2015
Intercloud message exchange middleware
MB Amin, WA Khan, AA Awan, S Lee
Proceedings of the 6th International Conference on Ubiquitous Information …, 2012
Cited by 9, 2012
Efficient and scalable multi-source streaming broadcast on gpu clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
Cited by 8, 2017
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by 8, 2016
Designing non-blocking personalized collectives with near perfect overlap for rdma-enabled clusters
H Subramoni, AA Awan, K Hamidouche, D Pekurovsky, A Venkatesh, ...
International Conference on High Performance Computing, 434-453, 2015
Cited by 7, 2015
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
Parallel Computing 58, 27-36, 2016
Cited by 6, 2016
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
Cited by 5, 2018
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
AA Awan, J Bedorf, CH Chu, H Subramoni, DK Panda
arXiv preprint arXiv:1810.11112, 2018
Cited by 5, 2018
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast
CH Chu, X Lu, AA Awan, H Subramoni, B Elton, DK Panda
IEEE Transactions on Parallel and Distributed Systems 30 (3), 575-588, 2018
Cited by 4, 2018
CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC
K Hamidouche, AA Awan, A Venkatesh, DK Panda
2016 IEEE 23rd International Conference on High Performance Computing (HiPC …, 2016
Cited by 2, 2016
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks
AA Awan, K Hamidouche, A Venkatesh, J Perkins, H Subramoni, ...
Proceedings of the 22nd European MPI Users' Group Meeting, 9, 2015
Cited by 2, 2015
On-demand connection management for OpenSHMEM and OpenSHMEM+ MPI
S Chakraborty, H Subramoni, J Perkins, AA Awan, DK Panda
2015 IEEE International Parallel and Distributed Processing Symposium …, 2015
Cited by 2, 2015
A case for non-blocking collectives in OpenSHMEM: design, implementation, and performance evaluation using MVAPICH2-X
AA Awan, K Hamidouche, CH Chu, D Panda
Workshop on OpenSHMEM and Related Technologies, 69-86, 2014
Cited by 2, 2014
Towards Efficient Support for Parallel I/O in Java HPC
AA Awan, MS Ayub, A Shafi, S Lee
2012 13th International Conference on Parallel and Distributed Computing …, 2012
Cited by 2, 2012
Optimized large-message broadcast for deep learning workloads: MPI, MPI+ NCCL, or NCCL2?
AA Awan, KV Manian, CH Chu, H Subramoni, DK Panda
Parallel Computing 85, 141-152, 2019
Cited by 1, 2019