Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022 | 645 | 2022 |
ZeRO-Offload: Democratizing billion-scale model training J Ren, S Rajbhandari, RY Aminabadi, O Ruwase, S Yang, M Zhang, D Li, ... 2021 USENIX Annual Technical Conference (USENIX ATC 21), 551-564, 2021 | 367 | 2021 |
ZeroQuant: Efficient and affordable post-training quantization for large-scale transformers Z Yao, R Yazdani Aminabadi, M Zhang, X Wu, C Li, Y He Advances in Neural Information Processing Systems 35, 27168-27183, 2022 | 334 | 2022 |
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ... SC22: International Conference for High Performance Computing, Networking …, 2022 | 274 | 2022 |
DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ... International Conference on Machine Learning, 18332-18346, 2022 | 228 | 2022 |
An ultra low-power hardware accelerator for automatic speech recognition R Yazdani, A Segura, JM Arnau, A Gonzalez 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016 | 54 | 2016 |
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ... arXiv preprint arXiv:2308.01320, 2023 | 53 | 2023 |
The Dark Side of DNN Pruning R Yazdani, M Riera, JM Arnau, A Gonzalez 45th International Symposium on Computer Architecture (ISCA), 2018 | 37 | 2018 |
Multi-objective interior design optimization method based on sustainability concepts for post-disaster temporary housing units SMA Hosseini, R Yazdani, A de la Fuente Building and Environment 173, 106742, 2020 | 33 | 2020 |
DeepSpeed-FastGen: High-throughput text generation for LLMs via MII and DeepSpeed-Inference C Holmes, M Tanaka, M Wyatt, AA Awan, J Rasley, S Rajbhandari, ... arXiv preprint arXiv:2401.08671, 2024 | 28 | 2024 |
Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases X Wu, C Li, RY Aminabadi, Z Yao, Y He arXiv preprint arXiv:2301.12017, 2023 | 23 | 2023 |
Low-Power Automatic Speech Recognition Through a Mobile GPU and a Viterbi Accelerator R Yazdani, A Segura, JM Arnau, A Gonzalez IEEE Micro 37 (1), 22-29, 2017 | 19 | 2017 |
LSTM-Sharp: An adaptable, energy-efficient hardware accelerator for long short-term memory R Yazdani, O Ruwase, M Zhang, Y He, JM Arnau, A González arXiv preprint arXiv:1911.01258, 2019 | 18 | 2019 |
Understanding INT4 quantization for language models: Latency speedup, composability, and failure cases X Wu, C Li, RY Aminabadi, Z Yao, Y He International Conference on Machine Learning, 37524-37539, 2023 | 16 | 2023 |
A low-power, high-performance speech recognition accelerator R Yazdani, JM Arnau, A González IEEE Transactions on Computers 68 (12), 1817-1831, 2019 | 15 | 2019 |
UNFOLD: A Memory-Efficient Speech Recognizer Using On-The-Fly WFST Composition R Yazdani, JM Arnau, A Gonzalez IEEE/ACM International Symposium on Microarchitecture (MICRO'50), 2017 | 15 | 2017 |
Fault-Tolerant 3-D Network-on-Chip Design using Dynamic Link Sharing SHS Rezaei, M Modarressi, R Yazdani, M Daneshtalab Design, Automation & Test in Europe Conference & Exhibition (DATE) 1, 1195-1200, 2016 | 10 | 2016 |
ZeroQuant(4+2): Redefining LLMs quantization with a new FP6-centric strategy for diverse generative tasks X Wu, H Xia, S Youn, Z Zheng, S Chen, A Bakhtiari, M Wyatt, ... arXiv preprint arXiv:2312.08583, 2023 | 8 | 2023 |