Randomness in neural network training: Characterizing the impact of tooling D Zhuang, X Zhang, S Song, S Hooker Proceedings of Machine Learning and Systems 4, 316-336, 2022 | 85 | 2022 |
Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity H Xia, Z Zheng, Y Li, D Zhuang, Z Zhou, X Qiu, Y Li, W Lin, SL Song arXiv preprint arXiv:2309.10285, 2023 | 39 | 2023 |
η-LSTM: Co-designing highly-efficient large LSTM training via exploiting memory-saving and architectural design opportunities X Zhang, H Xia, D Zhuang, H Sun, X Fu, MB Taylor, SL Song 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021 | 18 | 2021 |
ClickTrain: Efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning C Zhang, G Yuan, W Niu, J Tian, S Jin, D Zhuang, Z Jiang, Y Wang, B Ren, ... Proceedings of the ACM International Conference on Supercomputing, 266-278, 2021 | 18 | 2021 |
FP6-LLM: Efficiently serving large language models through FP6-centric algorithm-system co-design H Xia, Z Zheng, X Wu, S Chen, Z Yao, S Youn, A Bakhtiari, M Wyatt, ... arXiv preprint arXiv:2401.14112, 2024 | 11 | 2024 |
Enabling highly efficient capsule networks processing through software-hardware co-design X Zhang, X Fu, D Zhuang, C Xie, SL Song IEEE Transactions on Computers 70 (4), 495-510, 2021 | 8 | 2021 |
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs H Xia, Z Zheng, X Wu, S Chen, Z Yao, S Youn, A Bakhtiari, M Wyatt, ... 2024 USENIX Annual Technical Conference (USENIX ATC 24), 699-713, 2024 | 3 | 2024 |
Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems H Zhang, L Li, H Liu, D Zhuang, R Liu, C Huan, S Song, D Tao, Y Liu, ... Proceedings of the 36th ACM International Conference on Supercomputing, 1-14, 2022 | 2 | 2022 |
An efficient uncertain graph processing framework for heterogeneous architectures H Zhang, L Li, D Zhuang, R Liu, S Song, D Tao, Y Wu, SL Song Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021 | 2 | 2021 |
MonoNN: Enabling a new monolithic optimization space for neural network inference tasks on modern GPU-Centric architectures D Zhuang, Z Zheng, H Xia, X Qiu, J Bai, W Lin, SL Song 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | 1 | 2024 |
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures D Zhuang, Z Zheng, H Xia, X Qiu, J Bai, W Lin, SL Song 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | | 2024 |
DynamAP: Architectural Support for Dynamic Graph Traversal on the Automata Processor Y Liu, X Zhang, D Zhuang, X Fu, S Song ACM Transactions on Architecture and Code Optimization (TACO) 19 (4), 1-26, 2022 | | 2022 |
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity H Xia, Z Zheng, Y Li, D Zhuang, Z Zhou, X Qiu, Y Li, W Lin, SL Song | | |