Mikhail Smelyanskiy
Facebook
Verified email at intel.com
Title
Cited by
Year
On large-batch training for deep learning: Generalization gap and sharp minima
NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang
arXiv preprint arXiv:1609.04836, 2016
Cited by: 3723; Year: 2016
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
VW Lee, C Kim, J Chhugani, M Deisher, D Kim, AD Nguyen, N Satish, ...
Proceedings of the 37th annual international symposium on Computer …, 2010
Cited by: 1222; Year: 2010
Deep learning recommendation model for personalization and recommendation systems
M Naumov, D Mudigere, HJM Shi, J Huang, N Sundaraman, J Park, ...
arXiv preprint arXiv:1906.00091, 2019
Cited by: 799; Year: 2019
Applied machine learning at Facebook: A datacenter infrastructure perspective
K Hazelwood, S Bird, D Brooks, S Chintala, U Diril, D Dzhulgakov, ...
2018 IEEE international symposium on high performance computer architecture …, 2018
Cited by: 768; Year: 2018
A study of BFLOAT16 for deep learning training
D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ...
arXiv preprint arXiv:1905.12322, 2019
Cited by: 367; Year: 2019
Efficient sparse matrix-vector multiplication on x86-based many-core processors
X Liu, M Smelyanskiy, E Chow, P Dubey
Proceedings of the 27th international ACM conference on International …, 2013
Cited by: 343; Year: 2013
Glow: Graph lowering compiler techniques for neural networks
N Rotem, J Fix, S Abdulrasool, G Catron, S Deng, R Dzhabarov, N Gibson, ...
arXiv preprint arXiv:1805.00907, 2018
Cited by: 339; Year: 2018
The architectural implications of Facebook's DNN-based personalized recommendation
U Gupta, CJ Wu, X Wang, M Naumov, B Reagen, D Brooks, B Cottel, ...
2020 IEEE International Symposium on High Performance Computer Architecture …, 2020
Cited by: 332; Year: 2020
RecNMP: Accelerating personalized recommendation with near-memory processing
L Ke, U Gupta, BY Cho, D Brooks, V Chandra, U Diril, A Firoozshahian, ...
2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020
Cited by: 242; Year: 2020
Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications
J Park, M Naumov, P Basu, S Deng, A Kalaiah, D Khudia, J Law, P Malani, ...
arXiv preprint arXiv:1811.09886, 2018
Cited by: 227; Year: 2018
Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor
A Heinecke, K Vaidyanathan, M Smelyanskiy, A Kobotov, R Dubtsov, ...
2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013
Cited by: 219; Year: 2013
Exploring SIMD for molecular dynamics, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors
SJ Pennycook, CJ Hughes, M Smelyanskiy, SA Jarvis
2013 IEEE 27th International symposium on parallel and distributed …, 2013
Cited by: 216; Year: 2013
qHiPSTER: The quantum high performance software testing environment
M Smelyanskiy, NPD Sawaya, A Aspuru-Guzik
arXiv preprint arXiv:1601.07195, 2016
Cited by: 187; Year: 2016
Practical optimization for hybrid quantum-classical algorithms
GG Guerreschi, M Smelyanskiy
arXiv preprint arXiv:1701.01450, 2017
Cited by: 178; Year: 2017
Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers
A Heinecke, A Breuer, S Rettenberger, M Bader, AA Gabriel, C Pelties, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by: 178; Year: 2014
Anatomy of high-performance many-threaded matrix multiplication
TM Smith, R Van De Geijn, M Smelyanskiy, JR Hammond, FG Van Zee
2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014
Cited by: 174; Year: 2014
Can traditional programming bridge the ninja performance gap for parallel computing applications?
N Satish, C Kim, J Chhugani, H Saito, R Krishnaiyer, M Smelyanskiy, ...
ACM SIGARCH Computer Architecture News 40 (3), 440-451, 2012
Cited by: 149; Year: 2012
Convergence of recognition, mining, and synthesis workloads and its implications
YK Chen, J Chhugani, P Dubey, CJ Hughes, D Kim, S Kumar, VW Lee, ...
Proceedings of the IEEE 96 (5), 790-807, 2008
Cited by: 149; Year: 2008
The BLIS framework: Experiments in portability
FG Van Zee, TM Smith, B Marker, TM Low, RAVD Geijn, FD Igual, ...
ACM Transactions on Mathematical Software (TOMS) 42 (2), 1-19, 2016
Cited by: 132; Year: 2016
On large-batch training for deep learning: Generalization gap and sharp minima. arXiv 2016
NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang
arXiv preprint arXiv:1609.04836, 2020
Cited by: 126; Year: 2020