Esteban Meneses
Instituto Tecnológico de Costa Rica, Centro Nacional de Alta Tecnología
Argobots: A lightweight low-level threading and tasking framework
S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ...
IEEE Transactions on Parallel and Distributed Systems 29 (3), 512-526, 2017
ACR: Automatic checkpoint/restart for soft and hard error protection
X Ni, E Meneses, N Jain, LV Kalé
Proceedings of the international conference on high performance computing …, 2013
Periodic hierarchical load balancing for large supercomputers
G Zheng, A Bhatele, E Meneses, LV Kalé
The International Journal of High Performance Computing Applications 25 (4 …, 2011
Hierarchical load balancing for Charm++ applications on large supercomputers
G Zheng, E Meneses, A Bhatele, LV Kalé
2010 39th International Conference on Parallel Processing Workshops, 436-444, 2010
Team-based message logging: Preliminary results
E Meneses, CL Mendes, LV Kalé
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid …, 2010
On the use of cluster-based partial message logging to improve fault tolerance for MPI HPC applications
T Ropars, A Guermouche, B Uçar, E Meneses, LV Kalé, F Cappello
Euro-Par 2011 Parallel Processing: 17th International Conference, Euro-Par …, 2011
Assessing energy efficiency of fault tolerance protocols for HPC systems
E Meneses, O Sarood, LV Kalé
2012 IEEE 24th International Symposium on Computer Architecture and High …, 2012
Hiding checkpoint overhead in HPC applications with a semi-blocking algorithm
X Ni, E Meneses, LV Kalé
2012 IEEE International Conference on Cluster Computing, 364-372, 2012
Using migratable objects to enhance fault tolerance schemes in supercomputers
E Meneses, X Ni, G Zheng, CL Mendes, LV Kalé
IEEE Transactions on Parallel and Distributed Systems 26 (7), 2061-2074, 2014
A 'cool' way of improving the reliability of HPC machines
O Sarood, E Meneses, LV Kalé
Proceedings of the International Conference on High Performance Computing …, 2013
Energy profile of rollback-recovery strategies in high performance computing
E Meneses, O Sarood, LV Kalé
Parallel Computing 40 (9), 536-547, 2014
Power, reliability, and performance: One system to rule them all
B Acun, A Langer, E Meneses, H Menon, O Sarood, E Totoni, LV Kalé
Computer 49 (10), 30-37, 2016
Communication and topology-aware load balancing in Charm++ with TreeMatch
E Jeannot, E Meneses, G Mercier, F Tessier, G Zheng
2013 IEEE International Conference on Cluster Computing (CLUSTER), 1-8, 2013
Evaluation of simple causal message logging for large-scale fault tolerant HPC systems
E Meneses, G Bronevetsky, LV Kalé
2011 IEEE International Symposium on Parallel and Distributed Processing …, 2011
Scalable replay with partial-order dependencies for message-logging fault tolerance
J Lifflander, E Meneses, H Menon, P Miller, S Krishnamoorthy, LV Kalé
2014 IEEE International Conference on Cluster Computing (CLUSTER), 19-28, 2014
A message-logging protocol for multicore systems
E Meneses, X Ni, LV Kalé
IEEE/IFIP International Conference on Dependable Systems and Networks …, 2012
A study of checkpointing in large scale training of deep neural networks
E Rojas, AN Kahira, E Meneses, LB Gomez, RM Badia
arXiv preprint arXiv:2012.00825, 2020
Analyzing the interplay of failures and workload on a leadership-class supercomputer
E Meneses, X Ni, T Jones, D Maxwell
computing 2 (3), 4, 2015
Dynamic load balance for optimized message logging in fault tolerant HPC applications
E Meneses, LV Kalé, G Bronevetsky
2011 IEEE International Conference on Cluster Computing, 281-289, 2011
Analyzing a five-year failure record of a leadership-class supercomputer
E Rojas, E Meneses, T Jones, D Maxwell
2019 31st International Symposium on Computer Architecture and High …, 2019