On the diversity of cluster workloads and its impact on research results G Amvrosiadis, JW Park, GR Ganger, GA Gibson, E Baseman, ... 2018 USENIX Annual Technical Conference (USENIX ATC 18), 533-546, 2018 | 143 | 2018 |
Lessons learned from memory errors observed over the lifetime of Cielo S Levy, KB Ferreira, N DeBardeleben, T Siddiqua, V Sridharan, ... SC18: International Conference for High Performance Computing, Networking …, 2018 | 36 | 2018 |
Interpretable anomaly detection for monitoring of high performance computing systems E Baseman, S Blanchard, N DeBardeleben, A Bonnie, A Morrow Outlier Definition, Detection, and Description on Demand Workshop at ACM …, 2016 | 35 | 2016 |
Relational synthesis of text and numeric data for anomaly detection on computing system logs E Baseman, S Blanchard, Z Li, S Fu 2016 15th IEEE International Conference on Machine Learning and Applications …, 2016 | 32 | 2016 |
Design, use and evaluation of P-FSEFI: a parallel soft error fault injection framework for emulating soft errors in parallel applications Q Guan, N DeBardeleben, P Wu, S Eidenbenz, S Blanchard, L Monroe, ... Proceedings of the 9th EAI International Conference on Simulation Tools and …, 2016 | 26 | 2016 |
Lifetime memory reliability data from the field T Siddiqua, V Sridharan, SE Raasch, N DeBardeleben, KB Ferreira, ... 2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and …, 2017 | 25 | 2017 |
Markov chain modeling for anomaly detection in high performance computing system logs A Haque, A DeLucia, E Baseman Proceedings of the Fourth International Workshop on HPC User Support Tools, 1-8, 2017 | 19 | 2017 |
Improving DRAM fault characterization through machine learning E Baseman, N DeBardeleben, K Ferreira, S Levy, S Raasch, V Sridharan, ... 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems …, 2016 | 19 | 2016 |
Content+ context networks for user classification in Twitter W Campbell, E Baseman, K Greenfield Neural Information Processing Systems (NIPS) 2014 Workshop, available at …, 2013 | 19 | 2013 |
The Atlas cluster trace repository G Amvrosiadis, M Kuchnik, JW Park, C Cranor, GR Ganger, E Moore, ... USENIX login 43 (4), 2018 | 16 | 2018 |
Bigger, longer, fewer: what do cluster jobs look like outside Google G Amvrosiadis, JW Park, GR Ganger, GA Gibson, E Baseman, ... Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-PDL-17-104, 2017 | 16 | 2017 |
Physics-Informed Machine Learning for DRAM Error Modeling E Baseman, N DeBardeleben, S Blanchard, J Moore, O Tkachenko, ... | 14* | |
Analysis of system log data using machine learning EA Moore, NA DeBardeleben, SP Blanchard US Patent App. 17/061,956, 2021 | 13 | 2021 |
This is why ML-driven cluster scheduling remains widely impractical M Kuchnik, JW Park, C Cranor, E Moore, N DeBardeleben, G Amvrosiadis Tech. rep., 2019 | 13 | 2019 |
Ranking anomalous high performance computing sensor data using unsupervised clustering A Morrow, E Baseman, S Blanchard 2016 International Conference on Computational Science and Computational …, 2016 | 10 | 2016 |
Content+ context= classification: Examining the roles of social interactions and linguist content in Twitter user classification W Campbell, E Baseman, K Greenfield Proceedings of the Second Workshop on Natural Language Processing for Social …, 2014 | 7 | 2014 |
Extreme protection against data loss with single-overlap declustered parity H Ke, HS Gunawi, D Bonnie, N DeBardeleben, M Grosskopf, T Grové, ... 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems …, 2020 | 6 | 2020 |
Analyzing HPC Support Tickets: Experience and Recommendations A DeLucia, E Moore arXiv preprint arXiv:2010.04321, 2020 | 5 | 2020 |
Automating DRAM fault mitigation by learning from experience E Baseman, N DeBardeleben, K Ferreira, V Sridharan, T Siddiqua, ... 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems …, 2017 | 5 | 2017 |
Enhancing HPC System Log Analysis by Identifying Message Origin in Source Code M Hickman, D Fulp, E Baseman, S Blanchard, H Greenberg, W Jones, ... 2018 IEEE International Symposium on Software Reliability Engineering …, 2018 | 3 | 2018 |