Adrián Castelló
Title
Cited by
Year
Argobots: A lightweight low-level threading and tasking framework
S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ...
IEEE Transactions on Parallel and Distributed Systems 29 (3), 512-526, 2017
Cited by: 150, Year: 2017
SLURM support for remote GPU virtualization: Implementation and performance study
S Iserte, A Castelló, R Mayo, ES Quintana-Ortí, F Silla, J Duato, C Reaño, ...
2014 IEEE 26th International Symposium on Computer Architecture and High …, 2014
Cited by: 34, Year: 2014
High Performance and Portable Convolution Operators for Multicore Processors
P San Juan, A Castelló, MF Dolz, P Alonso-Jordá, ES Quintana-Ortí
SBAC-PAD 2020, 2020
Cited by: 25*, Year: 2020
Improving the User Experience of the rCUDA Remote GPU Virtualization Framework
C Reaño, F Silla, A Castelló, AJ Peña, R Mayo, ES Quintana-Ortí, J Duato
Cited by: 25, Year: 2014
PyDTNN: a user-friendly and extensible framework for distributed deep learning
S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre
The Journal of Supercomputing 77, 9971-9987, 2021
Cited by: 20, Year: 2021
A Review of Lightweight Thread Approaches for High Performance Computing
A Castelló, AJ Peña, S Seo, R Mayo, P Balaji, ES Quintana-Ortí
2016 IEEE International Conference on Cluster Computing (CLUSTER 2016), 471-480, 2016
Cited by: 19, Year: 2016
Analysis of model parallelism for distributed neural networks
A Castelló, MF Dolz, ES Quintana-Ortí, J Duato
Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019
Cited by: 14, Year: 2019
On the use of remote GPUs and low-power processors for the acceleration of scientific applications
A Castelló, J Duato, R Mayo, AJ Peña, ES Quintana-Ortí, V Roca, F Silla
The Fourth International Conference on Smart Grids, Green Communications and …, 2014
Cited by: 14, Year: 2014
Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks
A Castelló, MF Dolz, ES Quintana-Ortí, J Duato
2nd High Performance Machine Learning Workshop (HPML 2019), 534-541, 2019
Cited by: 13, Year: 2019
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations
A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña
International Conference on Parallel Processing (ICPP-2017), 60-69, 2017
Cited by: 13, Year: 2017
Enabling GPU Virtualization in Cloud Environments
S Iserte, FJ Clemente-Castelló, A Castelló, R Mayo, ES Quintana-Ortí
CLOSER 2016, 2016
Cited by: 13, Year: 2016
Reformulating the direct convolution for high-performance deep learning inference on ARM processors
S Barrachina, A Castelló, MF Dolz, TM Low, H Martínez, ES Quintana-Ortí, ...
Journal of Systems Architecture 135, 102806, 2023
Cited by: 12, Year: 2023
Anatomy of the BLIS family of algorithms for matrix multiplication
A Castelló, ES Quintana-Ortí, FD Igual
2022 30th Euromicro International Conference on Parallel, Distributed and …, 2022
Cited by: 9, Year: 2022
Accelerating distributed deep neural network training with pipelined MPI allreduce
A Castelló, ES Quintana-Ortí, J Duato
Cluster Computing 24 (4), 3797-3813, 2021
Cited by: 9, Year: 2021
A flexible research-oriented framework for distributed training of deep neural networks
S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre
2021 IEEE International Parallel and Distributed Processing Symposium …, 2021
Cited by: 9, Year: 2021
GLT: A unified API for lightweight thread libraries
A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña
Euro-Par 2017: Parallel Processing: 23rd International Conference on …, 2017
Cited by: 8, Year: 2017
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS
A Castelló, S Barrachina, MF Dolz, ES Quintana-Ortí, P San Juan, ...
Journal of Systems Architecture 125, 102459, 2022
Cited by: 7*, Year: 2022
Programming parallel dense matrix factorizations with look-ahead and OpenMP
S Catalán, A Castelló, FD Igual, R Rodríguez-Sánchez, ES Quintana-Ortí
Cluster Computing 23, 359-375, 2020
Cited by: 7, Year: 2020
On the adequacy of lightweight thread approaches for high-level parallel programming models
A Castelló, R Mayo, K Sala, V Beltran, P Balaji, AJ Peña
Future Generation Computer Systems 84, 22-31, 2018
Cited by: 7, Year: 2018
Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA
A Castelló, R Mayo, ES Quintana-Ortí, AJ Peña, P Balaji
2015 IEEE International Conference on Cluster Computing (CLUSTER), 92-95, 2015
Cited by: 7, Year: 2015