Adrián Castelló

Cited by

	All	Since 2019
Citations	522	388
h-index	12	10
i10-index	16	11

Citations per year: 2015: 19, 2016: 22, 2017: 32, 2018: 54, 2019: 40, 2020: 54, 2021: 61, 2022: 56, 2023: 103, 2024: 74

Public access

38 articles

7 articles

available

not available

Based on funding mandates

Co-authors

Enrique S. Quintana-Ortí, Universitat Politècnica de València, Spain (verified email at disca.upv.es)
Manuel F. Dolz, Universitat Jaume I (verified email at icc.uji.es)
Jose Duato, Universitat Politècnica de València (verified email at disca.upv.es)
Antonio J. Peña, Barcelona Supercomputing Center (BSC) (verified email at bsc.es)
Pavan Balaji, Argonne National Laboratory (verified email at anl.gov)
Sangmin Seo, Klaytn Foundation (verified email at klaytn.foundation)
Pedro Alonso-Jordá, Universitat Politècnica de València (verified email at upv.es)
Francisco D. Igual, Universidad Complutense de Madrid (verified email at ucm.es)
Sergio Iserte, Senior Researcher @ BSC (verified email at bsc.es)
Sandra Catalán, Universitat Jaume I (verified email at uji.es)

Adrián Castelló

Postdoc Fellow @ Universitat Politècnica de València (UPV)

Verified email at disca.upv.es

Code Auto-generation, Programming Models, High Performance Computing, Lightweight threading, Deep Neural Networks


Title	Authors	Venue	Cited by	Year
Argobots: A lightweight low-level threading and tasking framework	S Seo, A Amer, P Balaji, C Bordage, G Bosilca, A Brooks, P Carns, ...	IEEE Transactions on Parallel and Distributed Systems 29 (3), 512-526, 2017	157	2017
SLURM support for remote GPU virtualization: Implementation and performance study	S Iserte, A Castelló, R Mayo, ES Quintana-Ortí, F Silla, J Duato, C Reano, ...	2014 IEEE 26th International Symposium on Computer Architecture and High …, 2014	34	2014
High Performance and Portable Convolution Operators for Multicore Processors	P San Juan, A Castelló, MF Dolz, P Alonso-Jordá, ES Quintana-Ortí	SBAC-PAD 2020, 2020	28*	2020
Improving the User Experience of the rCUDA Remote GPU Virtualization Framework	C Reano, F Silla, A Castelló, AJ Pena, R Mayo, ES Quintana-Ortí, J Duato		24	2014
PyDTNN: a user-friendly and extensible framework for distributed deep learning	S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre	The Journal of Supercomputing 77, 9971-9987, 2021	21	2021
Reformulating the direct convolution for high-performance deep learning inference on ARM processors	S Barrachina, A Castelló, MF Dolz, TM Low, H Martínez, ES Quintana-Ortí, ...	Journal of Systems Architecture 135, 102806, 2023	20	2023
A Review of Lightweight Thread Approaches for High Performance Computing	A Castelló, AJ Peña, S Seo, R Mayo, P Balaji, ES Quintana-Ortí	2016 IEEE International Conference on Cluster Computing (CLUSTER 2016), 471-480, 2016	19	2016
Analysis of model parallelism for distributed neural networks	A Castelló, MF Dolz, ES Quintana-Ortí, J Duato	Proceedings of the 26th European MPI Users' Group Meeting, 1-10, 2019	18	2019
On the use of remote GPUs and low-power processors for the acceleration of scientific applications	A Castelló, J Duato, R Mayo, AJ Pena, ES Quintana-Ortí, V Roca, F Silla	The Fourth International Conference on Smart Grids, Green Communications and …, 2014	15	2014
Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks	A Castelló, MF Dolz, ES Quintana-Ortí, J Duato	2nd High Performance Machine Learning Workshop (HPML 2019), 534-541, 2019	14	2019
GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations	A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña	International Conference on Parallel Processing (ICPP-2017), 60-69, 2017	13	2017
Enabling GPU Virtualization in Cloud Environments	S Iserte, FJ Clemente-Castelló, A Castelló, R Mayo, ES Quintana-Ortí	CLOSER 2016, 2016	13	2016
Anatomy of the BLIS family of algorithms for matrix multiplication	A Castelló, ES Quintana-Ortí, FD Igual	2022 30th Euromicro International Conference on Parallel, Distributed and …, 2022	12	2022
Micro-kernels for portable and efficient matrix multiplication in deep learning	G Alaejos, A Castelló, H Martínez, P Alonso-Jordá, FD Igual, ...	The Journal of Supercomputing 79 (7), 8124-8147, 2023	10	2023
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS	A Castelló, S Barrachina, MF Dolz, ES Quintana-Ortí, P San Juan, ...	Journal of Systems Architecture 125, 102459, 2022	10	2022
Accelerating distributed deep neural network training with pipelined MPI allreduce	A Castelló, ES Quintana-Ortí, J Duato	Cluster Computing 24 (4), 3797-3813, 2021	10	2021
A flexible research-oriented framework for distributed training of deep neural networks	S Barrachina, A Castelló, M Catalán, MF Dolz, JI Mestre	2021 IEEE International Parallel and Distributed Processing Symposium …, 2021	9	2021
A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor	C Ramírez, A Castelló, ES Quintana-Orti	The Journal of Supercomputing 78 (16), 18051-18060, 2022	8	2022
GLT: A unified API for lightweight thread libraries	A Castelló, S Seo, R Mayo, P Balaji, ES Quintana-Ortí, AJ Peña	Euro-Par 2017: Parallel Processing: 23rd International Conference on …, 2017	8	2017
Programming parallel dense matrix factorizations with look-ahead and OpenMP	S Catalán, A Castelló, FD Igual, R Rodríguez-Sánchez, ES Quintana-Ortí	Cluster Computing 23, 359-375, 2020	7	2020

Articles 1–20