Huizhen Yu
Title · Cited by · Year
Convergence results for some temporal difference methods based on least squares
H Yu, DP Bertsekas
IEEE Transactions on Automatic Control 54 (7), 1515-1531, 2009
Cited by 130 · 2009
Projected equation methods for approximate solution of large linear systems
DP Bertsekas, H Yu
Journal of Computational and Applied Mathematics 227 (1), 27-50, 2009
Cited by 76 · 2009
Error bounds for approximations from projected linear equations
H Yu, DP Bertsekas
Mathematics of Operations Research 35 (2), 306-329, 2010
Cited by 73 · 2010
Q-learning and enhanced policy iteration in discounted dynamic programming
DP Bertsekas, H Yu
Mathematics of Operations Research 37 (1), 66-94, 2012
Cited by 60 · 2012
Discretized approximations for POMDP with average cost
H Yu, D Bertsekas
arXiv preprint arXiv:1207.4154, 2012
Cited by 57 · 2012
On convergence of emphatic temporal-difference learning
H Yu
Conference on Learning Theory, 1724-1751, 2015
Cited by 54 · 2015
A unifying polyhedral approximation framework for convex optimization
DP Bertsekas, H Yu
SIAM Journal on Optimization 21 (1), 333-360, 2011
Cited by 54 · 2011
Q-learning and policy iteration algorithms for stochastic shortest path problems
H Yu, DP Bertsekas
Annals of Operations Research 208 (1), 95-132, 2013
Cited by 51 · 2013
Basis function adaptation methods for cost approximation in MDP
H Yu, DP Bertsekas
2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement …, 2009
Cited by 50 · 2009
Multi-step off-policy learning without importance sampling ratios
AR Mahmood, H Yu, RS Sutton
arXiv preprint arXiv:1702.03006, 2017
Cited by 49 · 2017
On near optimality of the set of finite-state controllers for average cost POMDP
H Yu, DP Bertsekas
Mathematics of Operations Research 33 (1), 1-11, 2008
Cited by 44 · 2008
Q-learning algorithms for optimal stopping based on least squares
H Yu, DP Bertsekas
2007 European Control Conference (ECC), 2368-2375, 2007
Cited by 43 · 2007
Least squares temporal difference methods: An analysis under general conditions
H Yu
SIAM Journal on Control and Optimization 50 (6), 3310-3343, 2012
Cited by 40 · 2012
Approximate solution methods for partially observable Markov and semi-Markov decision processes
H Yu
Massachusetts Institute of Technology, 2006
Cited by 40 · 2006
On generalized Bellman equations and temporal-difference learning
H Yu, AR Mahmood, RS Sutton
Journal of Machine Learning Research 19 (48), 1-49, 2018
Cited by 39 · 2018
Convergence of Least Squares Temporal Difference Methods Under General Conditions
H Yu
ICML, 1207-1214, 2010
Cited by 38 · 2010
Emphatic temporal-difference learning
AR Mahmood, H Yu, M White, RS Sutton
arXiv preprint arXiv:1507.01569, 2015
Cited by 37 · 2015
Stochastic shortest path problems under weak conditions
DP Bertsekas, H Yu
Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, 2013
Cited by 36 · 2013
On boundedness of Q-learning iterates for stochastic shortest path problems
H Yu, DP Bertsekas
Mathematics of Operations Research 38 (2), 209-227, 2013
Cited by 31 · 2013
Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize
H Yu
Journal of Machine Learning Research 17 (219), 1-58, 2016
Cited by 29 · 2016
Articles 1–20