Huizhen Yu
Title · Cited by · Year
Convergence results for some temporal difference methods based on least squares
H Yu, DP Bertsekas
IEEE Transactions on Automatic Control 54 (7), 1515-1531, 2009
Cited by 130 · 2009
Projected equation methods for approximate solution of large linear systems
DP Bertsekas, H Yu
Journal of Computational and Applied Mathematics 227 (1), 27-50, 2009
Cited by 76 · 2009
Error bounds for approximations from projected linear equations
H Yu, DP Bertsekas
Mathematics of Operations Research 35 (2), 306-329, 2010
Cited by 73 · 2010
Q-learning and enhanced policy iteration in discounted dynamic programming
DP Bertsekas, H Yu
Mathematics of Operations Research 37 (1), 66-94, 2012
Cited by 60 · 2012
Discretized approximations for POMDP with average cost
H Yu, D Bertsekas
arXiv preprint arXiv:1207.4154, 2012
Cited by 57 · 2012
On convergence of emphatic temporal-difference learning
H Yu
Conference on Learning Theory, 1724-1751, 2015
Cited by 54 · 2015
A unifying polyhedral approximation framework for convex optimization
DP Bertsekas, H Yu
SIAM Journal on Optimization 21 (1), 333-360, 2011
Cited by 54 · 2011
Q-learning and policy iteration algorithms for stochastic shortest path problems
H Yu, DP Bertsekas
Annals of Operations Research 208 (1), 95-132, 2013
Cited by 51 · 2013
Basis function adaptation methods for cost approximation in MDP
H Yu, DP Bertsekas
2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement …, 2009
Cited by 50 · 2009
Multi-step off-policy learning without importance sampling ratios
AR Mahmood, H Yu, RS Sutton
arXiv preprint arXiv:1702.03006, 2017
Cited by 49 · 2017
On near optimality of the set of finite-state controllers for average cost POMDP
H Yu, DP Bertsekas
Mathematics of Operations Research 33 (1), 1-11, 2008
Cited by 44 · 2008
Q-learning algorithms for optimal stopping based on least squares
H Yu, DP Bertsekas
2007 European Control Conference (ECC), 2368-2375, 2007
Cited by 43 · 2007
Least squares temporal difference methods: An analysis under general conditions
H Yu
SIAM Journal on Control and Optimization 50 (6), 3310-3343, 2012
Cited by 40 · 2012
Approximate solution methods for partially observable Markov and semi-Markov decision processes
H Yu
Massachusetts Institute of Technology, 2006
Cited by 40 · 2006
On generalized Bellman equations and temporal-difference learning
H Yu, AR Mahmood, RS Sutton
Journal of Machine Learning Research 19 (48), 1-49, 2018
Cited by 39 · 2018
Convergence of Least Squares Temporal Difference Methods Under General Conditions
H Yu
ICML, 1207-1214, 2010
Cited by 38 · 2010
Emphatic temporal-difference learning
AR Mahmood, H Yu, M White, RS Sutton
arXiv preprint arXiv:1507.01569, 2015
Cited by 37 · 2015
Stochastic shortest path problems under weak conditions
DP Bertsekas, H Yu
Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, 2013
Cited by 36 · 2013
On boundedness of Q-learning iterates for stochastic shortest path problems
H Yu, DP Bertsekas
Mathematics of Operations Research 38 (2), 209-227, 2013
Cited by 31 · 2013
Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize
H Yu
Journal of Machine Learning Research 17 (219), 1-58, 2016
Cited by 29 · 2016
Articles 1–20