
dc.identifier.uri    http://hdl.handle.net/11401/77166
dc.description.sponsorship    This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.    en_US
dc.format    Monograph
dc.format.medium    Electronic Resource    en_US
dc.language.iso    en_US
dc.publisher    The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type    Dissertation
dcterms.abstract    Recently there has been a resurgence of interest in the complexity of algorithms for Markov decision processes (MDPs) and stochastic games. Much of this work was inspired by groundbreaking results on the complexity of policy iteration algorithms for MDPs under the Blum-Shub-Smale (BSS) model of computation. In particular, for discounted MDPs with a fixed discount factor, Yinyu Ye showed that the number of arithmetic operations needed by two classic variants of policy iteration can be bounded above by a polynomial in the number of state-action pairs only. A natural question is whether a similar complexity estimate exists for the value iteration algorithm, another classic approach to computing optimal policies for MDPs. Our first main contribution is a negative answer to this question. Using a deterministic MDP with four state-action pairs, we show that under the BSS model there is no upper bound on the number of iterations needed by value iteration to return an optimal policy. We also show that the same example implies the same result for a broad class of so-called optimistic policy iteration algorithms, which includes algorithms of interest to the reinforcement learning community such as λ-policy iteration and modified policy iteration. Another natural question is whether Ye's approach can yield results for MDPs under other optimality criteria. Our second main contribution is a formulation of conditions, which to our knowledge are the most general ones known, under which MDPs and two-player zero-sum stochastic games with Borel state and action spaces can be reduced to discounted ones. For undiscounted total-reward MDPs and stochastic games, the transformations we formulate are based on an idea due to Alan Hoffman. For average rewards, the transformations are extensions, to Borel state and action spaces, of a transformation recently proposed for finite stochastic games. In addition to implying the existence of ε-optimal policies for total and average rewards, these reductions lead to estimates of the number of arithmetic operations needed to compute optimal policies for such models with finite state and action sets, as well as complexity estimates for computing ε-optimal policies for MDPs with Euclidean state and action spaces.
dcterms.available    2017-09-20T16:52:08Z
dcterms.contributor    Mitchell, Joseph S. B.    en_US
dcterms.contributor    Feinberg, Eugene A.    en_US
dcterms.contributor    Hu, Jiaqiao    en_US
dcterms.contributor    Gao, Jie    en_US
dcterms.creator    Huang, Jefferson
dcterms.dateAccepted    2017-09-20T16:52:08Z
dcterms.dateSubmitted    2017-09-20T16:52:08Z
dcterms.description    Department of Applied Mathematics and Statistics    en_US
dcterms.extent    125 pg.    en_US
dcterms.format    Application/PDF    en_US
dcterms.format    Monograph
dcterms.identifier    http://hdl.handle.net/11401/77166
dcterms.issued    2016-12-01
dcterms.language    en_US
dcterms.provenance    Made available in DSpace on 2017-09-20T16:52:08Z (GMT). No. of bitstreams: 1 Huang_grad.sunysb_0771E_13001.pdf: 790262 bytes, checksum: a03a3259262f99b742f09c2bf2344f16 (MD5) Previous issue date: 1    en
dcterms.publisher    The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject    algorithm, complexity, control, Markov decision process, policy, stochastic game
dcterms.subject    Operations research -- Mathematics
dcterms.title    Complexity Estimates and Reductions to Discounting for Total and Average-Reward Markov Decision Processes and Stochastic Games
dcterms.type    Dissertation
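
As an illustration of the value iteration algorithm discussed in the abstract, the sketch below computes an optimal policy for a small finite discounted MDP. The function name, the two-state transition and reward data, the discount factor, and the stopping tolerance are assumed placeholders for illustration only; they are not taken from the dissertation.

import numpy as np

def value_iteration(P, r, gamma, tol=1e-8, max_iter=100000):
    # P: transition probabilities, shape (A, S, S); P[a, s, s'] = Pr(next state s' | state s, action a)
    # r: one-step rewards, shape (A, S); gamma: discount factor in [0, 1)
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Bellman optimality update: Q(s, a) = r(s, a) + gamma * E[v(next state)]
        q = r + gamma * (P @ v)        # shape (A, S)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    # Return the value function and a greedy (approximately optimal) policy
    return v, q.argmax(axis=0)

# Example usage with made-up data: two states, two actions
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
v, policy = value_iteration(P, r, gamma=0.9)

The dissertation's first contribution concerns exactly this iterative scheme: unlike the policy iteration variants covered by Ye's polynomial bound, the number of iterations it needs to return an optimal policy cannot be bounded under the BSS model, even for a deterministic MDP with four state-action pairs.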

