
dc.identifier.uri    http://hdl.handle.net/11401/77166
dc.description.sponsorship    This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for completion of degree.    en_US
dc.format    Monograph
dc.format.medium    Electronic Resource    en_US
dc.language.iso    en_US
dc.publisher    The Graduate School, Stony Brook University: Stony Brook, NY.
dc.type    Dissertation
dcterms.abstract    Recently there has been a resurgence of interest in the complexity of algorithms for Markov decision processes (MDPs) and stochastic games. Much of this work was inspired by groundbreaking results on the complexity of policy iteration algorithms for MDPs under the Blum-Shub-Smale (BSS) model of computation. In particular, for discounted MDPs with a fixed discount factor, Yinyu Ye showed that the number of arithmetic operations needed by two classic variants of policy iteration can be bounded above by a polynomial in the number of state-action pairs only. A natural question is whether a similar complexity estimate exists for the value iteration algorithm, another classic approach to computing optimal policies for MDPs. Our first main contribution is a negative answer to this question. Using a deterministic MDP with four state-action pairs, we show that under the BSS model there is no upper bound on the number of iterations needed by value iteration to return an optimal policy. We also show that the same example implies the same result for a broad class of so-called optimistic policy iteration algorithms, which includes algorithms of interest to the reinforcement learning community such as λ-policy iteration and modified policy iteration. Another natural question is whether Ye's approach can yield results for MDPs under other optimality criteria. Our second main contribution is a formulation of conditions, which to our knowledge are the most general ones known, under which MDPs and two-player zero-sum stochastic games with Borel state and action spaces can be reduced to discounted ones. For undiscounted total-reward MDPs and stochastic games, the transformations we formulate are based on an idea due to Alan Hoffman. For average rewards, the transformations are extensions, to Borel state and action spaces, of a transformation recently proposed for finite stochastic games. In addition to implying the existence of ε-optimal policies for total and average rewards, these reductions lead to estimates of the number of arithmetic operations needed to compute optimal policies for such models with finite state and action sets, as well as complexity estimates for computing ε-optimal policies for MDPs with Euclidean state and action spaces.
dcterms.available    2017-09-20T16:52:08Z
dcterms.contributor    Mitchell, Joseph S. B.    en_US
dcterms.contributor    Feinberg, Eugene A.    en_US
dcterms.contributor    Hu, Jiaqiao    en_US
dcterms.contributor    Gao, Jie    en_US
dcterms.creator    Huang, Jefferson
dcterms.dateAccepted    2017-09-20T16:52:08Z
dcterms.dateSubmitted    2017-09-20T16:52:08Z
dcterms.description    Department of Applied Mathematics and Statistics    en_US
dcterms.extent    125 pg.    en_US
dcterms.format    Application/PDF    en_US
dcterms.format    Monograph
dcterms.identifier    http://hdl.handle.net/11401/77166
dcterms.issued    2016-12-01
dcterms.language    en_US
dcterms.provenance    Made available in DSpace on 2017-09-20T16:52:08Z (GMT). No. of bitstreams: 1 Huang_grad.sunysb_0771E_13001.pdf: 790262 bytes, checksum: a03a3259262f99b742f09c2bf2344f16 (MD5) Previous issue date: 1    en
dcterms.publisher    The Graduate School, Stony Brook University: Stony Brook, NY.
dcterms.subject    algorithm, complexity, control, Markov decision process, policy, stochastic game
dcterms.subject    Operations research -- Mathematics
dcterms.title    Complexity Estimates and Reductions to Discounting for Total and Average-Reward Markov Decision Processes and Stochastic Games
dcterms.type    Dissertation
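
As an illustration of the value iteration algorithm discussed in the abstract, the sketch below computes an optimal policy for a small finite discounted MDP. The function name, the two-state transition and reward data, the discount factor, and the stopping tolerance are assumed placeholders for illustration only; they are not taken from the dissertation.

import numpy as np

def value_iteration(P, r, gamma, tol=1e-8, max_iter=100000):
    # P: transition probabilities, shape (A, S, S); P[a, s, s'] = Pr(next state s' | state s, action a)
    # r: one-step rewards, shape (A, S); gamma: discount factor in [0, 1)
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Bellman optimality update: Q(s, a) = r(s, a) + gamma * E[v(next state)]
        q = r + gamma * (P @ v)        # shape (A, S)
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    # Return the value function and a greedy (approximately optimal) policy
    return v, q.argmax(axis=0)

# Example usage with made-up data: two states, two actions
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
v, policy = value_iteration(P, r, gamma=0.9)

The dissertation's first contribution concerns exactly this iterative scheme: unlike the policy iteration variants covered by Ye's polynomial bound, the number of iterations it needs to return an optimal policy cannot be bounded under the BSS model, even for a deterministic MDP with four state-action pairs.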

