You are here

A POMDP Approximation Algorithm that Anticipates the Need to Observe

TitleA POMDP Approximation Algorithm that Anticipates the Need to Observe
Publication TypeConference Paper
Year of Publication2000
AuthorsZubek, V B., and T. G. Dietterich
Conference NameIn Springer-Verlag (Ed.), Proceedings of the Paci Rim Conference on Arti Intelligence (PRICAI); Lecture Notes in Computer Science
Date Published08/2000
Conference LocationMelbourne, Australia

This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function, V 2MDP, can be combined online with a 2-step lookahead search to provide a good POMDP policy. We prove that this gives an approximation to the POMDP's optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP. We present experimental evidence that the method finds a good policy for a POMDP with 10,000 states and observations.