In many Reinforcement Learning (RL) domains there is a high cost for generating experience in order to evaluate an agent’s performance. An appealing approach to reducing the number of expensive evaluations is Bayesian Optimization (BO), which is a framework for global optimization of noisy and costly to evaluate functions. Prior work in a number of RL domains has demonstrated the effectiveness of BO for optimizing parametric policies. However, those approaches completely ignore the state-transition sequence of policy executions and only consider the total reward achieved. In this paper, we study how to more effectively incorporate all of the information observed during policy executions into the BO framework. In particular, our approach uses the observed data to learn approximate transitions models that allow for Monte-Carlo predictions of policy returns. The models are then incorporated into the BO framework as a type of prior on policy returns, which can better inform the BO process. The resulting algorithm provides a new approach for leveraging learned models in RL even when there is no planner available for exploiting those models. We demonstrate the effectiveness of our algorithm in four benchmark domains, which have dynamics of variable complexity. Results indicate that our algorithm effectively combines model based predictions to improve the data efficiency of model free BO methods, and is robust to modeling errors when parts of the domain cannot be modeled successfully.