In learning goal-directed behaviors, an agent has to consider not only the reward given at each state but also the consequences of the dynamic state transitions associated with action selection. To understand brain mechanisms for action learning under predictable and unpredictable environmental dynamics, we measured brain activity by functional magnetic resonance imaging (fMRI) during a Markov decision task with predictable and unpredictable state transitions. Whereas the striatum and orbitofrontal cortex (OFC) were significantly activated under both predictable and unpredictable state transition rules, the dorsolateral prefrontal cortex (DLPFC) was more strongly activated under predictable than under unpredictable state transition rules. We then modeled subjects' choice behaviors using a reinforcement learning model and a Bayesian estimation framework and found that the subjects used larger temporal discount factors under predictable state transition rules. Model-based analysis of the fMRI data revealed different engagement of the striatum in reward prediction under different state transition dynamics. The ventral striatum was involved in reward prediction under both unpredictable and predictable state transition rules, whereas the dorsal striatum was predominantly involved in reward prediction under predictable rules. These results suggest different learning systems in the cortico-striatal loops depending on the dynamics of the environment: the OFC-ventral striatum loop is involved in action learning based on the present state, while the DLPFC-dorsal striatum loop is involved in action learning based on predictable future states.
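To make the role of the temporal discount factor concrete, here is a minimal sketch of tabular Q-learning on a hypothetical two-state MDP with predictable (deterministic) transitions. The environment, rewards, and all parameter values below are illustrative assumptions, not the task or model used in the study; the point is only that a larger discount factor gamma shifts action values toward choices whose reward arrives after a predictable state transition.

```python
import random

def q_learning(gamma, alpha=0.1, epsilon=0.2, steps=5000, seed=0):
    """Tabular Q-learning on a toy deterministic MDP (illustrative only)."""
    # State 0: action 0 gives a small immediate reward and stays in state 0;
    #          action 1 gives no reward but leads predictably to state 1.
    # State 1: both actions give a large reward and return to state 0.
    R = {(0, 0): 0.2, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 1.0}
    T = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0}
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    s = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.choice((0, 1))                       # explore
        else:
            a = max((0, 1), key=lambda act: Q[(s, act)]) # exploit
        r, s_next = R[(s, a)], T[(s, a)]
        # Temporal-difference update; gamma is the temporal discount factor.
        td_target = r + gamma * max(Q[(s_next, 0)], Q[(s_next, 1)])
        Q[(s, a)] += alpha * (td_target - Q[(s, a)])
        s = s_next
    return Q

# A myopic agent (small gamma) comes to prefer the immediate reward in
# state 0, whereas a far-sighted agent (large gamma) comes to prefer the
# transition toward the delayed, larger reward in state 1.
myopic = q_learning(gamma=0.1)
farsighted = q_learning(gamma=0.9)
```

Under this reading, the finding that subjects used larger discount factors under predictable transition rules corresponds to relying more on such future-state value estimates when the transitions can actually be anticipated.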