Animals appear to learn and use internal models: they learn to anticipate predictable events, and their behavior in the sensory preconditioning paradigm reflects the formation of novel associative chains. To investigate possible neural correlates, the temporal difference (TD) model was extended to an internal model approach. The proposed model learns reward prediction error signals that resemble dopamine neuron activity. In contrast to the original TD model, these signals are influenced by the formation of novel associative chains in the sensory preconditioning experiment. This is consistent with the experimental finding that striatal dopamine concentration is influenced by the formation of such chains in this paradigm. Comparison of the model architecture with biological neural networks suggests that chains of neurons with tonic anticipatory activity may underlie this chain formation. Taken together, these findings suggest that dopamine neuron activity may reflect the processing of an internal model.
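
The contrast with the original TD model can be illustrated with a minimal sketch. It is not the architecture proposed here; it only shows why a plain tabular TD(0) learner assigns no reward prediction to the preconditioned stimulus, whereas chaining through a learned stimulus-to-stimulus association does. The stimulus names `A` and `B`, the values of `GAMMA` and `ALPHA`, the trial counts, and the one-step successor table `T` are all assumptions chosen for illustration.

```python
GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.1  # learning rate (assumed value)


def td0_update(V, state, next_state, reward):
    """Apply one tabular TD(0) update to V and return the prediction error."""
    v_next = V[next_state] if next_state is not None else 0.0
    delta = reward + GAMMA * v_next - V[state]  # reward prediction error
    V[state] += ALPHA * delta
    return delta


def run_sensory_preconditioning(n_trials=200):
    """Simulate the three phases and return predictions for stimulus A."""
    V = {"A": 0.0, "B": 0.0}  # reward predictions per stimulus
    T = {}                    # hypothetical learned successor table

    # Phase 1 (preconditioning): A is followed by B, no reward.
    for _ in range(n_trials):
        td0_update(V, "A", "B", 0.0)
        td0_update(V, "B", None, 0.0)
        T["A"] = "B"          # stimulus-to-stimulus association

    # Phase 2 (conditioning): B alone is followed by reward.
    for _ in range(n_trials):
        td0_update(V, "B", None, 1.0)

    # Test: A is presented alone.
    td_prediction = V["A"]                    # plain TD: never updated
    chained_prediction = GAMMA * V[T["A"]]    # chain A -> B -> reward
    return td_prediction, chained_prediction, V["B"]


if __name__ == "__main__":
    td_a, chained_a, v_b = run_sensory_preconditioning()
    print(f"V(B) after conditioning:    {v_b:.3f}")
    print(f"plain TD prediction for A:  {td_a:.3f}")
    print(f"chained prediction for A:   {chained_a:.3f}")
```

In this sketch the plain TD prediction for `A` stays at zero because `A` is never paired with reward or with a stimulus that already predicts reward, while the chained estimate inherits the prediction that `B` acquired in the second phase.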