We extend our static multimodal nonrigid registration method to a spatio-temporal (2D+T) co-registration of a real-time 3D ultrasound sequence and a cardiovascular MR sequence. The motivation for our research is to help clinicians automatically fuse information from multiple imaging modalities for the early diagnosis and therapy of cardiac disease. The deformation field between the two sequences is decoupled into spatial and temporal components. Temporal alignment is performed first to re-slice both sequences using a differential registration method. Spatial alignment is then carried out between frames corresponding to the same temporal position. The spatial deformation is modeled by a polyaffine transformation whose anchor points (or control points) are automatically detected and refined by calculating a local mismatch measure based on phase mutual information. The spatial alignment is embedded in an adaptive multi-scale framework that maximizes the phase-based similarity measure by optimizing the parameters of the polyaffine transformation. Results demonstrate that this novel method can yield an accurate registration of particular cardiac regions.
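
To illustrate how a set of local affine maps attached to anchor points can produce a single smooth spatial deformation, the following is a minimal sketch only. It assumes a direct weighted fusion with normalized Gaussian weights, which is a simplification and not necessarily the polyaffine formulation used in this work. The function name `fuse_affine`, the parameter `sigma`, and the 2D point layout are hypothetical choices made for the example.

```python
import numpy as np

def fuse_affine(points, anchors, matrices, translations, sigma=10.0):
    """Blend K local affine maps into one deformation (illustrative assumption).

    points:       (N, 2) coordinates to transform
    anchors:      (K, 2) anchor (control) point positions
    matrices:     (K, 2, 2) linear part of each local affine map
    translations: (K, 2) translation part of each local affine map
    sigma:        width of the Gaussian weight around each anchor
    """
    points = np.asarray(points, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    matrices = np.asarray(matrices, dtype=float)
    translations = np.asarray(translations, dtype=float)

    # Squared distance from every point to every anchor, shape (N, K).
    d2 = ((points[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)

    # Gaussian weights; subtracting the row minimum avoids underflow
    # for points far from all anchors and cancels out after normalization.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)

    # Apply every local affine map to every point, shape (N, K, 2).
    mapped = np.einsum("kij,nj->nki", matrices, points) + translations[None, :, :]

    # Weighted average of the K candidate positions for each point.
    return (w[:, :, None] * mapped).sum(axis=1)


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
    anchors = np.array([[0.0, 0.0], [10.0, 0.0]])
    mats = np.stack([np.eye(2), np.eye(2)])
    trans = np.array([[2.0, 0.0], [0.0, 0.0]])
    print(fuse_affine(pts, anchors, mats, trans, sigma=3.0))
```

In this sketch a point midway between two anchors receives the average of the two local maps, while a point near one anchor follows that anchor's map almost exactly. In a registration setting, the entries of `matrices` and `translations` would be the parameters adjusted to maximize the similarity measure.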