Dictionary Multi-Modal Temporal Graph Learning

IEEE Trans Pattern Anal Mach Intell. 2026 Mar 31:PP. doi: 10.1109/TPAMI.2026.3679419. Online ahead of print.

Abstract

Temporal graph learning studies graph deep learning in real-world dynamic scenarios. It uses interaction sequences instead of an adjacency matrix, which allows graph dynamics to be observed at a finer granularity from the perspective of time evolution. However, current temporal graph methods focus only on extra dynamic information and ignore the large amount of multi-modal information available in the real world. This information reflects rich real-world changes from different perspectives, and ignoring it means that temporal graph learning still lacks the ability to restore and mine more complex real-world data. We argue that the main challenges behind this gap are the lack of a multi-modal architecture and of public multi-modal datasets. To address these challenges, we propose ModalTGL, which improves the computational efficiency of the model in complex dynamic scenarios by introducing a dictionary graph network, and achieves multi-modal fusion through embedding tuning. We also discuss the effects of different time encoding functions on the preservation of dynamic information. At the data level, we build several multi-modal temporal graph datasets from different areas and compare ModalTGL with multiple state-of-the-art (SOTA) methods on these datasets. The experimental results verify the effectiveness of ModalTGL, which achieves a performance improvement of up to 18.48%. Code and data are available at https://github.com/MGitHubL/ModalTGL.
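
To make the two ideas of an interaction sequence and a time encoding function concrete, the following is a minimal Python sketch. It is not taken from ModalTGL: the names `Interaction`, `neighbors_before`, and `cosine_time_encoding` are hypothetical, and the fixed cosine frequencies are an assumption chosen for illustration (one common style of time encoding, not necessarily any of the functions compared in the paper).

```python
import math
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """One timestamped edge in an interaction sequence (hypothetical type)."""
    src: int
    dst: int
    t: float


def neighbors_before(events: List[Interaction], node: int, t: float) -> List[Interaction]:
    """Return the interactions involving `node` that occurred strictly before time `t`."""
    return [e for e in events if e.t < t and node in (e.src, e.dst)]


def cosine_time_encoding(dt: float, dim: int = 4) -> List[float]:
    """Map a time gap `dt` to a `dim`-dimensional vector of cos(w_i * dt).

    The frequencies w_i = 10^(-i) are fixed here purely for illustration.
    """
    freqs = [1.0 / (10.0 ** i) for i in range(dim)]
    return [math.cos(w * dt) for w in freqs]


# A temporal graph stored as a time-ordered interaction sequence,
# rather than as a single static adjacency matrix.
events = [
    Interaction(0, 1, 1.0),
    Interaction(1, 2, 2.5),
    Interaction(0, 2, 4.0),
]

# History of node 2 just before its interaction at t = 4.0,
# with each past interaction encoded by its time gap.
history = neighbors_before(events, node=2, t=4.0)
encodings = [cosine_time_encoding(4.0 - e.t) for e in history]
```

Because the history is filtered by timestamp, a representation built this way only sees interactions from the past, which is the property an adjacency-matrix snapshot does not expose directly.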