Glaucoma is the leading cause of irreversible blindness. Early detection and timely treatment of glaucoma are essential for preventing visual field loss or even blindness. In clinical practice, Optical Coherence Tomography (OCT) and Visual Field (VF) exams are two widely used and complementary techniques for diagnosing glaucoma. OCT provides quantitative measurements of the optic nerve head (ONH) structure, while the VF test provides a functional assessment of peripheral vision. In this paper, we propose a Deep Relation Transformer (DRT) that diagnoses glaucoma by combining OCT and VF information. We propose a novel deep reasoning mechanism to explore implicit pairwise relations between OCT and VF information at both global and regional levels. Building on these pairwise relations, we develop a carefully designed deep transformer mechanism that enhances the representation of each modality with complementary information from the other. Based on the reasoning and transformer mechanisms, we design three successive modules to extract and aggregate information relevant to glaucoma diagnosis: the global relation module, the guided regional relation module, and the interaction transformer module. Moreover, we build a large dataset, named ZOC-OCT&VF, which includes 1395 OCT-VF pairs, for developing and evaluating our DRT. We conduct extensive experiments to validate the effectiveness of the proposed method. Experimental results show that our method achieves 88.3% accuracy and outperforms existing single-modal approaches by a large margin. The code and dataset will be made publicly available in the future.
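The idea of enhancing each modality with complementary information through pairwise relations can be illustrated with a minimal sketch. It is not the DRT implementation. The sketch assumes each modality has already been encoded into a set of regional feature vectors of the same dimension, and it uses scaled dot-product cross-attention as a stand-in relation function. The names `pairwise_relation` and `cross_enhance`, the feature sizes, and the residual fusion are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_relation(oct_feats, vf_feats):
    """Relation matrix between OCT regions (rows) and VF regions (columns).

    Hypothetical stand-in: scaled dot-product similarity. The relation
    function actually used by DRT is not specified in the abstract.
    """
    d = oct_feats.shape[-1]
    return oct_feats @ vf_feats.T / np.sqrt(d)

def cross_enhance(oct_feats, vf_feats):
    """Enhance each modality with information gathered from the other."""
    rel = pairwise_relation(oct_feats, vf_feats)      # (n_oct, n_vf)
    oct_from_vf = softmax(rel, axis=1) @ vf_feats     # (n_oct, d): VF info per OCT region
    vf_from_oct = softmax(rel.T, axis=1) @ oct_feats  # (n_vf, d): OCT info per VF region
    # Residual connection keeps each modality's own features.
    return oct_feats + oct_from_vf, vf_feats + vf_from_oct

rng = np.random.default_rng(0)
oct_regions = rng.standard_normal((8, 16))  # hypothetical: 8 OCT region features, dim 16
vf_regions = rng.standard_normal((6, 16))   # hypothetical: 6 VF region features, dim 16
oct_out, vf_out = cross_enhance(oct_regions, vf_regions)
print(oct_out.shape, vf_out.shape)  # (8, 16) (6, 16)
```

Because the same relation matrix is read row-wise for OCT and column-wise for VF, both directions of enhancement share one set of pairwise scores, which mirrors the abstract's description of relations driving complementary enhancement for each modality.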