Drug-target affinity (DTA) prediction is crucial in drug discovery because it helps researchers elucidate the complex interaction mechanisms between candidate drugs and biological targets. However, current methods are limited in their ability to capture global structural patterns from molecular graphs, which are essential for accurately characterizing drugs and proteins. Moreover, the absence of three-dimensional (3D) structural data causes molecular structural information to be lost, which impairs model accuracy and generalizability. To resolve these issues, we propose PMHGT-DTA, a multimodal framework that predicts DTA using pretrained models and a hierarchical graph transformer (HGT). It integrates graph neural networks (GNNs) with transformers to represent both local node features and global structural information of molecular graphs. Drug graphs built from 3D conformations and protein graphs focused on binding sites, both derived from pretrained models, are incorporated to complement the sequence-modality features. In addition, a cross-attention module models the interactions between drug atoms and protein amino acid residues to establish drug-target relationships, thereby enhancing the interpretability of the model. Experiments on the Davis and KIBA benchmark data sets show that PMHGT-DTA outperforms baseline methods in both standard and real-world scenarios, demonstrating its potential to accelerate drug development.
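The drug-atom-to-residue cross-attention step can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention in which drug atom embeddings act as queries and protein residue embeddings act as keys and values, with the learned projection matrices omitted. The function name `cross_attention` and the toy embeddings are hypothetical and do not reproduce the PMHGT-DTA implementation. The returned weight matrix is the kind of atom-by-residue map that makes such a module interpretable.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(drug_atoms, residues):
    """Hypothetical sketch: each drug atom (query) attends over protein residues (keys/values).

    Assumes identity projections (no learned W_q, W_k, W_v) for brevity.
    Returns (updated atom features, attention weights of shape atoms x residues).
    """
    d = len(drug_atoms[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in drug_atoms:
        # Scaled dot-product score between this atom and every residue.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in residues]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of residue embeddings gives the protein-aware atom feature.
        outputs.append([sum(w[j] * residues[j][c] for j in range(len(residues))) for c in range(d)])
    return outputs, weights


# Toy example: 2 drug atoms and 3 residues with 2-dimensional embeddings.
drug_atoms = [[1.0, 0.0], [0.0, 1.0]]
residues = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
atom_features, attn = cross_attention(drug_atoms, residues)
for i, row in enumerate(attn):
    print(f"atom {i} attends most to residue {row.index(max(row))}")
```

Reading each row of the weight matrix shows which residues a given atom relies on, which is how attention maps are typically inspected for interpretability.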