Enhancer–promoter interactions (EPIs) play a central role in gene regulation, yet experimental mapping remains costly and limited in coverage. As a result, computational approaches are commonly evaluated on curated benchmark datasets, which pose challenges in long-range sequence modeling, multimodal feature integration, and reproducible preprocessing. In this study, we present EPINTLM (Enhancer-Promoter Interaction Nucleotide Transformer Large Model), a deep learning framework designed to investigate architectural strategies for EPI prediction under standardized benchmark settings. EPINTLM integrates DNA sequence representations and genomic features by leveraging pretrained k-mer embeddings from the Nucleotide Transformer and explicitly modeling intra- and inter-sequence dependencies through residual self-attention and bidirectional cross-attention. We additionally introduce a unified preprocessing pipeline to improve training efficiency and reproducibility, and we perform post hoc motif analysis to offer limited interpretability of the learned sequence patterns. Evaluated on a widely used benchmark spanning six human cell lines, EPINTLM achieves competitive performance relative to existing methods in both the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPR), with ablation studies highlighting the contributions of cross-attention and residual aggregation. These results demonstrate the utility of explicit cross-attention designs for paired regulatory sequence modeling within current benchmark constraints.
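To make the paired-sequence design concrete, the PyTorch sketch below illustrates residual self-attention within each sequence followed by bidirectional cross-attention between enhancer and promoter token embeddings. It is a minimal sketch under stated assumptions, not the EPINTLM implementation: the embedding dimension, head count, weight sharing across the two sequences, and the post-norm residual layout are all illustrative choices.

import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Illustrative paired-sequence block: residual self-attention models
    intra-sequence dependencies; bidirectional cross-attention models
    inter-sequence dependencies. Hyperparameters are assumptions."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Shared modules for both sequences (an illustrative simplification).
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_self = nn.LayerNorm(dim)
        self.norm_cross = nn.LayerNorm(dim)

    def _self_block(self, x: torch.Tensor) -> torch.Tensor:
        # Residual self-attention over one sequence's token embeddings.
        attn_out, _ = self.self_attn(x, x, x)
        return self.norm_self(x + attn_out)

    def _cross_block(self, q: torch.Tensor, kv: torch.Tensor) -> torch.Tensor:
        # Queries from one sequence attend over the other, with a residual path.
        attn_out, _ = self.cross_attn(q, kv, kv)
        return self.norm_cross(q + attn_out)

    def forward(self, enh: torch.Tensor, prom: torch.Tensor):
        enh, prom = self._self_block(enh), self._self_block(prom)
        # Bidirectional: enhancer attends to promoter, and vice versa.
        return self._cross_block(enh, prom), self._cross_block(prom, enh)

# Toy usage; random tensors stand in for pretrained k-mer embeddings.
enh = torch.randn(4, 500, 512)   # (batch, enhancer tokens, embedding dim)
prom = torch.randn(4, 300, 512)  # (batch, promoter tokens, embedding dim)
enh_out, prom_out = BidirectionalCrossAttention()(enh, prom)
print(enh_out.shape, prom_out.shape)  # (4, 500, 512) and (4, 300, 512)

In practice, the two context-enriched outputs would be pooled and fed to a classification head; that step is omitted here for brevity.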
Keywords: cross-attention mechanism; enhancer–promoter interaction; genomic sequence; multimodal feature; pretrained large language models; residual connection.