Survival analysis is an important task in cancer prognosis, and multi-modal data, including histopathological images, genomic data, and clinical information, provide unprecedented opportunities for its development. However, because histopathological images and genomic data are high-dimensional and heterogeneous, extracting effective predictive features from such multi-modal data remains a challenge for survival analysis. In this study, we propose a transformer-based survival analysis model (TransSurv) for colorectal cancer that effectively integrates intra-modality and inter-modality features of histopathological images, genomic data, and clinical information. Specifically, to model the intra-modality relationships among image patches, we develop a multi-scale histopathological feature fusion transformer (MS-Trans). Furthermore, we introduce a cross-modal fusion transformer based on cross-attention that fuses the multi-scale pathological representation with the multi-omics representation, which includes RNA-seq expression and copy number alteration (CNA). At the output layer of TransSurv, we adopt a Cox layer to integrate the multi-modal fusion representation with clinical information for end-to-end survival analysis. Experimental results on The Cancer Genome Atlas (TCGA) colorectal cancer cohort demonstrate that the proposed TransSurv outperforms existing methods and improves the prognosis prediction of colorectal cancer.
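
To make the role of the Cox layer concrete, the following is a minimal sketch of the objective such a layer is commonly trained with: the negative Cox partial log-likelihood over predicted risk scores. The abstract does not state the exact loss used by TransSurv, so the formulation below (Breslow-style risk sets, averaging over observed events) is an assumption, and the function name `cox_neg_log_partial_likelihood` and its arguments are hypothetical.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative Cox partial log-likelihood.

    risk_scores: predicted log-risk per patient (e.g. the Cox layer output).
    times:       observed follow-up time per patient.
    events:      1 if the event (death) was observed, 0 if censored.

    Assumption: Breslow-style risk sets; this is an illustrative sketch,
    not the loss confirmed by the source.
    """
    if not (len(risk_scores) == len(times) == len(events)):
        raise ValueError("risk_scores, times, and events must have equal length")

    total = 0.0
    n_events = 0
    for score_i, time_i, event_i in zip(risk_scores, times, events):
        if not event_i:
            # Censored patients contribute only through other patients' risk sets.
            continue
        # Risk set: every patient still under observation at time_i.
        at_risk = [s for s, t in zip(risk_scores, times) if t >= time_i]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(at_risk)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in at_risk))
        total += score_i - log_denominator
        n_events += 1

    if n_events == 0:
        return 0.0  # no observed events, so the partial likelihood is empty
    return -total / n_events


# Toy cohort: three patients, all with observed events.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
concordant = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
discordant = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

In this toy cohort, `concordant` is smaller than `discordant` because assigning higher risk to patients who die earlier increases the partial likelihood. In an end-to-end model, the same quantity would be computed with differentiable tensor operations so that gradients can flow back through the fusion layers.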