RNA-Sequencing (RNA-Seq) is a powerful approach for discovering single nucleotide polymorphisms in coding regions (cSNPs), which can alter the amino acid sequence of the encoded proteins, have predicted deleterious effects on those proteins, and underlie disease susceptibility or resistance. RNA-Seq data from peripheral blood (PB) and ileocecal valve (ICV) samples collected from fourteen Holstein cattle with focal (N = 5) or diffuse (N = 5) paratuberculosis (PTB)-associated lesions, or without lesions (N = 4), in gut tissues were used to identify deleterious cSNPs unique to each group of animals. PB and ICV samples from each animal were subjected to RNA extraction, library preparation, and paired-end RNA-Seq. The reads were aligned against the bovine ARS-UCD1.2.109 reference genome using the STAR aligner, generating an average of 21,331,835 and 19,506,829 uniquely mapped reads in the PB and ICV samples, respectively. SNP calling was performed on the RNA-Seq data of each group of animals using bcftools v1.11. To ensure high-confidence cSNP calls, stringent filtering criteria were applied: minimum read depth (≥ 10), supporting reads for the alternative allele (≥ 4), Phred score of the alternative allele (≥ 30), minor allele frequency (> 20%), maximum proportion of missing data per site (< 80%), and distance from indels (SNPs within 5 bp of insertions/deletions were excluded). Of the 856, 625, and 603 cSNPs that were uniquely present in the transcriptomes of the control cows, cows with focal lesions, and cows with diffuse lesions, 31, 15, and 31, respectively, had predicted deleterious effects. The major histocompatibility complex II gene (BOLA) was the only candidate gene affected by different predicted deleterious cSNPs in all three groups of animals. Using the candidate genes, gene set enrichment analysis (GSEA) revealed distinct biological processes and metabolic pathways associated with each group of cows.
Cows without lesions showed enrichment in 11 GO terms and 6 metabolic pathways, particularly involving the BOLA, AP3B1, and CHGA genes. These leading-edge genes are linked to antigen processing and presentation, phagosome maturation, lysosome function, and intestinal immune homeostasis. Cows with focal lesions showed enrichment in negative regulation of apoptosis and in cellular metabolism, with two leading-edge genes, ORMD1 and KANK2. Predicted deleterious cSNPs in these leading-edge genes may help the host modulate immune responses and maintain a low bacterial load during the subclinical stage of Mycobacterium avium subsp. paratuberculosis (MAP) infection. Finally, cows with diffuse lesions showed enrichment in 27 metabolic pathways, including Th1/Th2 cell differentiation, antigen presentation, bile secretion, and antifolate resistance. Further validation of the cSNPs and candidate genes in additional independent populations may lead to their use in SNP-based selection strategies for increasing resistance to MAP infection.
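As an illustration only, the SNP filtering thresholds and the "unique to each group" step described above can be sketched in Python. This is not the authors' pipeline, which used bcftools v1.11. The `SnpCall` record, its field names, and both helper functions are hypothetical, and reading "within 5 bp of insertions/deletions" as a distance of 5 bp or less is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical per-site summary; field names are illustrative assumptions."""
    chrom: str
    pos: int
    depth: int           # total read depth at the site
    alt_reads: int       # reads supporting the alternative allele
    alt_qual: float      # Phred-scaled quality of the alternative allele
    maf: float           # minor allele frequency as a fraction (0 to 1)
    missing_frac: float  # fraction of samples with missing data at the site
    indel_distance: int  # distance in bp to the nearest insertion/deletion


def passes_filters(snp: SnpCall) -> bool:
    """Apply the thresholds stated in the abstract to one SNP call."""
    return (
        snp.depth >= 10
        and snp.alt_reads >= 4
        and snp.alt_qual >= 30
        and snp.maf > 0.20
        and snp.missing_frac < 0.80
        and snp.indel_distance > 5  # assumption: "within 5 bp" means <= 5 bp
    )


def unique_to_group(groups: dict[str, set[tuple[str, int]]]) -> dict[str, set[tuple[str, int]]]:
    """For each group, keep only the sites not observed in any other group."""
    unique = {}
    for name, sites in groups.items():
        # Union of the sites seen in every other group.
        others = set().union(*(s for n, s in groups.items() if n != name))
        unique[name] = sites - others
    return unique
```

In this sketch, filtering would be applied to each group's calls first, and the surviving (chromosome, position) pairs would then be passed to `unique_to_group` to obtain the group-specific cSNP sets.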
Supplementary Information: The online version contains supplementary material available at 10.1038/s41598-026-37675-9.
Keywords: Deleterious variants; Disease resistance; Paratuberculosis; RNA-Seq; SNP calling; Single nucleotide polymorphisms.