Graph-based models of the Oenothera mitochondrial genome capture the enormous complexity of higher plant mitochondrial DNA organization

NAR Genom Bioinform. 2022 Mar 31;4(2):lqac027. doi: 10.1093/nargab/lqac027. eCollection 2022 Jun.

Abstract

Plant mitochondrial genomes display enormous structural complexity, as recombining repeat pairs generate various sub-genomic molecules, rendering these genomes extremely challenging to assemble. We present a novel bioinformatic data-processing pipeline called SAGBAC (Semi-Automated Graph-Based Assembly Curator) that identifies recombinogenic repeat pairs and reconstructs plant mitochondrial genomes. SAGBAC processes assembly outputs and applies our novel ISEIS (Iterative Sequence Ends Identity Search) algorithm to obtain a graph-based visualization. We applied this approach to three mitochondrial genomes of evening primrose (Oenothera), a plant genus used for cytoplasmic genetics studies. All identified repeat pairs were found to be flanked by two alternative and unique sequence contigs defining so-called 'double forks', resulting in four possible contig-repeat-contig combinations for each repeat pair. Based on the inferred structural models, the stoichiometry of the different contig-repeat-contig combinations was analyzed using Illumina mate-pair and PacBio RSII data. This analysis uncovered a remarkable structural diversity of the three closely related mitochondrial genomes, as well as substantial phylogenetic variation of the underlying repeats. Our model allows prediction of all recombination events and, thus, all possible sub-genomes. In future work, the proposed methodology may prove useful for the investigation of sub-genome organization and dynamics in different tissues and at various developmental stages.
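
The 'double fork' arithmetic in the abstract (four contig-repeat-contig combinations per repeat pair) can be illustrated with a minimal sketch. It assumes a double fork means two alternative unique contigs on each side of the repeat. The function name and the contig labels A, B, C, D and R1 are hypothetical placeholders; this is not SAGBAC code and does not reproduce the ISEIS algorithm.

```python
from itertools import product


def double_fork_combinations(left_contigs, repeat, right_contigs):
    """Enumerate every contig-repeat-contig path through one double fork.

    Assumption: each side of the repeat offers a set of alternative
    unique contigs, and any left contig may join any right contig.
    """
    return [
        (left, repeat, right)
        for left, right in product(left_contigs, right_contigs)
    ]


# Hypothetical labels: A and B enter the repeat R1, C and D leave it.
combos = double_fork_combinations(("A", "B"), "R1", ("C", "D"))
for combo in combos:
    print("-".join(combo))
```

With two alternatives on each side, the enumeration yields 2 x 2 = 4 paths, matching the four combinations stated for each repeat pair. Under this reading, two of the paths would correspond to the parental arrangements and the other two to their recombinant counterparts.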