BMC Genomics. 2020 Mar 30;21(1):259.
doi: 10.1186/s12864-020-6672-3.

De Novo Assembly of the Olive Fruit Fly (Bactrocera Oleae) Genome With Linked-Reads and Long-Read Technologies Minimizes Gaps and Provides Exceptional Y Chromosome Assembly

Free PMC article

Anthony Bayega et al. BMC Genomics. 2020.

Abstract

Background: The olive fruit fly, Bactrocera oleae, is the most important pest in the olive fruit agribusiness industry. Female flies lay their eggs in unripe fruits, and upon hatching the larvae feed on the fruits, destroying them. The lack of a high-quality genome and of other genomic and transcriptomic data has hindered progress in understanding the fly's biology and in proposing alternative control methods to pesticide use.

Results: Genomic DNA was sequenced from male and female Demokritos strain flies, maintained in the laboratory for over 45 years. We used short-, mate-pair-, and long-read sequencing technologies to generate a combined male-female genome assembly (GenBank accession GCA_001188975.2). Genomic DNA sequencing from male insects using 10x Genomics linked-reads technology followed by mate-pair and long-read scaffolding and gap-closing generated a highly contiguous 489 Mb genome with a scaffold N50 of 4.69 Mb and L50 of 30 scaffolds (GenBank accession GCA_001188975.4). RNA-seq data generated from 12 tissues and/or developmental stages allowed for genome annotation. Short reads from both males and females and the chromosome quotient method enabled identification of Y-chromosome scaffolds which were extensively validated by PCR.

Conclusions: The high-quality genome generated represents a critical tool in olive fruit fly research. We provide an extensive RNA-seq data set, and genome annotation, critical towards gaining an insight into the biology of the olive fruit fly. In addition, elucidation of Y-chromosome sequences will advance our understanding of the Y-chromosome's organization, function and evolution and is poised to provide avenues for sterile insect technique approaches.

Keywords: Bactrocera oleae; Insect developmental genes; Linked reads; Long reads; Olive fruit fly genome; Y chromosome assembly.

Conflict of interest statement

JR is a member of the MinION Access Program (MAP) and has received free-of-charge flow cells and sequencing kits from Oxford Nanopore Technologies (ONT) for other projects. JR has had no other financial support from ONT. AB has received reimbursement for travel costs associated with attending the Nanopore Community Meeting 2018, a meeting organized by Oxford Nanopore Technologies. KG and DMC held positions as employees with 10x Genomics (Pleasanton, California, USA).

Figures

Fig. 1
Schematic of the method used to generate the different assemblies. DNA extracted from adult female and/or male insects was used to generate sequencing libraries for: Illumina paired-end (PE, 64X and 6X coverage, respectively), mate-pair (MP, 100X coverage), 10x Genomics linked-reads (100X coverage generated, but 74X was found optimal for genome assembly), Pacific Biosciences (PacBio, 20X coverage), and Oxford Nanopore Technologies (ONT, 28X coverage). Independently generated assemblies are shown, and assemblies generated from scaffolding and gap scaffolding are shown with their GenBank accession numbers. Arrows indicate the final resulting assemblies, while arrowheads indicate the samples or datasets used to generate the final assemblies
Fig. 2
Optimization of the number of partitions and coverage for the Supernova assembler. Different numbers of partitions were randomly selected using the partition (GEM) barcodes, while also varying the number of reads per partition to optimize the coverage. These were provided as input to the assembler. For each resulting assembly, the NG50 length and LG50 count were calculated with the genome size assumed to be 320 Mb [23]. The NG50 value is the scaffold/contig length at which half of the genome (~ 160 Mb) is contained in scaffolds/contigs at or above that length. LG50 is the smallest number of scaffolds/contigs whose combined length reaches that half-genome size. Arrowheads indicate optimized parameters
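The NG50 and LG50 definitions in the Fig. 2 caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `ng50_lg50` and the toy scaffold lengths are hypothetical, and the only assumption carried over from the caption is that the genome size is supplied externally (320 Mb in the paper). Setting the genome size to the total assembly length gives the ordinary N50 and L50 instead.

```python
def ng50_lg50(lengths, genome_size):
    """Return (NG50, LG50) for sequence lengths given an assumed genome size.

    NG50: length of the sequence at which the running total, taken from the
          longest sequence downward, first reaches half the genome size.
    LG50: number of sequences needed to reach that point.
    """
    half = genome_size / 2
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    # The assembly covers less than half of the assumed genome size.
    return None, None


# Toy scaffold lengths (hypothetical values, not taken from the paper).
toy = [50, 40, 30, 20, 10]
print(ng50_lg50(toy, genome_size=200))       # NG50/LG50 against an assumed genome size
print(ng50_lg50(toy, genome_size=sum(toy)))  # N50/L50: genome size = assembly size
```

Because NG50 is measured against a fixed genome size rather than each assembly's own total length, it allows assemblies of different total size to be compared on the same scale.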
Fig. 3
Y chromosome assembly and mapping of molecular markers to B. oleae polytene chromosomes. a Plot showing Y chromosome scaffolds/contigs identified in 2 different assemblies (Supplementary Table S4). The Chromosome Quotient (CQ) method [49] was used to identify Y chromosome scaffolds. The scaffolds/contigs are ordered from longest at the bottom to shortest at the top. For each assembly, the total scaffolds/contigs are shown in the left bars and the PCR-validated scaffolds/contigs in the right bars. The approximate location of the PCR primer on the scaffold/contig is shown in pink. b Schematic representation of B. oleae polytene chromosomes, including all mapped markers (tags) and the scaffolds assigned to chromosomes. Previously and currently mapped markers are indicated with black and red letters, respectively, above the chromosomes. Colored horizontal bars above the chromosomes indicate scaffolds/contigs in the GCA_001188975.4 assembly that were localized to chromosomes using mapped markers. More than one tag on a specific scaffold is informative of its physical orientation. m## corresponds to microsatellite marker number ##; c## corresponds to EST marker number ## [26, 50]; newly mapped genes in the current study are presented with full names or abbreviations (Supplementary Table S6); “*” indicates tags that were not found on the anchored contig or gave ambiguous alignment results. The centromere is shown as a filled circle. See Supplementary Table S7 for detailed information
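The idea behind the Chromosome Quotient (CQ) method named in the Fig. 3 caption can be sketched as follows. This is an illustration, not the authors' pipeline. It assumes CQ is the ratio of female to male read alignments per scaffold, so that Y-linked sequences score near zero. The cutoff `max_cq`, the minimum male read count `min_male_reads`, the function names, and the toy counts are all hypothetical. Counts are also assumed to be already normalized for sequencing depth.

```python
def chromosome_quotient(female_count, male_count):
    """Ratio of female to male read alignments; None if no male reads align."""
    if male_count == 0:
        return None
    return female_count / male_count


def candidate_y_scaffolds(counts, max_cq=0.3, min_male_reads=20):
    """Return sorted names of scaffolds whose CQ is at or below max_cq.

    counts maps scaffold name -> (female_count, male_count).
    max_cq and min_male_reads are illustrative thresholds only.
    """
    hits = []
    for name, (female, male) in counts.items():
        if male < min_male_reads:
            continue  # too few male reads to make a reliable call
        cq = chromosome_quotient(female, male)
        if cq is not None and cq <= max_cq:
            hits.append(name)
    return sorted(hits)


# Toy alignment counts (hypothetical scaffold names and values).
toy_counts = {
    "scaf_A": (2, 100),   # almost male-only: candidate Y scaffold
    "scaf_B": (95, 100),  # similar in both sexes: autosomal-like
    "scaf_C": (0, 5),     # male-only but too few reads to call
}
print(candidate_y_scaffolds(toy_counts))
```

A read-count filter of this kind only nominates candidates, which is consistent with the PCR validation of scaffolds described in the caption.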
Fig. 4
Phylogenetic relationship of Bactrocera oleae (olive fruit fly) and 18 other arthropods. Whole proteomes were used to infer pairwise distances among the 19 species using Prot-SpaM [84]. A phylogenetic tree was generated using the Neighbor-Joining algorithm [84] implemented in T-REX [85] and viewed using iTOL [86]. See Supplementary Table S16 for sources of the proteomes used
Fig. 5
Venn diagram of shared orthogroups among B. oleae, C. capitata, D. melanogaster, and M. domestica. Orthologous proteins were identified and grouped using OrthoFinder [87]. Shared and unique orthogroups are plotted using Jvenn [88]. See Supplementary Table S16 for sources of the proteomes used
Fig. 6
Principal component analysis (PCA) of the 1100 most variable genes among the 4 metamorphotic stages. Gene expression (transcripts per million) was calculated for each of the stages (egg, larvae, pupae, and adult) using RSEM [94]. A coefficient of variation was determined for each gene and used to determine the most variable genes. Eigenvector coordinates for the stages (egg, larvae, pupae, and adult) on the first 2 components are shown in red. Coordinates of the individual genes on the first 2 principal components (circle of correlation) are shown as black dots
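The gene-selection step in the Fig. 6 caption, ranking genes by coefficient of variation, can be sketched in a few lines of Python. The caption does not say whether the population or sample standard deviation was used, so the population form is assumed here. The function names and toy TPM values are hypothetical and this is not the authors' implementation.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pstdev(values) / m


def most_variable_genes(tpm, n):
    """Return the n gene names with the highest coefficient of variation.

    tpm maps gene name -> list of TPM values, one per stage
    (egg, larvae, pupae, adult).
    """
    ranked = sorted(tpm, key=lambda g: coefficient_of_variation(tpm[g]), reverse=True)
    return ranked[:n]


# Toy expression matrix (hypothetical genes and TPM values).
toy_tpm = {
    "gene_a": [10, 10, 10, 10],  # constant across stages
    "gene_b": [0, 0, 0, 40],     # expressed in one stage only
    "gene_c": [5, 10, 15, 10],   # mild variation
}
print(most_variable_genes(toy_tpm, n=2))
```

Dividing by the mean makes the ranking independent of absolute expression level, so a weakly expressed gene that is specific to one stage can outrank a highly expressed gene that varies little.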
Fig. 7
Dirichlet process Gaussian process (DPGP) [92] modeling and clustering of gene expression. Gene expression (transcripts per million) was calculated for each of the 4 metamorphotic stages (egg, larvae, pupae, and adult) using RSEM [94], and the expression matrix was used to determine genes that peak only at the corresponding stage

References

    1. Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185–2195. doi: 10.1126/science.287.5461.2185.
    2. Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, et al. The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002;298:129–149. doi: 10.1126/science.1076181.
    3. i5K Consortium. The i5K Initiative: advancing arthropod genomics for knowledge, human health, agriculture, and the environment. J Hered. 2013;104:595–600. doi: 10.1093/jhered/est050.
    4. Poelchau M, Childers C, Moore G, Tsavatapalli V, Evans J, Lee CY, Lin H, Lin JW, Hackett K. The i5k workspace@NAL--enabling genomic data access, visualization and curation of arthropod genomes. Nucleic Acids Res. 2015;43:D714–D719. doi: 10.1093/nar/gku983.
    5. Li F, Zhao X, Li M, He K, Huang C, Zhou Y, Li Z, Walters JR. Insect genomes: progress and challenges. Insect Mol Biol. 2019;28(6):739–58. https://www.ncbi.nlm.nih.gov/pubmed/31120160.
