Genome Biol. 2019 Apr 2;20(1):64.
doi: 10.1186/s13059-019-1660-0.

Molecular evolutionary trends and feeding ecology diversification in the Hemiptera, anchored by the milkweed bug genome

Kristen A Panfilio  1   2 Iris M Vargas Jentzsch  3 Joshua B Benoit  4 Deniz Erezyilmaz  5   6 Yuichiro Suzuki  7 Stefano Colella  8   9 Hugh M Robertson  10 Monica F Poelchau  11 Robert M Waterhouse  12   13 Panagiotis Ioannidis  12 Matthew T Weirauch  14 Daniel S T Hughes  15 Shwetha C Murali  15   16   17 John H Werren  18 Chris G C Jacobs  19   20 Elizabeth J Duncan  21   22 David Armisén  23 Barbara M I Vreede  24 Patrice Baa-Puyoulet  8 Chloé S Berger  23 Chun-Che Chang  25   26 Hsu Chao  15 Mei-Ju M Chen  11 Yen-Ta Chen  3 Christopher P Childers  11 Ariel D Chipman  24 Andrew G Cridge  21 Antonin J J Crumière  23 Peter K Dearden  21 Elise M Didion  4 Huyen Dinh  15 Harsha Vardhan Doddapaneni  15 Amanda Dolan  18   27 Shannon Dugan  15 Cassandra G Extavour  28   29 Gérard Febvay  8 Markus Friedrich  30 Neta Ginzburg  24 Yi Han  15 Peter Heger  31 Christopher J Holmes  4 Thorsten Horn  3 Yi-Min Hsiao  25   26 Emily C Jennings  4 J Spencer Johnston  32 Tamsin E Jones  28 Jeffery W Jones  30 Abderrahman Khila  23 Stefan Koelzer  3 Viera Kovacova  33 Megan Leask  21 Sandra L Lee  15 Chien-Yueh Lee  11 Mackenzie R Lovegrove  21 Hsiao-Ling Lu  25   26 Yong Lu  34 Patricia J Moore  35 Monica C Munoz-Torres  36 Donna M Muzny  15 Subba R Palli  37 Nicolas Parisot  8 Leslie Pick  34 Megan L Porter  38 Jiaxin Qu  15 Peter N Refki  23   39 Rose Richter  18   40 Rolando Rivera-Pomar  41 Andrew J Rosendale  4 Siegfried Roth  3 Lena Sachs  3 M Emília Santos  23 Jan Seibert  3 Essia Sghaier  23 Jayendra N Shukla  37   42 Richard J Stancliffe  43   44 Olivia Tidswell  21   45 Lucila Traverso  46 Maurijn van der Zee  19 Séverine Viala  23 Kim C Worley  15 Evgeny M Zdobnov  12 Richard A Gibbs  15 Stephen Richards  15

Kristen A Panfilio et al. Genome Biol.

Abstract

Background: The Hemiptera (aphids, cicadas, and true bugs) are a key insect order, with high diversity for feeding ecology and excellent experimental tractability for molecular genetics. Building upon recent sequencing of hemipteran pests such as phloem-feeding aphids and blood-feeding bed bugs, we present the genome sequence and comparative analyses centered on the milkweed bug Oncopeltus fasciatus, a seed feeder of the family Lygaeidae.

Results: The 926-Mb Oncopeltus genome is well represented by the current assembly and official gene set. We use our genomic and RNA-seq data not only to characterize the protein-coding gene repertoire and perform isoform-specific RNAi, but also to elucidate patterns of molecular evolution and physiology. We find ongoing, lineage-specific expansion and diversification of repressive C2H2 zinc finger proteins. The discovery of intron gain and turnover specific to the Hemiptera also prompted the evaluation of lineage and genome size as predictors of gene structure evolution. Furthermore, we identify enzymatic gains and losses that correlate with feeding biology, particularly for reductions associated with derived, fluid nutrition feeding.

Conclusions: With the milkweed bug, we now have a critical mass of sequenced species for a hemimetabolous insect order and close outgroup to the Holometabola, substantially improving the diversity of insect genomics. We thereby define commonalities among the Hemiptera and delve into how hemipteran genomes reflect distinct feeding ecologies. Given Oncopeltus's strength as an experimental model, these new sequence resources bolster the foundation for molecular research and highlight technical considerations for the analysis of medium-sized invertebrate genomes.

Keywords: Evolution of development; Gene family evolution; Gene structure; Lateral gene transfer; Phytophagy; RNAi; Transcription factors.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
The large milkweed bug, Oncopeltus fasciatus, shown in its phylogenetic and environmental context. a Species tree of selected Hemiptera with genomic and transcriptomic resources, based on phylogenetic analyses and divergence time estimates in [3]. Species marked with an asterisk (*) have published resources; those with the appellation “i5K” are part of a current pilot project supported by the Baylor College of Medicine Human Genome Sequencing Center and the National Agricultural Library of the USDA. Note that recent analyses suggest the traditional infraorder Cimicomorpha, to which Rhodnius and Cimex belong, may be paraphyletic [16]. b, c Milkweed bugs on their native food source, the milkweed plant: gregarious nymphs of different instars on a milkweed seed pod (b) and pale, recently eclosed adults and their shed exuvia (c). Images were taken at Avalon Park and Preserve, Stony Brook, NY, USA, courtesy of Deniz Erezyilmaz, used with permission. d Individual bugs, shown from left to right: first instar nymphs (ventral and dorsal views) and adults (dorsal and lateral views); images courtesy of Kristen Panfilio (nymphs) and Jena Johnson (adults), used with permission. The arrow labels the labium (the “straw”), part of the hemipteran mouthpart anatomy adapted for feeding by piercing and sucking
Fig. 2
Comparisons of the official gene set and transcriptomic resources for Oncopeltus fasciatus. a Area-proportional Venn diagram comparing the OGS v1.1 (OGS), a Trinity de novo transcriptome from the three post-embryonic RNA-seq samples (i5K) and the maternal and embryonic transcriptome from 454 data (“454”, [35]). Sample sizes and the fraction of each transcriptome represented in the OGS are indicated (for the 454 dataset, only transcripts with homology identification were considered). The unique fraction of each set is also specified (%). Dataset overlaps were determined by blastn (best hit only, e value < 10^−9). b Venn diagram of gene model expression support across four life history samples. Values are numbers of gene models, with percentages also given for the largest subsets. Note that the “Embryo/Maternal” sample derives from 454 pyrosequencing data and therefore has a smaller data volume than the other, Illumina-based samples. c Summary of sex- and developmental stage-specific RNA-seq comparisons across hemipteroid species: Apis, Acyrthosiphon pisum; Clec, Cimex lectularius; Focc, Frankliniella occidentalis (thysanopteran outgroup); Ofas, Oncopeltus fasciatus; Pven, Pachypsylla venusta; n.d., not determined. For complete numerical details, see Additional file 1: Supplemental Note 2.4. Analyses are based on OGS v1.1.
Fig. 3
Orthology comparisons and phylogenetic placement of Oncopeltus fasciatus among other Arthropoda. a Comparisons of protein-coding genes in 12 arthropod species, with the Hemiptera highlighted in red text. The bar chart shows the number of proteins per conservation level (see legend), based on OrthoDB orthology clustering analyses. To the left is a maximum likelihood phylogeny based on concatenation of 395 single-copy orthologs (all nodes have 100% support unless otherwise noted; branch length unit is substitutions per site). The inset pie chart shows the proportion of proteins per conservation level in Oncopeltus (Ofas). See also Additional file 1: Supplemental Note 6.1. b BUSCO-based analysis of Oncopeltus compared to other hemipterans for ortholog presence and copy number in both the assembly and OGS resources, using 4-letter species abbreviations (full names in a). c Proportion of Oncopeltus proteins that have expression and/or curation validation support per conservation level (same color legend as in a). Expression support is based on the life history stage data in Fig. 2b. Analyses are based on OGS v1.1
Fig. 4
Distribution of transcription factor (TF) families across insect genomes. a Heatmap depicting the abundance of 74 TF families across 16 insect genomes (Hemiptera highlighted in red text), with Daphnia as an outgroup, based on the presence of predicted DNA binding domains (see the “Methods” section). The color key has a log (base 2) scale (light blue means the TF family is completely absent). Values are in Additional file 2: Table S6.3. b Bar graph showing the number of proteins of each of the 2 most abundant TF families, homeodomains and C2H2 zinc fingers (ZFs), per species using 4-letter abbreviations (full names in a). Solid lines demarcate insect orders: Hemiptera (Hemipt.), Hymenoptera (Hym.), Coleoptera (Col.), and Diptera (Dipt.). The dashed line demarcates the dipteran family Culicidae (mosquitoes). c Proportions of Oncopeltus homeodomain (HD) and C2H2 zinc finger proteins with orthology assignment (predicted DNA binding specificity) and/or manual curation. “Classified” refers to the automated classification of a protein to a TF family, but without a specific orthology assignment. d Maximum likelihood phylogeny of representative subsets of the zinc finger 271-like family in Oncopeltus (49 proteins, blue text) and the pea aphid (55 proteins, black text), with chelicerate (red text) and holometabolan (yellow text) outgroups (16 proteins, 7 species), based on the Oncopeltus OGS and GenBank protein accessions. Gaps were removed during sequence alignment curation; all nodes have ≥ 50% support; branch length unit is substitutions per site [157]. Key nodes are circled for the clades containing all aphid or all Oncopeltus proteins (82% support each), and each “core” clade comprised exclusively of proteins from each species (97% and 100%, respectively; triangles shown to scale for branch length and number of clade members). Analyses are based on OGS v1.1.
Fig. 5
Comparison of repeat content estimations. a Comparison of total repetitive content among insect genomes. The three values for Oncopeltus are shown (in ascending order: original Illumina assembly, gap-filled assembly, Illumina-PacBio hybrid estimate). Values for the three hemipterans labeled in red text are from RepeatModeler (gold bars for the pea aphid and bed bug; blue and gold bars for Oncopeltus). All other values are from the respective genome papers, including a second value corresponding to the published repeat content for the first version of the aphid genome [, , , –163]. Species abbreviations as in Fig. 4 and additionally Nlug, Nilaparvata lugens; Lmig, Locusta migratoria; Bmor, Bombyx mori; Aalb, Aedes albopictus. b Comparison of repetitive element categories between the three hemipteran genomes, based on results from RepeatModeler. Here, we present assembly coverage as actual sequence length (Mb) to emphasize the greater repeat content in Oncopeltus (based on the gap-filled assembly, see also Additional file 1: Supplemental Note 2.3)
Fig. 6
Trends in gene structure show hemipteroid-specific tendencies. a Median values per species for protein size, exon size, and exon number for a curated set of highly conserved genes encoding large proteins of diverse functional classes (see also Additional file 1: Supplemental Note 6.3). Sample sizes are indicated, with 11 genes for which orthologs were evaluated in all species. Where it was not possible to analyze all 30 genes for a given species, equal sampling was done across the range of protein sizes of the complete dataset, based on the Cimex ortholog sizes (1:1:1 sampling from big-to-medium-to-small subcategories of 10 genes each). b Box plot representations of coding sequence exon size (aa) for 2 species from each of 3 insect orders, based on datasets of unique coding sequence exons (1 isoform per gene) and excluding terminal exons < 10 aa (as most of those exons may rather be UTRs or a small placeholder N-terminal exon based on automated Maker model predictions). Only manually curated gene models were considered for the i5K species, including Oncopeltus; the entire OGS was used for Tribolium and Drosophila. For clarity, outliers are omitted; whiskers represent 1.5× the value of the Q3 (upper) or Q2 (lower) quartile range. MAD, median absolute deviation. Species are represented by their 4-letter abbreviations, with their ordinal relationships given below the phylogeny in a: Hemip., Hemiptera; Thys., Thysanoptera; Col., Coleoptera; Dipt., Diptera. Species abbreviations as in Figs. 2 and 4 and additionally Gbue, Gerris buenoi [164]; Agla, Anoplophora glabripennis [30]; Ccap, Ceratitis capitata [165]
Fig. 7
Splice site evolution correlates with both lineage and genome size. Splice site changes are shown for hemocytin (blue text), Tenascin major (Ten-m, turquoise text), and UDP-galactose 4′-epimerase (brown text), mapped onto a species tree of eight insects. Patterns of splice site evolution were inferred based on the most parsimonious changes that could generate the given pattern within a protein sequence alignment of all orthologs (see also Additional file 1: Supplemental Note 6.3 for methodology and data sources). If inferred gains or losses were equally parsimonious, we remained agnostic and present a range for the ancestral number of splice sites present at the base of the tree, where the bracketed number indicates how many ancestral positions are still retained in all species. Along each lineage, subsequent changes are indicated in brackets, with the sign indicating gains (+) or losses (−). Values shown to the right are species-specific changes. The values shown between the D. melanogaster and T. castaneum lineages denote changes that have occurred independently in both species. Colored boxes highlight the largest sources of change, as indicated in the legend. Species are represented by their four-letter abbreviations (as in Fig. 6), and estimated genome sizes are indicated parenthetically (measured size [12, 30, 162, 165, 166]; draft assembly size: GenBank Genome IDs 14741 and 17730). Divergence times are shown in gray and given in millions of years [3]. Abbreviations as in Figs. 4 and 6, and also: Hemipt., hemipteroid assemblage (including F. occidentalis); n.d., no data
Fig. 8
Lateral gene transfer introduction and subsequent evolution within the Hemiptera for mannosidase-encoding genes. a Species tree summary of evolutionary events. Stars represent the original LGT introduction and subsequent copy number gains (see legend). b Maximum likelihood phylogeny of mannosidase proteins, including bacterial sequences identified among the best GenBank blastp hits for Oncopeltus and Halyomorpha (accession numbers as indicated, and for “Other bacteria” are ACB22214.1, AEE17431.1, AEI12929.1, AEO43249.1, AFN74531.1, CDM56239.1, CUA67033.1, KOE98396.1, KPI24888.1, OAN41395.1, ODP26899.1, ODS11151.1, OON18663.1, PBD05534.1, SIR54690.1, WP096035621.1, YP001327394.1). All nodes have ≥ 50% support from 500 bootstrap replicates [167]. Triangles are shown to scale for branch length and number of clade members; branch length unit is substitutions per site. See also Additional file 1: Figure S2.6. c Manually curated protein sequence alignment for the N-terminal region only. Splice sites (“|” symbol) are shown, where one position is ancestral and present in all paralogs of a given species (magenta) and one position occurs in a subset of paralogs and is presumed to be younger (cyan, within the 5′ UTR in Halyomorpha). Residues highlighted in yellow are conserved between the two hemipteran species. The Oncopeltus paralog represented in the OGS as OFAS017153-RA is marked with an asterisk to indicate that this version of the gene model is incomplete and lacks the initial exon (gray text in the alignment). For clarity, only the final three digits of the Halyomorpha GenBank accessions are shown (full accessions: XP_014289XXX)
Fig. 9
Isoform-specific RNAi based on new genome annotations affects the molting and cuticle identity gene broad. a Genomic organization of the cuticle identity gene broad. The regions used as a template to generate isoform-specific dsRNA are indicated (red asterisks: the final, unique exons of each isoform). Previous RNAi studies targeted sequence within exons 1–5 that is shared among all isoforms (dashed red box, [92]). b Knockdown of the Oncopeltus Z2 or Z3 broad isoforms at the onset of the penultimate instar resulted in altered nymphal survival and morphogenesis that was reflected in the size and proportion of the fore and hind wings at the adult stage (upper and lower images, respectively, shown to the same scale for all wings). We did not detect any effect on the wing phenotype when targeting the Z4-specific exon, demonstrating the specificity of the zinc finger coding region targeted by RNAi. Experimental statistics are provided in the figure inset, including for the buffer-injected negative control
Fig. 10
Comparison of the urea cycle of Oncopeltus with 26 other insect species. a Detailed diagram of the urea cycle (adapted from KEGG). b Group of 7 species, including Oncopeltus, for which Arg degradation via arginase (3.5.3.1), but not synthesis, is possible. c Group of 3 species for which neither the degradation nor synthesis of arginine via the urea cycle is possible (the 3 other hemipterans in this analysis). d Group of 17 species sharing a complete (or almost complete) urea cycle. Hemiptera are identified in red text, and the milkweed-feeding monarch butterfly is in blue text. Enzyme names corresponding to EC numbers: 1.5.1.2 = pyrroline-5-carboxylate reductase, 1.14.13.39 = nitric-oxide synthase, 2.1.3.3 = ornithine carbamoyltransferase, 2.6.1.13 = ornithine aminotransferase, 3.5.3.1 = arginase, 4.3.2.1 = argininosuccinate lyase, 6.3.4.5 = argininosuccinate synthase. Analyses are based on OGS v1.1
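
The dataset-overlap criterion stated in the Fig. 2a legend (blastn, best hit only, e value below 10^−9) could be applied roughly as in the sketch below. This is an illustration and not the authors' pipeline. It assumes BLAST tabular output with the 12 default columns (`-outfmt 6`) and assumes that "best hit" means the highest bit score per query, which the legend does not specify. The function names, transcript names, and gene identifiers in the example are made up.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-9  # threshold quoted in the Fig. 2a legend


def best_hits(blast_tsv, cutoff=E_VALUE_CUTOFF):
    """Keep one best hit per query from BLAST tabular output (-outfmt 6).

    Assumption: "best" means highest bit score among hits with e value < cutoff.
    Returns {query_id: (subject_id, evalue, bitscore)}.
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # fails the e value threshold
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


def fraction_represented(best, n_queries):
    """Fraction of query transcripts with a qualifying best hit."""
    return len(best) / n_queries if n_queries else 0.0


# Made-up example: two transcripts searched against a gene set.
example = io.StringIO(
    "tx1\tOFAS000001\t99.0\t500\t5\t0\t1\t500\t1\t500\t1e-150\t900\n"
    "tx1\tOFAS000002\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-50\t400\n"
    "tx2\tOFAS000003\t80.0\t60\t12\t0\t1\t60\t1\t60\t1e-5\t50\n"
)
hits = best_hits(example)
print(hits)
print(fraction_represented(hits, 2))
```

In this toy input, `tx1` keeps only its higher-scoring hit and `tx2` is dropped because its only hit does not pass the threshold, so half of the transcripts count as represented.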

References

    1. Zdobnov EM, Tegenfeldt F, Kuznetsov D, Waterhouse RM, Simão FA, Ioannidis P, Seppey M, Loetscher A, Kriventseva EV. OrthoDB v9.1: cataloging evolutionary and functional annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs. Nucleic Acids Res. 2017;45:D744–D749.
    2. Huang DY, Bechly G, Nel P, Engel MS, Prokop J, Azar D, Cai CY, van de Kamp T, Staniczek AH, Garrouste R, et al. New fossil insect order Permopsocida elucidates major radiation and evolution of suction feeding in hemimetabolous insects (Hexapoda: Acercaria). Sci Rep. 2016;6:23004.
    3. Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, Frandsen PB, Ware J, Flouri T, Beutel RG, et al. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346:763–767.
    4. Grimaldi D, Engel MS. Evolution of the insects. Cambridge: Cambridge University Press; 2005.
    5. Panfilio KA, Angelini DR. By land, air, and sea: hemipteran diversity through the genomic lens. Curr Opin Insect Sci. 2018;25:106–115.
