Nature. 2020 Nov;587(7833):252-257.
doi: 10.1038/s41586-020-2873-9. Epub 2020 Nov 11.

Dense sampling of bird diversity increases power of comparative genomics

Shaohong Feng, Josefin Stiller, Yuan Deng, Joel Armstrong, Qi Fang, Andrew Hart Reeve, Duo Xie, Guangji Chen, Chunxue Guo, Brant C Faircloth, Bent Petersen, Zongji Wang, Qi Zhou, Mark Diekhans, Wanjun Chen, Sergio Andreu-Sánchez, Ashot Margaryan, Jason Travis Howard, Carole Parent, George Pacheco, Mikkel-Holger S Sinding, Lara Puetz, Emily Cavill, Ângela M Ribeiro, Leopold Eckhart, Jon Fjeldså, Peter A Hosner, Robb T Brumfield, Les Christidis, Mads F Bertelsen, Thomas Sicheritz-Ponten, Dieter Thomas Tietze, Bruce C Robertson, Gang Song, Gerald Borgia, Santiago Claramunt, Irby J Lovette, Saul J Cowen, Peter Njoroge, John Philip Dumbacher, Oliver A Ryder, Jérôme Fuchs, Michael Bunce, David W Burt, Joel Cracraft, Guanliang Meng, Shannon J Hackett, Peter G Ryan, Knud Andreas Jønsson, Ian G Jamieson, Rute R da Fonseca, Edward L Braun, Peter Houde, Siavash Mirarab, Alexander Suh, Bengt Hansson, Suvi Ponnikas, Hanna Sigeman, Martin Stervander, Paul B Frandsen, Henriette van der Zwan, Rencia van der Sluis, Carina Visser, Christopher N Balakrishnan, Andrew G Clark, John W Fitzpatrick, Reed Bowman, Nancy Chen, Alison Cloutier, Timothy B Sackton, Scott V Edwards, Dustin J Foote, Subir B Shakya, Frederick H Sheldon, Alain Vignal, André E R Soares, Beth Shapiro, Jacob González-Solís, Joan Ferrer-Obiol, Julio Rozas, Marta Riutort, Anna Tigano, Vicki Friesen, Love Dalén, Araxi O Urrutia, Tamás Székely, Yang Liu, Michael G Campana, André Corvelo, Robert C Fleischer, Kim M Rutherford, Neil J Gemmell, Nicolas Dussex, Henrik Mouritsen, Nadine Thiele, Kira Delmore, Miriam Liedvogel, Andre Franke, Marc P Hoeppner, Oliver Krone, Adam M Fudickar, Borja Milá, Ellen D Ketterson, Andrew Eric Fidler, Guillermo Friis, Ángela M Parody-Merino, Phil F Battley, Murray P Cox, Nicholas Costa Barroso Lima, Francisco Prosdocimi, Thomas Lee Parchman, Barney A Schlinger, Bette A Loiselle, John G Blake, Haw Chuan Lim, Lainy B Day, Matthew J Fuxjager, Maude W Baldwin, Michael J Braun, Morgan Wirthlin, Rebecca B Dikow, T Brandt Ryder, Glauco Camenisch, Lukas F Keller, Jeffrey M DaCosta, Mark E Hauber, Matthew I M Louder, Christopher C Witt, Jimmy A McGuire, Joann Mudge, Libby C Megna, Matthew D Carling, Biao Wang, Scott A Taylor, Glaucia Del-Rio, Alexandre Aleixo, Ana Tereza Ribeiro Vasconcelos, Claudio V Mello, Jason T Weir, David Haussler, Qiye Li, Huanming Yang, Jian Wang, Fumin Lei, Carsten Rahbek, M Thomas P Gilbert, Gary R Graves, Erich D Jarvis, Benedict Paten, Guojie Zhang

Shaohong Feng et al. Nature. 2020 Nov.

Erratum in

  • Author Correction: Dense sampling of bird diversity increases power of comparative genomics.
    Feng S, et al. Nature. 2021 Apr;592(7856):E24. doi: 10.1038/s41586-021-03473-8. PMID: 33833441. Free PMC article. No abstract available.

Abstract

Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity [1-4]. Sparse taxon sampling has previously been proposed to confound phylogenetic inference [5], and captures only a fraction of the genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity, by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We use this comparative genome dataset in combination with a pipeline that leverages a reference-free whole-genome alignment to identify orthologous regions in greater numbers than has previously been possible and to recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, has more than doubled the fraction of bases that are confidently predicted to be under conservation and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation, and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Newly sequenced genomes densely cover the bird tree of life.
The 10,135 bird species are shown on a draft phylogeny that synthesizes taxonomic and phylogenetic information (Supplementary Data). In total, 363 species, covering 92.4% of all families, now have at least 1 genome assembly per sequenced family (purple branches). The grey arc marks the diverse Passeriformes radiation, with 6,063 species, of which 173 now have genome assemblies. Chicken (*) and zebra finch (**) are marked for orientation. Paintings illustrate examples of sequenced species.
Fig. 2. Improved orthologue distinction and detection of lineage-specific sequences.
a, Incorporating synteny in the orthologue assignment pipeline resolves complex cases of orthology. The growth hormone gene (GH) has one copy in chicken and two copies in Passeriformes (exemplified by zebra finch and Atlantic canary). On the basis of the conserved synteny of GH_L in Passeriformes with GH of chicken, the pipeline recognized GH_L as the ancestral copy, despite its high similarity to the other copy. b, The whole-genome alignment allows the detection of lineage-specific sequences. For orders with more than one sequenced representative, lineage-specific sequences are those present in the reconstructed ancestral genome but absent in other lineages. Colours denote higher-level taxonomic groupings. c, A novel gene in Passeriformes. Phylogeny based on the B10K Project phase I plotted with synteny of a putative lineage-specific gene (DNAJC15L) and its surrounding genes. DNAJC15L is found in 131 out of 173 sequenced Passeriformes and their reconstructed ancestral genome, but is not found in non-Passeriformes. MRCA, most-recent common ancestor.
Fig. 3. Denser phylogenomic sequencing increases the power to detect selective constraints.
Results are shown from 3 alignments for 53 birds, 77 vertebrates, and 363 birds. a, Proportion of alignment columns labelled as conserved. The cumulative portion of the genome with a conserved call is shown, starting from the column with the smallest P value and proceeding to the columns with the highest P values. The dotted lines show the path after hitting the false-discovery rate (FDR) P value cutoff of 0.05, below which calls are significant (marked by arrows). b, Histograms of the rate of alignment columns evolving slower relative to the neutral rate (labelled 1.00). Coloured areas indicate significantly conserved columns, and light grey areas indicate non-significantly conserved columns. A rate of zero contains a relatively high proportion of recent insertions present in only a few species; there is limited statistical power to classify such insertions. c, Proportion of various functional regions of the chicken genome that contain single-bp conserved elements in the large alignment compared to alignments with fewer species. UTR, untranslated region. d, An example of a MAF::NFE2 motif overlaid on one of its predicted binding sites demonstrates the high resolution of our conserved site predictions and the increased power to predict conservation in the larger alignment.
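The significance cutoff described for panel a can be illustrated with a small sketch. The caption states only that an FDR cutoff of 0.05 was applied; the Benjamini-Hochberg procedure used here, the function names and the toy P values are assumptions for illustration and are not taken from the authors' pipeline.

```python
def bh_significant(p_values, alpha=0.05):
    """Mark which P values pass a Benjamini-Hochberg FDR cutoff (assumed procedure)."""
    m = len(p_values)
    # Indices of the columns, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose P value is <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Every column ranked at or below k_max is called significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


def cumulative_fraction(p_values):
    """Walk columns from smallest to largest P value and report the
    cumulative fraction of columns reached at each step."""
    m = len(p_values)
    return [(p, rank / m) for rank, p in enumerate(sorted(p_values), start=1)]


# Toy P values for ten hypothetical alignment columns.
toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = bh_significant(toy)
print(sum(calls) / len(toy))  # fraction of columns called conserved
```

In this toy input only the two smallest P values fall under their rank-scaled thresholds, so the cumulative curve would switch from solid to dotted after the second column.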
Extended Data Fig. 1. Sampling and processing of the 363 genomes.
a, Sources of the 363 genomes. Each genome is a square; colour indicates the data source. Newly published genomes from the B10K Project phase II are red; unpublished genomes contributed by external labs are yellow; published genomes from phase I are orange; genomes contributed by the community that have since been published are dark blue; and other genomes available on NCBI are light blue. b, Map of geographical origin of the 281 bird samples for which geographical coordinates are available. c, Summary of the species confirmation of 236 species newly sequenced by the B10K Project. Downward arrows indicate excluded genomes. d, Summary of mitochondrial genome assembly and annotation for 336 species. Downward arrows indicate excluded mitochondrial genomes.
Extended Data Fig. 2. Distribution of transposable elements.
a, Percentage of the genome that is a transposable element (TE). Box plots are shown for groups with at least three sequenced species. b, Per cent base pairs of the genome that are long interspersed nuclear elements (LINEs), grouped by orders. Box plots are shown for groups with at least three sequenced species. c, S.d. of the transposable element content for orders with at least three sequenced species. d, S.d. of the per cent LINE content for orders with at least three sequenced species. e, Ancestral state reconstruction of total transposable elements. The branch colour from blue to red indicates an increase in transposable elements. Two orders with noticeable patterns—Piciformes and Bucerotiformes—are labelled on the tree. A zoomable figure with labels for all terminals is available at www.doi.org/10.17632/fnpwzj37gw.
Extended Data Fig. 3. Patterns of the presence and absence of 5 visual opsins in 363 bird species.
This figure shows patterns for the visual opsins encoded by RH1, RH2, OPN1sw1, OPN1sw2 and OPN1lw. Colours correspond to five annotated states of opsin sequences. A zoomable figure with labels for all terminals is available at www.doi.org/10.17632/fnpwzj37gw.
Extended Data Fig. 4. GC content and codon use.
a, Principal component analysis (PCA) of GC content in the coding regions of orthologues with conserved synteny with chicken for 340 bird species, including 164 Passeriformes species. b, Correspondence analysis of relative synonymous codon usage (RSCU) for all 363 birds. The primary and secondary axes account for 78.18% and 14.82% of the total variation, respectively. c, The distribution of codons on the same two axes as shown in b, with each codon coloured according to its ending nucleotide. This shows that the axis-1 score of a species is primarily determined by differences in frequencies of codons ending in G, C, A or T. d, RSCU analysis of 59 codons across avian genomes (n = 363 biologically independent species for each box plot). The horizontal lines indicate thresholds of under-represented codons (<0.6, blue box plots), average representation (1.0, white box plots) and over-represented codons (>1.6, orange box plots). e, Pearson correlation between GC content of the third codon position and the primary axis in b, colour-coded to distinguish Passeriformes and non-Passeriformes. The strong correlation (R² = 0.9, P = 4.1 × 10⁻¹⁸⁴) indicates that the frequency of codons ending in G or C is the main driver of the codon bias in Passeriformes. f, Comparison of the mean Nc values between the Passeriformes and other species for orthologues with conserved synteny with chicken (Supplementary Table 12). Each dot represents the mean Nc value of an orthologue in the Passeriformes and other species, respectively. Orthologues with at least 20 individuals in both the Passeriformes and the non-Passeriformes were included in this analysis.
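RSCU (relative synonymous codon usage), shown in panels b to d, is conventionally the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The sketch below illustrates that definition under stated assumptions: it uses the standard genetic code, the function and variable names are hypothetical, and the input is a toy sequence. Only the 0.6 and 1.6 thresholds come from the caption.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (assumption: standard nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by amino acid, then drop stop codons and the
# single-codon amino acids (Met, Trp): 59 codons remain.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    FAMILIES.setdefault(aa, []).append(codon)
SYNONYMOUS = {aa: cs for aa, cs in FAMILIES.items() if aa != "*" and len(cs) > 1}
N_CODONS = sum(len(cs) for cs in SYNONYMOUS.values())


def rscu(cds):
    """RSCU per codon for an in-frame coding sequence.
    Amino acids absent from the sequence are skipped (RSCU undefined)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # observed count / (total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


def classify(value):
    """Apply the thresholds given in the caption (<0.6 and >1.6)."""
    if value < 0.6:
        return "under-represented"
    if value > 1.6:
        return "over-represented"
    return "intermediate"


toy = rscu("GCTGCTGCTGCC")  # four alanine codons: GCT x3, GCC x1
print(toy["GCT"], classify(toy["GCT"]))
```

A value of 1.0 means a codon is used exactly as often as expected if all synonymous codons were used equally, which is why the caption treats 1.0 as average representation.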
Extended Data Fig. 5. Overview of the pipelines for identifying genomic regions.
a, Assignment of orthologous protein-coding regions. All pairwise relationships between homologous regions obtained from the Cactus alignment (4 species shown here in different colours) were used to construct the homologous groups across all 363 birds. Using chicken as the reference, we further generated a table containing homologues with conserved synteny to chicken. b, Annotation of conserved orthologous intron regions on the basis of Cactus whole-genome alignments. Credible intron fragments in chicken were selected after filtering out regions mapped by RNA sequences and chicken-specific or repetitive regions. Orthologous relationships of intron fragments were detected on the basis of the aligned Cactus hits and the orthologues with conserved synteny with chicken. The non-intron regions of each bird in the alignments were masked as gaps.
Extended Data Fig. 6. Gene tree for copies of the growth hormone gene GH.
The tree was generated by maximum likelihood phylogenetic analysis of avian GH gene copies. Only nodes with bootstrap support >80 are annotated with dots; the larger the dot, the higher the bootstrap support. All Passeriformes sequences are clustered in a single clade and there are two sister gene clades within Passeriformes, corresponding to the GH_S gene copy (blue) and the GH_L gene copy (orange). Twelve species with only one copy are indicated by green stars. A zoomable figure with labels for all terminals and the tree file is available at www.doi.org/10.17632/fnpwzj37gw.
Extended Data Fig. 7. Identification of lineage-specific sequences.
a, An example of a 36-bp insertion (red) identified by Cactus in the southern cassowary (Casuarius casuarius) compared to the Okarito brown kiwi (Apteryx rowi) (both in Palaeognathae) with mapped sequence reads shown as lines. b, Proportion of lineage-specific sequence for each order correlated with the distance from parent node to MRCA node (branch length). c, Presence and absence of the DNAJC15-like gene (DNAJC15L), and its surrounding genes, in all 363 birds. Upstream: KLHL1 and DACH1; downstream: MZT1, BORA, RRP44, PIBF1 and KLF5. The state is shown for each bird in three ways: multiple copies (filled shapes), one copy (empty shapes) and no gene (blank). Passeriformes are highlighted in red. A zoomable figure with labels for all terminals is available at www.doi.org/10.17632/fnpwzj37gw. d, Exon fusion patterns of the DNAJC15-like gene (DNAJC15L) in three Passeriformes, compared to exon structure of the ancestral DNAJC15. For L. aspasia, gene models for the ancestral and novel copy are shown. The structure of the ancestral copy is highly conserved across all bird species with five introns. The Passeriformes-specific copy has no intron or newly derived minor intron and includes a poly-(A) at the 5′ end, which implies that this new gene was derived from retroduplication of DNAJC15.
Extended Data Fig. 8. The evolution of songbirds was associated with the loss of the cornulin gene.
a, Presence and absence of the cornulin gene (CRNN) and its surrounding genes (EDDM and S100A11) in all 363 birds. Branches are coloured as oscine Passeriformes (blue), non-oscine Passeriformes (green) and non-Passeriformes (black). The states of genes are shown in three ways: functional gene (filled box), pseudogene (empty box) and gene not found (blank). Genes were identified by Exonerate using phylogenetically diverse EDDM, CRNN and S100A11 sequences as queries. A zoomable figure with labels for all terminals is available at www.doi.org/10.17632/fnpwzj37gw. b, Hypothesis on the evolutionary loss of cornulin and the appearance of a fine-tuned extensibility of the oesophagus as a vocal tract filter in songbirds.
Extended Data Fig. 9. Acceleration and conservation scores.
Results are shown from 3 alignments for 53 birds, 77 vertebrates, and 363 birds. a, Acceleration (left) and conservation (right) within alignment columns on chicken. This panel is similar to Fig. 3a, but includes accelerated columns. b, Proportion of chicken functional regions covered by significantly accelerated or conserved sites. This panel is similar to Fig. 3c, but includes accelerated columns.
Extended Data Fig. 10. Distribution of acceleration and conservation scores.
a, Distribution of conservation and acceleration scores within different functional region types across alignments. Lines mark quartiles of the density estimates. b, Larger histogram of chicken column rates. This panel is similar to Fig. 3b, but includes accelerated columns ending at a rate of 10× the neutral rate. c, Difference in PhyloP scores (compared to original scores) after realignment with MAFFT for a random sample of significantly conserved sites. d, Comparison of the distribution of PhyloP scores across alignments. Scores indicate log-scaled probabilities of conservation (positive values) or acceleration (negative values) for each base in the genome. a and d show results from three alignments for 53 birds, 77 vertebrates and 363 birds.
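Panel d describes the scores as log-scaled probabilities whose sign gives the direction of the departure from neutrality. A minimal sketch of that sign convention follows, assuming a base-10 logarithm (the caption does not state the base) and hypothetical function names.

```python
import math


def signed_score(p_value, conserved):
    """Convert a P value to a signed log-scaled score:
    positive for conservation, negative for acceleration.
    Assumes a base-10 logarithm."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if conserved else -magnitude


def p_from_score(score):
    """Recover the P value from a signed score; the sign only encodes direction."""
    return 10 ** -abs(score)


print(signed_score(0.001, conserved=True))   # a conserved site
print(signed_score(0.01, conserved=False))   # an accelerated site
```

Under this convention, a larger absolute score always means stronger evidence against neutral evolution, regardless of direction.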

References

    1. Lewin HA, et al. Earth BioGenome project: sequencing life for the future of life. Proc. Natl Acad. Sci. USA. 2018;115:4325–4333.
    2. Genome 10K Community of Scientists. Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J. Hered. 2009;100:659–674.
    3. i5K Consortium. The i5K initiative: advancing arthropod genomics for knowledge, human health, agriculture, and the environment. J. Hered. 2013;104:595–600.
    4. Cheng S, et al. 10KP: a phylodiverse genome sequencing plan. Gigascience. 2018;7:1–9.
    5. Prum RO, et al. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature. 2015;526:569–573.
