Diatoms (Bacillariophyta) are a species-rich group of eukaryotic microbes diverse in morphology, ecology, and metabolism. Previous reconstructions of the diatom phylogeny based on one or a few genes have resulted in inconsistent resolution or low support for critical nodes. We applied phylogenetic paralog pruning techniques to a data set of 94 diatom genomes and transcriptomes to infer perennially difficult species relationships, using concatenation and summary-coalescent methods to reconstruct species trees from data sets spanning a wide range of thresholds for taxon and column occupancy in gene alignments. Conflicts between gene and species trees decreased with both increasing taxon occupancy and bootstrap cutoffs applied to gene trees. Concordance between gene and species trees was lowest for short internodes and increased logarithmically with increasing edge length, suggesting that incomplete lineage sorting disproportionately affects species tree inference at short internodes, which are a common feature of the diatom phylogeny. Although species tree topologies were largely consistent across many data treatments, concatenation methods appeared to outperform summary-coalescent methods for sparse alignments. Our results underscore that approaches to species-tree inference based on few loci are likely to be misled by unrepresentative sampling of gene histories, particularly in lineages that may have diversified rapidly. In addition, phylogenomic studies of diatoms, and potentially other hyperdiverse groups, should maximize the number of gene trees with high taxon occupancy, though there is clearly a limit to how many of these genes will be available.
Keywords: Bacillariophyta; diatoms; incomplete lineage sorting; phylogenomics; phylotranscriptomics.
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.