Protein Sci. 2023 Jan;32(1):e4519.
doi: 10.1002/pro.4519.

DALI shines a light on remote homologs: One hundred discoveries

Liisa Holm et al. Protein Sci. 2023 Jan.

Abstract

Structural comparison reveals remote homology that sequence comparison often fails to detect. The DALI web server (http://ekhidna2.biocenter.helsinki.fi/dali) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families, and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we use DALI to mine the AlphaFold Database version 1, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships hitherto unreported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to previously characterized families, generating functional hypotheses that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.

Keywords: AlphaFold Database; evolutionary classification; homology transfer of protein function; structural alignment.

Figures

FIGURE 1
Structural coverage of Pfam 35.0 families by AlphaFold Database version 1.
FIGURE 2
PF11904 joins the LolA/B superfamily (CL0048). (a) PF11904 representative AF‐Q7K4M9‐F1‐model_v1/118‐496; structurally aligned segments are shown in blue. Shown in green are large insertions in the GPCR‐chaperone domain, which are modeled with lower confidence. (b) VioE family representative 2zf4F in the same orientation, showing a bound lipid molecule above the flat beta sheet of the common core. (c) Pfam domains mapped to the structural alignment. Note that structural equivalence extends to the left of the Pfam domain boundary on the query. (d) Stacked sequence logos. The RxD motif, which contacts the bound lipid in the VioE family, appears specific to the VioE family and PF11904.
FIGURE 3
Papain‐like conserved active site. The AlphaFold model for a representative of PF08795 is colored by sequence conservation versus 2g6tA. His and Cys of the papain‐like catalytic dyad (shown in dark blue) are invariantly conserved in both protein families.
FIGURE 4
Fusion of receptor and sensor domains in PF09909. (a) PF09909 model. Structurally aligned regions are shown in green. (b) Receptor and sensor domain matches in the same orientation, colored darker and lighter orange, respectively. To reproduce the superposition, launch a DALI pairwise alignment with eszA as the first structure and 3k8hA and 4esqA as the second structures.
FIGURE 5
Fraction of nopdb families with hhblits e‐value <1 and DALI Z‐score ratio >0.8 (blue) or not meeting the criteria (orange).
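
The two thresholds in the Figure 5 caption can be read as a simple per-family filter. The following is a minimal sketch of that reading, not the authors' pipeline: it assumes each family already has a precomputed hhblits e-value and DALI Z-score ratio (how the ratio is defined is not stated in this excerpt), and the `FamilyScores` type, function names, and the `FAM_*` records are hypothetical placeholders, not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class FamilyScores:
    """Hypothetical record: one family with its two precomputed scores."""
    family: str
    hhblits_evalue: float   # assumed to be supplied by an upstream hhblits run
    dali_z_ratio: float     # assumed to be supplied; definition not given here


def meets_criteria(s: FamilyScores,
                   max_evalue: float = 1.0,
                   min_z_ratio: float = 0.8) -> bool:
    """True if the family satisfies both thresholds from the caption:
    hhblits e-value < 1 and DALI Z-score ratio > 0.8."""
    return s.hhblits_evalue < max_evalue and s.dali_z_ratio > min_z_ratio


def fraction_meeting(scores: list[FamilyScores]) -> float:
    """Fraction of families that meet both criteria (0.0 for an empty list)."""
    if not scores:
        return 0.0
    return sum(meets_criteria(s) for s in scores) / len(scores)


# Placeholder values for illustration only; these are not results from the paper.
example = [
    FamilyScores("FAM_A", hhblits_evalue=1e-5, dali_z_ratio=0.95),
    FamilyScores("FAM_B", hhblits_evalue=0.5, dali_z_ratio=0.85),
    FamilyScores("FAM_C", hhblits_evalue=3.0, dali_z_ratio=0.90),   # e-value too high
    FamilyScores("FAM_D", hhblits_evalue=1e-3, dali_z_ratio=0.40),  # ratio too low
]
example_fraction = fraction_meeting(example)
```

Both comparisons are strict, matching the "<1" and ">0.8" wording of the caption; a family sitting exactly on either threshold falls into the "not meeting the criteria" group.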
