Nat Genet. 2021 Nov;53(11):1606-1615.
doi: 10.1038/s41588-021-00955-3. Epub 2021 Nov 4.

Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locus

Damien J Downes et al. Nat Genet. 2021 Nov.

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS‑CoV‑2) disease (COVID-19) pandemic has caused millions of deaths worldwide. Genome-wide association studies identified the 3p21.31 region as conferring a twofold increased risk of respiratory failure. Here, using a combined multiomics and machine learning approach, we identify the gain-of-function risk A allele of an SNP, rs17713054G>A, as a probable causative variant. We show with chromosome conformation capture and gene-expression analysis that the rs17713054-affected enhancer upregulates the interacting gene, leucine zipper transcription factor like 1 (LZTFL1). Selective spatial transcriptomic analysis of lung biopsies from patients with COVID-19 shows the presence of signals associated with epithelial-mesenchymal transition (EMT), a viral response pathway that is regulated by LZTFL1. We conclude that pulmonary epithelial cells undergoing EMT, rather than immune cells, are likely responsible for the 3p21.31-associated risk. Since the 3p21.31 effect is conferred by a gain-of-function, LZTFL1 may represent a therapeutic target.

Conflict of interest statement

J.R.H. and J.O.J.D. are founders and shareholders of, and J.R.H., J.O.J.D., D.J.D. and R.S. are paid consultants for, Nucleome Therapeutics. J.R.H. and J.O.J.D. hold patents for Capture-C (WO2017068379A1, EP3365464B1, US10934578B2) and have a patent application for MCC. J.A.T. is a member of the GSK Human Genetics Advisory Board. These authors declare no other financial or non-financial interests. The remaining authors declare no competing interests.

Figures

Extended Data Figure 1. 3p21.31 severe COVID-19 locus SNPs are not in immune regulatory elements.
a, To decode GWAS variants, all genome-wide significant variants and/or variants in linkage disequilibrium with sentinel variants are assessed for protein-coding changes with ANNOVAR. Remaining variants are then assessed for changes in splicing of expressed genes using the SpliceAI machine learning approach or splicing quantitative trait loci (sQTL). Variants are then intersected with open chromatin from a panel of disease-relevant cell types to assess cis-regulatory element altering potential. This potential is assessed for effects on open chromatin with deepHaem or on transcription factor binding with both deepHaem and Sasquatch. Finally, variants in enhancers are linked to target effector genes using high-resolution chromosome conformation capture with NG/NuTi Capture-C or Micro Capture-C. b, Heatmap of linkage disequilibrium (European; EUR) between a severe COVID-19 lead SNP (rs11385942) and lead SNPs for other GWAS traits identified in the region (chr3:45,710,500-45,954,500, hg38). c, Linkage analysis for a 3p21.31 severe COVID-19 lead SNP (rs11385942, circle) showing variants within 100 kb and r² > 0.2. No variants with r² > 0.6 were seen beyond this range. d, Overlaid tracks of ATAC-seq from sorted populations of resting (blue) and stimulated (red) immune cells. Overlapping signal appears black. Abbreviations: Memory (Mem.), Immature (Imm.), Mature (Mat.), Natural Killer cells (NK), Plasmacytoid Dendritic cells (pDC), Myeloid Dendritic cells (mDC), Monocytes (Mono.), Effector (Eff.), Helper (H.), Regulatory (Reg.), and Central (C.). Region: chr3:45,800,000-45,870,000, hg38.
Extended Data Figure 2. DNase I accessibility over COVID-19 SNPs.
a, DNase I signal in each of 95 ENCODE datasets for rs17713054 (chr3:45,817,661-45,818,660, hg38) and rs76374459 (chr3:45,859,001-45,859,500, hg38), which were found in open chromatin. Datasets are grouped according to cell type; numbers indicate tissue of origin (see panel c). b, c, Violin plots of ENCODE DNase I accessibility over rs17713054 grouped by cell type (b) and tissue of origin (c). Each sample is shown as a red dot, dashed lines show the mean, and dotted lines show quartiles.
Extended Data Figure 3. deepHaem prediction of de novo open chromatin elements.
deepHaem negative damage scores, which predict gain of accessibility, for the 28 candidate COVID-19 severity variants in 694 cell types. Positive scores (loss-of-function) were adjusted to zero. In general, variants generating de novo regulatory elements have scores lower than −0.1, which was not true for any variant in any cell type.
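The score handling described in this legend can be sketched as follows. This is a minimal illustration and not deepHaem code: the function names and the example scores are hypothetical, and only the clipping of positive scores to zero and the −0.1 cutoff are taken from the legend.

```python
# Illustrative only: names and numbers are assumptions, not deepHaem output.
DE_NOVO_THRESHOLD = -0.1  # cutoff quoted in the legend


def gain_of_accessibility(score: float) -> float:
    """Keep negative (gain) scores; set positive (loss) scores to zero."""
    return min(score, 0.0)


def predicts_de_novo_element(scores: list[float]) -> bool:
    """True if any cell type has an adjusted score below the cutoff."""
    return any(gain_of_accessibility(s) < DE_NOVO_THRESHOLD for s in scores)


# Hypothetical per-cell-type scores for one variant.
example_scores = [0.25, -0.03, 0.0, -0.08]
adjusted = [gain_of_accessibility(s) for s in example_scores]
```

With these made-up scores no cell type falls below the cutoff, which mirrors the outcome the legend reports for the 28 candidate variants.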
Extended Data Figure 4. rs76374459 is likely benign in an erythroid enhancer.
a, ATAC-seq from progenitor and differentiating erythroid cells. Haematopoietic Stem Cells (HSC), Multi-Potent Progenitors (MPP), Common Myeloid Progenitors (CMP), Myeloid-Erythroid Progenitors (MEP) from bone marrow or peripheral blood and erythroid Colony Forming Units (CFU-E), Pro-erythroblasts (ProE1, ProE2), Basophilic Erythroblasts (BasoE), Polychromatic Erythroblasts (PolyE), Orthochromatic Erythroblasts (OrthoE) and Orthochromatic/Reticulocytes (OrthoRet). ChIP-seq tracks from CD71+ CD23+ mature erythroid cells show presence of marks associated with active transcription (H3K27ac), enhancers (H3K4me1), promoters (H3K4me3) and boundaries (CTCF). b, deepHaem damage score for the risk-C allele versus non-risk-G allele of rs76374459 associated with severe COVID-19 in 694 cell types. rs76374459 is found in open chromatin throughout erythropoiesis. A positive score predicts loss of accessibility; a negative score predicts increased accessibility.
Extended Data Figure 5. Single nucleus ATAC-seq in adult lung.
Chromium single nucleus ATAC-seq from non-diseased adult lung (n=3) with 17 epithelial, endothelial, mesenchymal and hematopoietic populations, including Alveolar Type (AT) 1 and 2 Pneumocytes, Macrophage (MΦ) and Natural Killer (NK) cells. The rs17713054 containing element is highlighted in grey.
Extended Data Figure 6. Pulmonary expression and binding analysis of CEBPB.
a, GTEx top five expressed tissues for CEBPB. For violin plots, minima and maxima are the top and bottom of the violin, black lines show means, ends of the pale regions denote first and third quartiles, and black dots denote outliers. Data from independent samples for Whole blood (n=755), Lung (n=578), Adipose (n=541), Fallopian Tube (n=9), Artery (n=663). b, Chromium single nucleus RNA-seq from non-diseased adult lung (n=3 independent samples) with 22 epithelial, endothelial and mesenchymal populations, including Alveolar Type (AT) 1 and 2 Pneumocytes and Pulmonary Neuroendocrine cells (PNECs). c, 10x Genomics Chromium droplet single-cell RNA sequencing (scRNA-seq) from upper and lower airways and lung parenchyma from healthy volunteers or deceased transplant donors with ten epithelial populations (i) with expression profiles for CEBPB (ii). d, ENCODE ChIP-seq for CEBPB in A549 alveolar basal epithelial adenocarcinoma cells, HeLa cells, and IMR-90 lung fibroblast cells with inset region (chr3:45,805,000-45,855,000; hg38) showing the rs17713054 containing enhancer. e, deepHaem ChIP-seq binding prediction score for CEBPB in lung fibroblast (IMR-90), alveolar basal epithelial adenocarcinoma (A549), the erythroleukaemia line (K562), human endothelial kidney cells (HEK293), and the GM12878 lymphoblastoid cell line (LCL) predicts increased binding to the risk-A allele.
Extended Data Figure 7. LZTFL1 is a most likely target of rs17713054.
a, NuTi Capture-C and Micro Capture-C from the rs17713054 enhancer in endothelial cells (HUVEC) show specific interaction with only the promoter of LZTFL1 and an upstream CTCF site (triangles). CTCF track shows binding of the CCCTC-binding factor, which acts as a boundary. b, ENCODE ChIP-seq for the active chromatin mark (H3K27ac), the repressive chromatin mark (H3K27me3) and EZH2, a member of the Polycomb Repressive Complex 2, in endothelial (HUVEC) and normal human lung fibroblast (NHLF) cells. Green bar denotes the regulatory domain as identified by 3C analysis. c, ENCODE DNase I-seq tracks from a range of cell types and tissues, including airway epithelium and bronchial epithelium, where the rs17713054 enhancer is active. In these cell types the LZTFL1 promoter is DNase I accessible, but neither the CCR9 promoter nor the SLC6A20 promoter is. Region shown is chr3:45,730,000-45,930,000 (hg38). d, Paired accessibility analysis of read counts per kilobase (RPK) over the LZTFL1 and SLC6A20 promoters and the rs17713054 enhancer in 156 ENCODE, immune and erythroid open chromatin datasets. Only the LZTFL1 promoter is widely accessible in the same cells as the affected enhancer.
Extended Data Figure 8. Expression and eQTL analysis of 3p21.31 candidate lung effector genes.
a, Genomic position of genes identified as 3p21.31 candidate causal genes with method of identification, including two TWASs. b, GTEx whole lung RNA-seq expression profiles for candidate causal genes as transcripts per million (TPM) with rs17713054 eQTL two-sided p-value for lung. For violin plots, minima and maxima are the top and bottom of the violin, black lines show means, ends of the pale regions denote first and third quartiles, and black dots denote outliers. n=578 independent samples. c, Chromium single nucleus RNA-seq from non-diseased adult lung (n=3), including Alveolar Type 1 (AT1) and Type 2 (AT2) Pneumocytes and Pulmonary Neuroendocrine cells (PNECs).
Extended Data Figure 9. CRISPR/Cas9 deletion of the rs17713054 enhancer.
a, ENCODE DNase I-seq in HUVEC and IMR-90 cells and ATAC-seq in Blood Outgrowth Endothelial Cells (BOECs) and H441 epithelial cells showing the rs17713054 containing enhancer with schematic of generated deletions and short guide RNA (sgRNA) binding sites. b, Example D1000 trace of genotyping PCR product amplified from cells transfected with Cas9 protein only, Cas9 protein with sgRNA1+2 (Δ108), or Cas9 protein with sgRNA1+3 (Δ191). c, Example Sanger sequencing trace following ICE analysis over the sgRNA1 and sgRNA2 binding sites in unedited cells, and the double strand break repair site in cells containing the 108 bp deletion. sgRNA sequence shown by black boxes, PAM sites shown with red letters. d, Calculated deletion efficiency for each sgRNA pair and cell type. Transfections failing to achieve >70% deletion (blue circles) were excluded from expression analyses. n shown are for independent transfections. e, Expression of LZTFL1 normalized to RPS18 and expressed relative to the mean expression in Cas9-only treated cells for each cell type. Corrected p-values from an ordinary one-way ANOVA with Dunnett’s multiple comparisons test. n shown are for independent samples from at least 3 independent transfections. For d,e bars show mean and one standard deviation. f, ChIP-seq for the active transcription marker (H3K27ac) was performed in umbilical vein endothelial cells (HUVECs), blood outgrowth endothelial cells (BOECs), H441 lung epithelial cells, and IMR-90 lung fibroblast cells. The rs17713054 enhancer (grey box, g) lacks strong modification under standard growth conditions in these cells.
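The exclusion rule in panel d and the normalization in panel e can be illustrated with a short sketch. It assumes linear-scale expression quantities and a plain target/reference ratio; the legend does not state how the normalization was computed (a ΔΔCt calculation is equally plausible), and all function names and values here are hypothetical.

```python
from statistics import mean

MIN_DELETION = 0.70  # ">70% deletion" threshold from the legend


def passes_deletion_filter(deletion_efficiency: float) -> bool:
    """Keep a transfection only if its deletion efficiency exceeds 70%."""
    return deletion_efficiency > MIN_DELETION


def relative_expression(target: float, reference: float,
                        control_ratios: list[float]) -> float:
    """Target/reference ratio (e.g. LZTFL1/RPS18) divided by the mean
    ratio in Cas9-only control cells. Assumes linear-scale inputs."""
    return (target / reference) / mean(control_ratios)


# Hypothetical example: two control ratios and one edited sample.
controls = [1.0, 1.0]
edited = relative_expression(target=2.0, reference=4.0, control_ratios=controls)
```

In this made-up example the edited sample expresses the target at half the control level.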
Extended Data Figure 10. COVID-19 patient lung shows signals of EMT.
Spearman correlation of gene expression profiles for EMT-related genes with the cell-types identified by deconvolution. AT1: Alveolar Type1 pneumocytes, AT2: Alveolar Type2 pneumocytes. P-values were identified by two-sided Hmisc analysis (without multiple test correction), values for significant correlations are shown and all correlation and p-values are in Source Data.
Figure 1. Identification of a potentially causative COVID-19 risk variant.
COVID-19 risk variants from GWAS were assessed for multiple mechanisms. All genome-wide significant variants and linked variants are shown (GWAS) as are variants present in the Vindija Neanderthal risk haplotype. Circles indicate variants assessed for splicing changes (blue circles, SpliceAI: ΔS score [0-1, where 1 is most damaging]), and presence in cis-regulatory elements using open chromatin in 95 ENCODE overlaid DNase I datasets (red circles), normal human bronchial epithelial cells (NHBE), and single-cell ATAC-seq from fetal ciliated epithelium and alveolar epithelium. Histone H3 modification tracks show presence of marks associated with active transcription (H3K27ac) at enhancers (H3K4me1) and promoters (H3K4me3). Variants in open chromatin are given deepHaem damage scores (DH, 0-1) with sign indicating increased (-) or decreased (+) accessibility. Region shown is chr3:45,800,000-45,870,000, hg38.
Figure 2. rs17713054 creates a CEBPB motif.
a, Ranked deepHaem chromatin accessibility damage scores for the risk A allele of rs17713054 in 694 cell types including primary cells. Line plot shows cumulative percentage of samples for each tissue, indicating that lung tissue is enriched in the highly ranked damaging variants. b, Quantification of ATAC-seq reads in the rs17713054 enhancer (chr3:45,817,661-45,818,660, hg38) from aortic endothelium. Bars show mean and one standard deviation. Two-tailed Mann-Whitney rank sum test, testing different accessibility of the two genotypes, G/G n = 78 and G/A n = 8 independent experiments. c, ATAC-seq reads over rs17713054 alleles in heterozygous individuals; grey lines denote paired counts from a single replicate. One-sided Wilcoxon matched-pairs signed rank test, testing higher accessibility of the A allele, n = 5. Three replicates were excluded due to low coverage. d, CEBPB DNA binding motif over sequence around the rs17713054 risk-A and non-risk-G alleles. P values for motifs were determined using FIMO with reference and variant sequence for the entire enhancer and Jaspar motif MA0466.1. The motif over rs17713054 was only identified in sequence with the A allele. e, Sasquatch DNase I hypersensitivity profile and shoulder-footprint ratio (sfr) scores for rs17713054 risk and non-risk (ref-G) alleles using DNase I datasets for a subset of cells with open chromatin at this site. Larger sfr scores indicate a deeper footprint associated with greater likelihood of being bound by a transcription factor. Δsfr scores are generated by subtracting risk-A sfr from ref-G sfr; negative values show an increased footprint depth in the risk allele.
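The sign convention for Δsfr in panel e is easy to misread, so here is a minimal sketch of it. The function name and the example values are hypothetical; only the direction of the subtraction comes from the legend.

```python
def delta_sfr(ref_sfr: float, risk_sfr: float) -> float:
    """Δsfr = ref-G sfr minus risk-A sfr, as described in the legend."""
    return ref_sfr - risk_sfr


# Hypothetical values: the risk allele has the larger (deeper) footprint,
# so the difference comes out negative.
example = delta_sfr(ref_sfr=1.2, risk_sfr=1.5)
```

A negative result therefore means the footprint is deeper on the risk-A allele than on the reference allele.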
Figure 3. The interaction landscape of the severe COVID-19 risk locus.
a, DpnII Capture-C derived mean interaction count (n = 3 for all except CD14+: n = 2) and one standard deviation (shading) for gene promoters in human umbilical vein endothelial cells (HUVEC), resting and activated T-Cells (CD4+ Non-Act/Act), monocytes (CD14+), CD235+ CD71+ erythroid cells and human embryonic stem cells (H1-hESCs). The enhancer containing rs17713054 is highlighted by a grey box. ATAC-seq/DNase I for each cell type is shown underneath in black. CTCF track shows binding of the CCCTC-binding factor, which acts as a boundary, with forward and reverse motif orientation shown with arrowheads (red and blue respectively). Three broad regulatory domains were identified as regions with overlapping interactions. Region: chr3:45,400,000-46,200,000, hg38. Per fragment interactions were smoothed using 400-bp bins and an 8-kb window. b, The rs17713054 regulatory domain in endothelial cells (HUVEC). Overlaid DNase I shows accessible sites in 95 cell types and H3K27ac shows active elements. Region: chr3:45,730,000-45,930,000, hg38. Per fragment interactions were smoothed using 250-bp bins and a 5-kb window. Solid line shows mean interaction count (n = 3 independent samples) with one standard deviation (shading). c, Micro Capture-C (MCC) of the rs17713054 enhancer in endothelial (HUVEC, blue) and erythroid (HUDEP-2, red) cells with tissue-specific open chromatin tracks (n = 3). Peak analysis of MCC using LanceOtron to compare HUVEC and HUDEP-2 profiles identified two significantly enriched peaks in HUVEC cells (black triangles, P ≤ 1 × 10^−999), which correspond to the LZTFL1 promoter and the upstream CTCF site.
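The bin-and-window smoothing mentioned in panels a and b can be sketched roughly as below. This is an assumption-laden illustration, not the authors' pipeline: fragments are assigned to bins by midpoint, empty bins count as zero, and the window is a centred moving average over whole bins. All names are hypothetical.

```python
from collections import defaultdict


def bin_counts(fragments, bin_size=400):
    """Sum per-fragment counts into fixed-width bins keyed by bin start.

    fragments: iterable of (start, end, count) tuples.
    Each fragment is assigned to the bin holding its midpoint (assumption).
    """
    bins = defaultdict(float)
    for start, end, count in fragments:
        midpoint = (start + end) // 2
        bins[(midpoint // bin_size) * bin_size] += count
    return dict(bins)


def smooth(bins, bin_size=400, window=8000):
    """Centred moving average over bins; missing bins are treated as zero.

    Uses an odd number of bins (half the window on each side plus the
    centre bin), so the span is approximately, not exactly, `window` bp.
    """
    if not bins:
        return {}
    half = (window // bin_size) // 2
    smoothed = {}
    for b in range(min(bins), max(bins) + bin_size, bin_size):
        values = [bins.get(b + k * bin_size, 0.0) for k in range(-half, half + 1)]
        smoothed[b] = sum(values) / len(values)
    return smoothed


# Hypothetical fragments, smoothed with a small window for readability.
toy = bin_counts([(0, 100, 3), (100, 200, 1), (450, 550, 6)], bin_size=400)
toy_smoothed = smooth(toy, bin_size=400, window=800)
```

The defaults correspond to the 400-bp bins and 8-kb window of panel a; passing `bin_size=250, window=5000` would correspond to panel b.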
Figure 4. Pulmonary expression analysis of LZTFL1 and SLC6A20.
a, GTEx whole-lung RNA-seq expression profiles for LZTFL1 and SLC6A20 as transcripts per million (TPM). For violin plots, minima and maxima are the top and bottom of the violin, black lines show means, ends of the pale regions denote first and third quartiles, and black dots denote outliers (n = 578 independent samples). b, 10x Genomics Chromium droplet single-cell RNA sequencing (scRNA-seq) from upper and lower airways and lung parenchyma from healthy volunteers or deceased transplant donors with ten epithelial populations (i). scRNA-seq expression profiles for LZTFL1 (ii) and SLC6A20 (iii). c, Chromium single-nucleus RNA-seq from non-diseased adult lung (n = 3) with 22 epithelial, endothelial and mesenchymal populations, including alveolar type 1 (AT1) and type 2 (AT2) pneumocytes and pulmonary neuroendocrine cells (PNECs). d, GTEx eQTL analysis of the rs17713054 risk-A allele in lung (n = 515 independent samples). Normalized effect size (NES) is the slope of the linear regression comparing the alternate (A) allele to the reference (G) allele. NES are calculated in a normalized space where magnitude has no direct biological interpretation. Lines show the 95% confidence interval, with significance values for single tissue (two-sided P value without multiple test correction) and multi-tissue (posterior probability/m-value) analyses.
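To make the NES definition in panel d concrete, the sketch below computes an ordinary least-squares slope of normalized expression on alternate-allele dosage (0, 1 or 2 copies of A). It is a simplification under stated assumptions: no covariates are included, the data are invented, and the function name is hypothetical.

```python
from statistics import mean


def regression_slope(dosages, expression):
    """Least-squares slope of expression on alternate-allele dosage.

    dosages: number of alternate (A) alleles per sample (0, 1 or 2).
    expression: normalized expression value per sample.
    Simplified: no covariates are modelled (assumption).
    """
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    return sxy / sxx


# Hypothetical data: expression rises with each copy of the A allele.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [0.0, 0.2, 0.5, 0.7, 1.0, 1.2]
toy_slope = regression_slope(toy_dosages, toy_expression)
```

A positive slope indicates higher normalized expression with each additional A allele; as the legend notes, its magnitude has no direct biological interpretation.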
Figure 5. COVID-19 patient lungs show signals of EMT.
Hematoxylin and eosin (H&E) stained biopsies of the ciliated respiratory epithelium of a bronchiole (a) and of alveolar space (b) in healthy lung (i) and COVID-19 patient lung (ii-iv). COVID-19 patient samples are representative images from staining of biopsies from 3 individuals and show loss of ciliated cell lined bronchioles (denudation) and loss of alveolar monolayers populated by alveolar type I pneumocytes with few type II pneumocytes, with alveolar wall expansion and fine interstitial fibrosis. Scale bars show 50 μm. c, Spearman correlation of gene expression profiles for EMT-related genes with the eigengenes of cell-type modules identified by WGCNA from spatially resolved expression data from COVID-19 patient lung. P values were identified by two-sided Hmisc analysis (without multiple test correction); values for significant correlations (P < 0.05) are shown and all correlation and P values are in Source Data.
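For readers unfamiliar with the statistic in panel c, a Spearman correlation is a Pearson correlation computed on ranks. The sketch below shows only that calculation in plain Python; the legend indicates the authors used Hmisc, and this example does not reproduce its P values. Function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical gene expression vs. module eigengene values.
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

Because only ranks matter, any monotonic relationship between a gene and a module eigengene gives a correlation of magnitude 1, regardless of scale.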
