Comparative Study
BMC Genomics. 2020 Jun 8;21(1):395.
doi: 10.1186/s12864-020-06787-5.

Refining the transcriptome of the human malaria parasite Plasmodium falciparum using amplification-free RNA-seq

Lia Chappell et al. BMC Genomics. 2020.

Abstract

Background: Plasmodium parasites undergo several major developmental transitions during their complex lifecycle, which are enabled by precisely ordered gene expression programs. Transcriptomes from the 48-h blood stages of the major human malaria parasite Plasmodium falciparum have been described using cDNA microarrays and RNA-seq, but these assays have not always performed well within non-coding regions, where the AT-content is often 90-95%.

Results: We developed a directional, amplification-free RNA-seq protocol (DAFT-seq) to reduce bias against AT-rich cDNA, which we have applied to three strains of P. falciparum (3D7, HB3 and IT). While strain-specific differences were detected, overall there is strong conservation between the transcriptional profiles. For the 3D7 reference strain, transcription was detected from 89% of the genome, with over 78% of the genome transcribed into mRNAs. We also find that transcription from bidirectional promoters frequently results in non-coding, antisense transcripts. These datasets allowed us to refine the 5' and 3' untranslated regions (UTRs), which can be variable, long (>1000 nt), and often overlap those of adjacent transcripts.
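Genome-wide figures such as "89% of the genome" come down to asking what fraction of genomic positions is covered by at least one transcript. The sketch below shows one way to compute that from transcript coordinates. It is a minimal illustration, assuming transcripts are given as hypothetical half-open `(start, end)` intervals on a single sequence, with strand ignored so that a position transcribed on either strand counts once. The study's own computational method is described in its supplementary information and may differ.

```python
from typing import List, Tuple


def transcribed_fraction(intervals: List[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of genome positions covered by at least one interval.

    Intervals are 0-based, half-open [start, end); overlapping or abutting
    intervals are merged so that no position is counted twice.
    """
    covered = 0
    block_start, block_end = None, None
    for start, end in sorted(intervals):
        if block_end is None or start > block_end:
            # Disjoint from the current merged block: bank it and start anew.
            if block_end is not None:
                covered += block_end - block_start
            block_start, block_end = start, end
        else:
            # Overlaps (or abuts) the current block: extend it.
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start
    return covered / genome_length


# Hypothetical transcripts on a 1,000 nt toy sequence.
toy_transcripts = [(0, 300), (250, 400), (600, 700), (650, 900)]
print(transcribed_fraction(toy_transcripts, 1000))  # 0.7
```

Sorting first makes the merge a single linear pass, and counting merged blocks rather than summing interval lengths is what keeps overlapping UTRs from adjacent transcripts from being double-counted.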

Conclusions: The approaches applied in this study allow a refined description of the transcriptional landscape of P. falciparum and demonstrate that very little of the densely packed P. falciparum genome is inactive or redundant. By capturing the 5' and 3' ends of mRNAs, we reveal both constant and dynamic use of transcriptional start sites across the intraerythrocytic developmental cycle that will be useful in guiding the definition of regulatory regions for use in future experimental gene expression studies.


Conflict of interest statement

No authors declare any competing interests.

Figures

Fig. 1
Most of the P. falciparum genome is transcribed. a Overview of DAFT-seq data for the 3D7 time course for all of chromosome 1. Top panel: DAFT-seq coverage for plus strand. Coloured traces represent normalised coverage for each of the seven time points analysed. Middle panel: DAFT-seq coverage for minus strand. Lower panel: annotated gene models for Pf3D7v3 from GeneDB. Legend: colours of coverage traces from each of the seven time points. b Continuous coverage of DAFT-seq data allows transcript boundaries to be redefined. Orange boxes define boundaries of transcripts on the plus strand and blue boxes define boundaries of transcripts on the minus strand. Colours of coverage traces from the seven time points are the same as those shown above. c Size of 3D7 5′ UTRs based on continuous coverage of DAFT-seq data. See supplementary information for details of the computational method. d Size of 3D7 3′ UTRs based on continuous coverage of DAFT-seq data. See supplementary information for details of the computational method. e Summary statistics to describe the extent of the genome that is transcribed. At least 78% of the genome can be transcribed into mRNA
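Panels b–d define transcript boundaries from continuous DAFT-seq coverage. The exact procedure is given in the paper's supplementary information; as a rough illustration only, the sketch below extends a plus-strand gene outward from its coding sequence for as long as per-base coverage stays at or above a threshold. The function name, the `min_cov` threshold and the toy coverage values are all hypothetical. Minus-strand genes would need their coverage reversed first, and real coverage may need smoothing or pooling across time points.

```python
from typing import Sequence, Tuple


def utr_lengths(coverage: Sequence[float], cds_start: int, cds_end: int,
                min_cov: float = 1.0) -> Tuple[int, int]:
    """Return (5' UTR length, 3' UTR length) for a plus-strand gene.

    coverage  -- per-base read coverage on the gene's strand (0-based)
    cds_start -- index of the first coding base
    cds_end   -- index one past the last coding base
    min_cov   -- hypothetical threshold for "continuous" coverage
    """
    # Walk upstream from the CDS until coverage drops below the threshold.
    five_prime = cds_start
    while five_prime > 0 and coverage[five_prime - 1] >= min_cov:
        five_prime -= 1
    # Walk downstream from the CDS end in the same way.
    three_prime = cds_end
    while three_prime < len(coverage) and coverage[three_prime] >= min_cov:
        three_prime += 1
    return cds_start - five_prime, three_prime - cds_end


# Hypothetical coverage; the CDS occupies indices 5-8.
cov = [0, 1, 2, 3, 5, 9, 9, 9, 9, 4, 0, 0]
print(utr_lengths(cov, cds_start=5, cds_end=9))  # (4, 1)
```

Raising `min_cov` shortens the inferred UTRs (to `(2, 1)` at `min_cov=3` here), so the threshold directly controls how long the reported UTRs can be.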
Fig. 2
Properties of transcription start sites (TSSs) and promoters. a Different library types show different properties of 5′ UTRs and TSSs for the gene encoding GAPDH (Pf3D7_1462800). DAFT-seq coverage (i) can be used to determine the longest possible 5′ UTR. Long-read sequencing with PacBio (ii, iii) can be used to link a specific TSS directly to the rest of the transcript structure. Direct detection of TSSs with 5UTR-seq data (iv) reveals a range of different TSSs, which have different prevalences at different time points (v–vii). The first track (i) illustrates 7 DAFT-seq libraries, showing continuous coverage along the length of the gene and variable steady-state levels of mRNA throughout the time course. The next two tracks show PacBio coverage (ii) and reads (iii); these long reads can link variation in the TSS to the structure of the rest of the transcript. The fourth track (iv) shows the extreme 5′ ends of mRNAs detected with all of the 5UTR-seq data. These data can be separated by time point (track v), and individual time points show that the most common TSS early in the time course (track vi) is further upstream of the coding sequence than the most common TSSs later in the time course (track vii). b Genomic locations of the TSS peaks identified using 5UTR-seq data. The vast majority of the TSS peaks in this data set (90%) fell outside of annotated exons and introns. A small proportion (9%) were within exons, while 1% were within introns. c Patterns in the base composition around TSSs, identified using the precise TSS positions inferred from the 5UTR-seq data. Windows are shown for a 20 nt distance (i) and a 100 nt distance (ii). The information content of the base composition, calculated over a 1000 nt window, peaks around the inferred TSS. d Number of TSS peaks in broad or sharp categories for each of the seven time points in the 3D7 time course
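Panel c summarises base composition around TSSs as information content. A common definition, assumed here because the caption does not give the formula, is the maximum entropy for DNA (2 bits) minus the Shannon entropy of the base frequencies at each position across a set of windows aligned on the TSS. The sketch below computes that per position. It applies no small-sample correction and uses a uniform background, which will overstate information in a genome as AT-rich as that of P. falciparum. The example windows are hypothetical.

```python
import math
from collections import Counter
from typing import List


def information_content(aligned: List[str]) -> List[float]:
    """Per-position information content (bits) of equal-length DNA windows.

    IC = log2(4) - H, where H is the Shannon entropy of the base
    frequencies observed at that position across all windows.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("windows must all have the same length")
    ic = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


# Hypothetical 4 nt windows centred on TSSs.
windows = ["ATGA", "ATCA", "TTGA", "ATAA"]
print([round(x, 3) for x in information_content(windows)])  # [1.189, 2.0, 0.5, 2.0]
```

Fully conserved positions reach the 2-bit maximum, while mixed positions score lower, so a peak in this profile marks positions where base composition is most constrained.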
Fig. 3
Correlations between expression patterns of adjacent mRNA transcripts. a Schematic diagram of gene pairs in a “head-to-head” orientation (also known as divergent gene pairs). The black arrows represent the direction of transcription and the dark grey box represents the intervening genomic sequence between the longest detected versions of both 5′ UTRs. b Correlation of gene expression (in TPM, using Pearson correlation) for 1119 pairs of head-to-head genes with annotated 5′ UTRs, plotted against the length of the intervening genomic sequence. The median intervening sequence length is 548 bp (without annotation of 5′ UTRs this distance was 1946 bp). The median correlation of expression was 0.49, and the distribution shows a positive skew. An individual region (blue) is shown in more detail in panel c. c Expression profiles through the 3D7 time course for a head-to-head gene pair (Pf3D7_1011900, heme oxygenase, and Pf3D7_1012000, RING zinc finger protein, putative). The steady-state levels of these two genes are tightly correlated (Spearman R = 0.96). d Schematic diagram of gene pairs in a “tail-to-tail” orientation (also known as convergent gene pairs). The black arrows represent the direction of transcription and the dark grey box represents the genomic sequence between the longest detected versions of both 3′ UTRs. e Correlation of gene expression (in TPM, using Pearson correlation) for 1059 pairs of tail-to-tail genes with annotated 3′ UTRs, plotted against the length of the intervening genomic sequence. The median intervening sequence length is −124 bp, i.e. most 3′ UTR pairs overlap (without annotation of 3′ UTRs this distance is +657 bp). The distribution of correlation values includes both strongly negative and strongly positive relationships. An individual region (blue) is shown in more detail in panel f. f Expression profiles through the 3D7 time course for a tail-to-tail gene pair (Pf3D7_1115900, DHHC9, and Pf3D7_1116000, RON4). Despite a 327 nt overlap between their 3′ UTRs, the steady-state levels of these genes are strongly correlated (Spearman correlation 0.81). g Correlation of expression profiles (Spearman) of 1000 neighbouring gene pairs for head-to-head, tail-to-tail and randomly selected gene pairs. The 1000 random neighbouring gene pairs were sampled 1000 times from all annotated head-to-head, tail-to-tail and tandem gene pairs. Mean correlations were 0.35, 0.10, and 0.18 for head-to-head, tail-to-tail and random orientations, respectively. The Wilcoxon rank sum test was used to determine significance between groups; comparing the head-to-head and tail-to-tail groups with random pairings gave P-values of 2.2e-16 and 3.8e-3, respectively. h Correlation of expression profiles (Spearman) for head-to-head neighbouring gene pairs, binned by intervening genomic distance. i Correlation of expression profiles (Spearman) for tail-to-tail neighbouring gene pairs, binned by intervening genomic distance
Fig. 4
Non-coding RNAs that may share promoters with nearby mRNAs. a Schematic showing the relative orientations of ncRNAs that may share a promoter sequence with an adjacent mRNA. Black arrows show the direction of transcription. The pair of transcripts may share a bidirectional promoter. b Example of a ncRNA and an mRNA (Pf3D7_1408200, which encodes an ApiAP2 protein) that may share a bidirectional promoter. The transcript pair is shown in the same orientation as the diagram in the top panel. The expression profiles of the two transcripts are correlated through the 3D7 time course
Fig. 5
Novel and updated features within P. falciparum genes. a Splice sites detected in the 3D7 time course using DAFT-seq data. A total of 8206 splice sites were supported by at least 5 reads; of these, 7386 were in online databases. There are 1377 splice sites not described in online databases (17% of the total), including 659 in UTRs. b Length of introns found in this study, using the DAFT-seq data for the 3D7 time course. c Schematic diagram of “exitrons” (protein-coding introns) modified from Marquez et al. [54]. An exitron may be retained in an mRNA, generating a fully coding mRNA sequence (left). If the excised sequence is a multiple of 3 nt, the open reading frame is maintained and a full-length protein may be produced (top right). If the excised sequence is not a multiple of 3 nt, this may generate a change to the C-terminus (middle left) or cause the transcript to be degraded through the nonsense-mediated decay pathway (bottom right). d Exitrons are present in the 3D7 transcriptome. There are examples of mRNAs where a multiple of 3 nt is excised, such as Pf3D7_0420300 (which encodes an ApiAP2 protein). Here ~90% of the reads support the “exitron-out” form, with ~10% of the reads supporting the “exitron-in” form. e Exitrons are present in the 3D7 transcriptome. There are examples of mRNAs where even numbers of nt are excised, such as Pf3D7_1417200 (which encodes a putative NOT family protein). For this gene ~80% of the reads support the “exitron-out” form, with ~20% of the reads supporting the “exitron-in” form
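Panel c turns on whether the excised length is a multiple of 3. The sketch below makes that concrete on a toy coding sequence: it removes a hypothetical exitron, reports whether the reading frame is preserved, and scans the spliced sequence for the first in-frame stop codon. The function and sequence are illustrative only; whether a frameshifted transcript is actually degraded by nonsense-mediated decay depends on features this sketch does not model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_exitron(cds: str, start: int, end: int) -> dict:
    """Remove cds[start:end] (a hypothetical exitron) and report the outcome.

    cds is read in frame 0 from its first base; coordinates are 0-based
    and half-open.
    """
    spliced = cds[:start] + cds[end:]
    excised = end - start
    # Scan codon by codon for the first in-frame stop in the spliced sequence.
    first_stop = None
    for i in range(0, len(spliced) - 2, 3):
        if spliced[i:i + 3] in STOP_CODONS:
            first_stop = i
            break
    return {
        "excised_nt": excised,
        "in_frame": excised % 3 == 0,
        "spliced_length": len(spliced),
        "first_stop_at": first_stop,
    }


# Toy CDS: ATG AAA CCC GGG TTT TAA
cds = "ATGAAACCCGGGTTTTAA"
print(splice_exitron(cds, 6, 9))
# {'excised_nt': 3, 'in_frame': True, 'spliced_length': 15, 'first_stop_at': 12}
print(splice_exitron(cds, 6, 8))
# {'excised_nt': 2, 'in_frame': False, 'spliced_length': 16, 'first_stop_at': None}
```

In the frameshifted case the original stop codon is no longer in frame, so no stop is found within the toy sequence; in a real transcript, translation would continue until the next in-frame stop, altering the C-terminus.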
Fig. 6
Comparison of blood-stage transcriptomes from the 3D7, HB3, and IT strains of P. falciparum. a Number of transcripts with expression detected in each of the three P. falciparum strains. Detection was based on a minimum expression of 5 TPM for at least one time point within the time course. Most transcripts (78%) were detected in all three strains. b Number of transcripts with no expression detected in each of the three strains, using the same detection threshold. c Expression profile of the FIKK3 gene, which shows similar timing (phase) of expression in each of the three strains. d Expression profile of the ACS9 gene, which shows a different timing (phase) of expression in the 3D7 strain compared with both HB3 and IT. e Identification of genes that are differentially expressed between the 3D7 strain and the HB3 strain. Coloured data points highlight genes with at least a 2-fold difference in expression and FDR ≤ 0.05. f Identification of genes that are differentially expressed between the 3D7 strain and the IT strain. g Identification of genes that are differentially expressed between the HB3 strain and the IT strain. h Phase (timing) differences in genes between the 3D7 strain and the HB3 strain. The mean phase difference was 0.747 rad (5.7 h). i Phase differences between the 3D7 strain and the IT strain. The mean phase difference was 0.697 rad (5.3 h). j Phase differences between the HB3 strain and the IT strain. The mean phase difference was 0.304 rad (2.3 h)
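The radian-to-hour conversions in panels h–j are consistent with treating the 48-h intraerythrocytic cycle as one full period of 2π rad, so that 1 rad corresponds to about 7.64 h. A short check, assuming that mapping (the caption does not state it explicitly):

```python
import math

CYCLE_HOURS = 48.0  # assumed length of one intraerythrocytic cycle (2*pi rad)


def phase_to_hours(delta_rad: float, cycle_hours: float = CYCLE_HOURS) -> float:
    """Convert a phase difference in radians to hours of a periodic cycle."""
    return delta_rad * cycle_hours / (2 * math.pi)


for label, rad in [("3D7 vs HB3", 0.747), ("3D7 vs IT", 0.697), ("HB3 vs IT", 0.304)]:
    print(f"{label}: {phase_to_hours(rad):.1f} h")
# 3D7 vs HB3: 5.7 h
# 3D7 vs IT: 5.3 h
# HB3 vs IT: 2.3 h
```

All three printed values match the hours given in the caption, which supports the 48-h period assumption.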

References

    1. World Health Organization. World Malaria Report 2018. Geneva: World Health Organization; 2018.
    2. Cowman AF, Tonkin CJ, Tham W-H, Duraisingh MT. The molecular basis of erythrocyte invasion by malaria parasites. Cell Host Microbe. 2017;22:232–245.
    3. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature. 2002;419:498–511.
    4. Bozdech Z, Llinás M, Pulliam BL, Wong ED, Zhu J, DeRisi JL. The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol. 2003;1:E5.
    5. Le Roch KG, Zhou Y, Blair PL, Grainger M, Moch JK, Haynes JD, et al. Discovery of gene function by expression profiling of the malaria parasite life cycle. Science. 2003;301:1503–1508.
