Nucleic Acids Res. 2021 Jul 9;49(12):e67.
doi: 10.1093/nar/gkab199.

Kinetic sequencing (k-Seq) as a massively parallel assay for ribozyme kinetics: utility and critical parameters

Yuning Shen et al. Nucleic Acids Res. 2021.

Abstract

Characterizing genotype-phenotype relationships of biomolecules (e.g. ribozymes) requires accurate ways to measure activity for a large set of molecules. Kinetic measurement using high-throughput sequencing (e.g. k-Seq) is an emerging assay applicable in various domains that potentially scales up measurement throughput to over 10^6 unique nucleic acid sequences. However, maximizing the return of such assays requires understanding the technical challenges introduced by sequence heterogeneity and DNA sequencing. We characterized the k-Seq method in terms of model identifiability, effects of sequencing error, accuracy and precision using simulated datasets and experimental data from a variant pool constructed from previously identified ribozymes. Relative abundance, kinetic coefficients, and measurement noise were found to affect the measurement of each sequence. We introduced bootstrapping to robustly quantify the uncertainty in estimating model parameters and proposed interpretable metrics to quantify model identifiability. These efforts enabled the rigorous reporting of data quality for individual sequences in k-Seq experiments. Here we present detailed protocols, define critical experimental factors, and identify general guidelines to maximize the number of sequences and their measurement accuracy from k-Seq data. Analogous practices could be applied to improve the rigor of other sequencing-based assays.


Figures

Figure 1.
General scheme of k-Seq experiment and analysis. A heterogeneous input pool containing nucleic acids is reacted at different experimental conditions (e.g. different substrate concentrations or different reaction time). Reacted and unreacted molecules are separated and either (or both) of these fractions is prepared for high-throughput sequencing. The reads from DNA sequencing are processed to obtain a count table for each unique sequence across samples, normalized by a standard, and abundances across samples are fit into a kinetic model to estimate parameters (e.g. rate constants). react. frac. = reacted fraction.
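The normalization step named in this caption can be sketched as follows. This is a minimal illustration, assuming a spike-in standard of known amount is added to each sample before sequencing; the function names, the toy sequences, and every number are hypothetical and are not taken from the paper.

```python
def absolute_amounts(counts, spike_seq, spike_amount):
    """Convert read counts to absolute amounts using a spike-in standard.

    counts: dict mapping sequence -> read count in one sample
    spike_seq: the spike-in sequence (must be a key of counts)
    spike_amount: known amount of spike-in added to the sample
    """
    spike_count = counts[spike_seq]
    if spike_count == 0:
        raise ValueError("spike-in has no reads; cannot normalize")
    per_read = spike_amount / spike_count  # amount represented by one read
    return {s: c * per_read for s, c in counts.items() if s != spike_seq}


def reacted_fractions(input_amounts, reacted_amounts):
    """Reacted fraction = amount in the reacted sample / amount in the input."""
    return {
        s: reacted_amounts.get(s, 0.0) / a
        for s, a in input_amounts.items()
        if a > 0
    }


# Hypothetical toy data: two pool sequences plus a spike-in ("SPIKE").
input_counts = {"SPIKE": 100, "AAC": 500, "GGU": 300}
reacted_counts = {"SPIKE": 200, "AAC": 250, "GGU": 30}

input_amt = absolute_amounts(input_counts, "SPIKE", spike_amount=1.0)
reacted_amt = absolute_amounts(reacted_counts, "SPIKE", spike_amount=0.4)
fractions = reacted_fractions(input_amt, reacted_amt)
```

The resulting per-sequence reacted fractions, one set per experimental condition, are the values that would then be fit to a kinetic model.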
Figure 2.
Effect of experimental factors on model identifiability to separately estimate formula image and formula image. Identifiability was evaluated using metric formula image, based on the simulated effects of (A) choice of BYO samples (with relative error = 0.2) and (B) relative error (using the BYO series of the extended substrate range). Reacted fractions for 10 201 (101^2) simulated sequences with true formula image, formula image in the parameter space shown in the figure were fit to the pseudo-first order model, and formula image values for each sequence were calculated from 100 bootstrapped samples. Higher values of formula image indicate that formula image and formula image are less separable. (A) Choosing a wider range of BYO concentration is more effective in improving the region of identifiable data compared to adding more replicates of the same BYO concentrations. (B) With higher measurement error, formula image and formula image become increasingly difficult to estimate separately.
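The bootstrapped fitting mentioned in this caption can be sketched as below. The caption names a pseudo-first order model without spelling it out, so the saturating form f = A(1 - exp(-k c)), the grid search over k, and all names and numbers here are assumptions made for illustration. The paper's identifiability metric is not reproduced.

```python
import math
import random
from statistics import quantiles


def fit_pseudo_first_order(conc, frac, k_grid):
    """Least-squares fit of frac ~ A * (1 - exp(-k * conc)).

    For a fixed k the best A has a closed form, so only k is searched.
    """
    best = None
    for k in k_grid:
        g = [1.0 - math.exp(-k * c) for c in conc]
        denom = sum(x * x for x in g)
        if denom == 0.0:
            continue
        a = sum(f * x for f, x in zip(frac, g)) / denom
        sse = sum((f - a * x) ** 2 for f, x in zip(frac, g))
        if best is None or sse < best[0]:
            best = (sse, k, a)
    return best[1], best[2]


def bootstrap_ci95(conc, frac, k_grid, n_boot=100, seed=0):
    """Resample data points with replacement, refit, and report the
    2.5 to 97.5 percentile interval for k and for A."""
    rng = random.Random(seed)
    n = len(conc)
    ks, amps = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        k, a = fit_pseudo_first_order(
            [conc[i] for i in idx], [frac[i] for i in idx], k_grid
        )
        ks.append(k)
        amps.append(a)
    k_cuts = quantiles(ks, n=40, method="inclusive")  # cut points every 2.5%
    a_cuts = quantiles(amps, n=40, method="inclusive")
    return (k_cuts[0], k_cuts[-1]), (a_cuts[0], a_cuts[-1])


# Hypothetical simulated sequence: 5 concentrations in triplicate,
# multiplicative Gaussian noise with relative error 0.2.
K_GRID = [10 ** (-2 + 4 * i / 80) for i in range(81)]
conc = [0.1, 0.5, 1.0, 2.0, 5.0] * 3
true_k, true_a = 1.0, 0.5
noise = random.Random(1)
frac = [
    true_a * (1.0 - math.exp(-true_k * c)) * (1.0 + noise.gauss(0.0, 0.2))
    for c in conc
]

k_hat, a_hat = fit_pseudo_first_order(conc, frac, K_GRID)
k_ci, a_ci = bootstrap_ci95(conc, frac, K_GRID)
```

When the sampled concentrations are all far below saturation, the curve is close to linear with slope kA, and bootstrapped values of k and A tend to trade off against each other. This is one way to see why a wider concentration range helps more than additional replicates.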
Figure 3.
Comparison of RNA quantitation methods for k-Seq. Total RNA amount quantified for samples incubated with different BYO concentrations, determined by spike-in method vs. direct quantification using Qubit or qPCR, correlates well (Pearson's r = 0.999, P-value = formula image) and with comparable relative standard deviation (Supplementary Figure S2). Error bars show standard deviations calculated from triplicates for reacted samples.
Figure 4.
Distribution of mutants in the pool and the effect of sequencing error. (A) Relative abundance (counts) of sequences in the unreacted pool (four ribozyme families, total number of reads = 32 931 917), categorized by Hamming distance to the nearest family center. Observed abundance of different classes was similar to the expected number of counts (black dashed line). (B) The effect of different levels of sequencing error (formula image) on the expected observed abundance, shown as the ratio to the true abundance, for mutants of different orders (formula image) in a variant pool with 9% mutation rate. Because a sequence both loses counts when its reads are misidentified as a neighboring sequence and gains counts when reads of a neighboring sequence are misidentified as it, the observed abundance for a sequence would either decrease (formula image) or first increase then decrease (formula image) as the sequencing error increases. See Supplementary Text S3 for calculation details.
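The loss and gain of counts described in panel (B) can be illustrated with a simplified calculation. This is not the calculation in Supplementary Text S3. It is a sketch that assumes substitution errors only, independent and uniform across bases, and counts gains only from reads carrying a single error. The sequence length of 21 is a hypothetical value chosen for the example.

```python
def observed_to_true_ratio(d, eps, length=21, mut_rate=0.09):
    """Expected observed/true abundance for one d-order mutant.

    d: Hamming distance of the sequence from its family center
    eps: per-base sequencing error rate (assumed uniform substitutions)
    length: hypothetical length of the mutagenized region
    mut_rate: per-position mutation rate of the variant pool
    """
    # True abundance of a specific sequence changes by this factor for
    # each additional mutation away from the family center.
    r = (mut_rate / 3.0) / (1.0 - mut_rate)

    # Fraction of this sequence's reads that contain no error.
    kept = (1.0 - eps) ** length

    # Single-substitution neighbors, weighted by their true abundance
    # relative to this sequence: d are one mutation closer to the center,
    # 2*d are at the same distance, 3*(length - d) are one mutation further.
    neighbor_abundance = d / r + 2.0 * d + 3.0 * (length - d) * r

    # Reads of a neighbor that carry exactly the one error that turns
    # the neighbor into this sequence.
    gained = neighbor_abundance * (eps / 3.0) * (1.0 - eps) ** (length - 1)

    return kept + gained


# Ratio for the family center and a triple mutant at 1% error.
center = observed_to_true_ratio(0, 0.01)
triple = observed_to_true_ratio(3, 0.01)
```

Under these assumptions the family center only loses counts, whereas a higher-order mutant gains reads from its more abundant neighbors at low error rates and loses them again once error-free reads become rare. The single-error approximation becomes poor at large error rates.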
Figure 5.
Accuracy of parameter estimation by k-Seq. (A) Dependence of accuracy (ratio of estimated formula image to true formula image) on mean counts across all simulated samples (including the unreacted pool sample). The dashed lines correspond to ratios as labeled. Ratios >100-fold or <0.01-fold are shown at the borders of the plot. (B) Fraction of sequences for which the CI-95, estimated using bootstrapping or using triplicates, includes the true formula image values, for sequences with different mean counts across all samples. Sequences were ranked by mean counts (from highest to lowest) and binned in sets of 25 000 sequences. Each data point indicates the fraction of CI-95 that includes the true values in each bin.
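The coverage statistic in panel (B) can be computed along the following lines. This is a minimal sketch assuming each sequence is summarized by its mean count, the bounds of its CI-95, and its true value (known only for simulated data); the record layout and the toy numbers are hypothetical.

```python
def ci_coverage_by_bin(records, bin_size):
    """Fraction of intervals containing the true value, per bin.

    records: list of (mean_count, ci_low, ci_high, true_value)
    bin_size: number of sequences per bin after ranking by mean count
    """
    # Rank sequences from highest to lowest mean count.
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    coverage = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        hits = sum(1 for _, lo, hi, true in chunk if lo <= true <= hi)
        coverage.append(hits / len(chunk))
    return coverage


# Hypothetical toy records; a real analysis would use far larger bins.
records = [
    (10, 0.5, 1.5, 1.0),
    (1000, 0.9, 1.1, 1.0),
    (500, 0.8, 1.4, 1.0),
    (5, 0.1, 0.8, 1.0),
]
coverage = ci_coverage_by_bin(records, bin_size=2)
```

A well-calibrated CI-95 should give coverage close to 0.95 in every bin, so a drop in the low-count bins would flag sequences whose reported uncertainty is too narrow.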
Figure 6.
Precision of estimation by k-Seq. (A) Fold-range (97.5-percentile/2.5-percentile) of formula image estimation depended on the mean counts. Increasing mean counts increases precision, as shown by the relationship of fold-range with mean counts across different orders of mutants. For formula image, only 1000 sequences were randomly selected for visualization. (B) Alignment between estimated formula image from two independently conducted experiments (experiment from (2), and the k-Seq experiment reported here). Only sequences with 2.5-percentile higher than baseline catalytic coefficient (formula image, reported in (2)) were included. Each point represents a sequence whose color reflects the minimum of mean counts (between two experiments).
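The fold-range in panel (A) and the baseline filter in panel (B) can be sketched as below, assuming a list of bootstrapped estimates is available for each sequence. The function names and the example values are hypothetical.

```python
from statistics import quantiles


def percentile_bounds(estimates):
    """Return the 2.5 and 97.5 percentiles of a list of estimates."""
    cuts = quantiles(estimates, n=40, method="inclusive")  # every 2.5%
    return cuts[0], cuts[-1]


def fold_range(estimates):
    """Fold-range = 97.5-percentile / 2.5-percentile."""
    low, high = percentile_bounds(estimates)
    return high / low


def above_baseline(estimates, baseline):
    """Keep a sequence only if its 2.5-percentile exceeds the baseline."""
    low, _ = percentile_bounds(estimates)
    return low > baseline


# Hypothetical bootstrapped estimates for one sequence.
estimates = [float(v) for v in range(1, 42)]
fr = fold_range(estimates)
keep = above_baseline(estimates, baseline=1.5)
```

A fold-range near 1 indicates a tightly constrained estimate, while a large fold-range indicates that the bootstrapped fits disagree by orders of magnitude.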

References

    1. Dhamodharan V., Kobori S., Yokobayashi Y. Large scale mutational and kinetic analysis of a self-hydrolyzing deoxyribozyme. ACS Chem. Biol. 2017; 12:2940–2945.
    2. Pressman A.D., Liu Z., Janzen E., Blanco C., Müller U.F., Joyce G.F., Pascal R., Chen I.A. Mapping a systematic ribozyme fitness landscape reveals a frustrated evolutionary network for self-aminoacylating RNA. J. Am. Chem. Soc. 2019; 141:6213–6223.
    3. Kobori S., Yokobayashi Y. High-throughput mutational analysis of a twister ribozyme. Angew. Chem. Int. Ed. 2016; 55:10354–10357.
    4. Andreasson J.O.L., Savinov A., Block S.M., Greenleaf W.J. Comprehensive sequence-to-function mapping of cofactor-dependent RNA catalysis in the glmS ribozyme. Nat. Commun. 2020; 11:1663.
    5. Yokobayashi Y. High-throughput analysis and engineering of ribozymes and deoxyribozymes by sequencing. Acc. Chem. Res. 2020; 53:2903–2912.