PLoS One, 12 (5), e0177574

Assessment of Antibody Library Diversity Through Next Generation Sequencing and Technical Error Compensation

Marco Fantini et al. PLoS One.

Abstract

Antibody libraries are important resources for deriving antibodies for a wide range of applications, from structural and functional studies to intracellular protein interference studies to the development of new diagnostics and therapeutics. Whatever the goal, the key parameter of an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, owing to the high similarity and length of the library sequences. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing a few hundred random library elements. Inferring complexity from such a small sample is, however, rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to assess antibody library complexity and quality. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, here we present a new, PCR-free NGS approach to sequencing antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates complexity while taking sequencing error into account.
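Why a few hundred sampled clones say little about complexity can be made concrete with a minimal sketch. It assumes, purely for illustration, a library of N equally abundant clones sampled uniformly with replacement; real libraries are not uniform, and this is not the DEAL method. Under that assumption, the expected number of distinct clones among n random picks is N(1 - (1 - 1/N)^n). The function name `expected_distinct` is hypothetical.

```python
import math


def expected_distinct(library_size: int, picks: int) -> float:
    """Expected number of distinct clones among `picks` uniform draws
    (with replacement) from `library_size` equally abundant clones.

    Illustrative assumption only: real libraries have uneven clone abundances.
    """
    # Probability that a given clone is never drawn: (1 - 1/N)^n,
    # computed via log1p for numerical stability when N is large.
    p_unseen = math.exp(picks * math.log1p(-1.0 / library_size))
    return library_size * (1.0 - p_unseen)


if __name__ == "__main__":
    # A sample of a few hundred clones, as in classical fingerprinting/sequencing checks.
    for size in (10**4, 10**6, 10**9):
        seen = expected_distinct(size, 300)
        print(f"N = {size:>13,}: ~{seen:.1f} distinct clones among 300 picks")
```

Under this uniform model, libraries of 10^6 and 10^9 clones both yield essentially 300 distinct clones out of 300 picks, so a sample of that size cannot tell them apart despite a thousandfold difference in complexity.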

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Diagram of sequenced adaptor-antibody-adaptor constructs.
A) scFv library (gray), comprising heavy (VH) and light chain (VL) Complementarity Determining Regions (CDR), was ligated to adapters (light green and pink) harbouring Illumina P5 and P7 flowcell hybridization sequences (green and red). B) VH nanobody library (gray), comprising heavy chain (VH) Complementarity Determining Regions (CDR), was ligated to adapters (light green and pink) harbouring Illumina P5 and P7 flowcell hybridization sequences (green and red). The forward read (R1) uses the SBS3 sequencing primer (Illumina), while the reverse read (R2) uses the SBS12 primer (Illumina). iS1 and iS2 = index/shifter sequences.
Fig 2
Fig 2. Phi-X derived and Phred score derived error rate distribution.
A) Phred score-derived error rate distribution for the merged reads of the hscFv1 library. The error rate increases with sequencing cycle. B) Control Phi-X-derived error rate distribution for the merged reads of the hscFv1 library. Errors are concentrated in the early sequencing cycles (spikes), with a small increase at the end of each read. This error distribution does not match the Phred score-derived one, and its shape differs as well. C) Scatter plot of the correlation between Q score and log2(% mismatches) in the Phi-X control spike-in library. Each point represents the mean value from a single flow cell tile at a given sequencing cycle, encoded by colour (red to blue: R1 cycles 1 to 350; R2 cycles 1 to 250; the colour flex point is set at cycle 38). The Q score fails to predict the mismatch rate in the first 40 cycles. Similar results were obtained for the hscFv2 and hVH libraries.
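Panels A and B compare two per-cycle error estimates. A minimal sketch of both calculations, assuming reads already trimmed to equal length and, for the Phi-X comparison, each read paired with its gaplessly aligned reference segment (a simplification). It uses the standard Phred relation P = 10^(-Q/10) and the Phred+33 FASTQ quality encoding used by current Illumina software; all function names are hypothetical.

```python
from typing import List, Sequence


def phred_to_error_prob(q: int) -> float:
    """Standard Phred relation: error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer Q scores."""
    return [ord(ch) - offset for ch in qual]


def phred_error_per_cycle(quals: Sequence[str]) -> List[float]:
    """Mean Phred-predicted error probability at each cycle (as in panel A)."""
    n_cycles = len(quals[0])
    totals = [0.0] * n_cycles
    for qual in quals:
        for cycle, q in enumerate(decode_quality(qual)):
            totals[cycle] += phred_to_error_prob(q)
    return [t / len(quals) for t in totals]


def mismatch_rate_per_cycle(reads: Sequence[str], reference: Sequence[str]) -> List[float]:
    """Observed fraction of reads disagreeing with the known reference at each
    cycle (as in panel B); assumes gapless, pre-aligned read/reference pairs."""
    n_cycles = len(reads[0])
    mismatches = [0] * n_cycles
    for read, ref in zip(reads, reference):
        for cycle, (base, expected) in enumerate(zip(read, ref)):
            if base != expected:
                mismatches[cycle] += 1
    return [m / len(reads) for m in mismatches]
```

Plotting the two lists cycle by cycle against each other is one way to expose the kind of discrepancy shown in panel C, where low predicted error coincides with a high observed mismatch rate.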
Fig 3
Fig 3. Diagram of DEAL workflow.
A) Diagram of the seed creation process. In the figure, the black arrows represent the combined reads of the scFv library after trimming. The seed is created by combining the two seeding regions. The seeding regions are placed in the CDR3s to maximize the number of different seeds: the higher the number, the faster the program runs. B) Binary tree of the seeds. The program uses a binary tree approach to group identical seeds. During the comparison, if a sequence does not match any sequence seen so far, a new branch of the tree is created at the mismatching position. C) The input of the binary comparison step. While the seeding step takes into account only the diversity of the seeding regions, the binary comparison analyzes the whole length of the combined reads. D) Flagging process. If some positions of the sequence are unreliable because they are associated with a low Phred quality score (as shown in the figure) or with a poor-quality cycle (from Phi-X errors, not shown in the figure), the program flags them for correction. E) The three scenarios that can occur during binary comparison of sequences in the same seeding group. Mismatching sequences (top): if two compared sequences differ in even a single position (bold) where neither nucleotide is flagged, the program recognizes them as different sequences and does not group them. Matching sequences with a position where one nucleotide is flagged (middle): the program recognizes the two sequences as identical and groups them together. During merging, every position where only one of the sequences is flagged is resolved as the unflagged nucleotide of the other sequence. Matching sequences with a position where both alternative nucleotides are flagged (bottom): the program recognizes the two sequences as identical and groups them together. Every position where both sequences are flagged is resolved using the IUPAC nucleotide ambiguity codes, and the resulting merged sequence is flagged at that position.
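The grouping, flagging and merging rules in panels A to E can be sketched as follows. This is a minimal illustration, not the DEAL implementation: a Python dictionary keyed on the seed stands in for the binary tree of panel B, and the seeding-region coordinates, the Q < 20 flagging threshold, the greedy comparison order and the decision to treat sequences of different length as distinct are all assumptions. All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

# IUPAC nucleotide ambiguity codes mapped to the bases they represent.
IUPAC_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_IUPAC = {frozenset(b): code for code, b in IUPAC_TO_BASES.items()}


@dataclass
class Read:
    seq: str
    flagged: Set[int] = field(default_factory=set)  # unreliable positions


def flag_read(seq: str, quals: Sequence[int], bad_cycles: Set[int], min_q: int = 20) -> Read:
    """Panel D: flag positions with a low Phred score or on a poor-quality cycle."""
    flagged = {i for i, q in enumerate(quals) if q < min_q or i in bad_cycles}
    return Read(seq, flagged)


def group_by_seed(reads: Iterable[Read], regions: Sequence[Tuple[int, int]]) -> Dict[str, List[Read]]:
    """Panels A-B: group reads whose concatenated seeding regions are identical."""
    groups: Dict[str, List[Read]] = defaultdict(list)
    for read in reads:
        seed = "".join(read.seq[start:end] for start, end in regions)
        groups[seed].append(read)
    return dict(groups)


def try_merge(a: Read, b: Read) -> Optional[Read]:
    """Panel E: return the merged read, or None if the two reads are distinct."""
    if len(a.seq) != len(b.seq):
        return None
    merged: List[str] = []
    flags: Set[int] = set()
    for i, (x, y) in enumerate(zip(a.seq, b.seq)):
        fa, fb = i in a.flagged, i in b.flagged
        if not fa and not fb:
            if x != y:
                return None          # unflagged mismatch: different sequences
            merged.append(x)
        elif fa and not fb:
            merged.append(y)         # keep the unflagged nucleotide
        elif fb and not fa:
            merged.append(x)
        else:                        # both flagged: IUPAC code, still flagged
            bases = frozenset(IUPAC_TO_BASES[x]) | frozenset(IUPAC_TO_BASES[y])
            merged.append(BASES_TO_IUPAC[bases])
            flags.add(i)
    return Read("".join(merged), flags)


def cluster_group(reads: Iterable[Read]) -> List[Tuple[Read, int]]:
    """Greedily merge each read into the first compatible cluster (assumed order)."""
    clusters: List[Tuple[Read, int]] = []
    for read in reads:
        for k, (rep, count) in enumerate(clusters):
            merged = try_merge(rep, read)
            if merged is not None:
                clusters[k] = (merged, count + 1)
                break
        else:
            clusters.append((read, 1))
    return clusters
```

Running `cluster_group` on each list returned by `group_by_seed` gives one (representative, size) pair per distinct sequence. Because this sketch ignores flags inside the seeding regions, it would split reads whose only differences fall there; the caption does not say how DEAL handles that case.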
Fig 4
Fig 4. Distribution of library sequence cluster cardinality.
The more the curve is skewed towards high-cardinality clusters, the lower the expected complexity of the library.
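Given per-cluster read counts from the clustering step, the plotted distribution is a histogram of cluster sizes. A minimal sketch with hypothetical function names; the singleton fraction is an illustrative summary added here, not a statistic reported in the caption.

```python
from collections import Counter
from typing import Dict, Iterable


def cardinality_distribution(cluster_sizes: Iterable[int]) -> Dict[int, int]:
    """Map each cluster cardinality (reads per cluster) to the number of clusters of that size."""
    return dict(sorted(Counter(cluster_sizes).items()))


def singleton_fraction(cluster_sizes: Iterable[int]) -> float:
    """Fraction of clusters containing a single read; the lower it is, the more
    the distribution is skewed towards high-cardinality clusters."""
    sizes = list(cluster_sizes)
    return sum(1 for s in sizes if s == 1) / len(sizes)


if __name__ == "__main__":
    # Toy, made-up cluster sizes for two hypothetical libraries.
    diverse = [1, 1, 1, 1, 2, 1, 1, 3]
    skewed = [40, 25, 1, 12, 2, 30]
    print(cardinality_distribution(diverse), singleton_fraction(diverse))
    print(cardinality_distribution(skewed), singleton_fraction(skewed))
```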
Fig 5
Fig 5. Chain/VDJ assortment independence of libraries.
A) hscFv1. B) hscFv2. C) hVH. Top panels: barplots of forward and reverse primer distributions. Bottom panels: heatmaps of library primer-pair distributions. The observed distribution is the proportion of each primer pair found after sequencing. The expected distribution is the product of the two individual primer proportions, i.e. the distribution expected if the chains are independent (scFv libraries) or if VDJ recombination is balanced (hVH). UC = unclassified; this category includes all sequences that do not match any primer. Primer names are abbreviated versions of the original names listed in Supporting Information (Primer used for library construction).
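The expected distribution in the bottom panels is the product of the marginal forward- and reverse-primer proportions. A minimal sketch, assuming each sequence has already been assigned a (forward, reverse) primer pair and treating "UC" as an ordinary label; the function name and the primer labels in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def observed_and_expected(pairs: Iterable[Pair]) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Return observed primer-pair proportions and those expected under independence."""
    pairs = list(pairs)
    total = len(pairs)
    observed = {pair: n / total for pair, n in Counter(pairs).items()}
    fwd = Counter(f for f, _ in pairs)   # marginal forward-primer counts
    rev = Counter(r for _, r in pairs)   # marginal reverse-primer counts
    # Expected proportion of (f, r) = P(f) * P(r) if the two assignments are independent.
    expected = {(f, r): (fwd[f] / total) * (rev[r] / total) for f in fwd for r in rev}
    return observed, expected


if __name__ == "__main__":
    # Toy assignments with hypothetical primer labels; "UC" marks an unclassified read.
    reads = [("VH1", "VK1"), ("VH1", "VK2"), ("VH3", "VK1"), ("VH3", "UC")]
    obs, exp = observed_and_expected(reads)
    for pair in sorted(exp):
        print(pair, f"observed={obs.get(pair, 0.0):.2f}", f"expected={exp[pair]:.2f}")
```

Pairs absent from the sample still receive an expected value, which is what lets a heatmap show where the observed assortment departs from independence.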
Fig 6
Fig 6. Length distribution of human VH nanobody library sequences.
Barplot of the length distribution of human VH nanobody library sequences coloured by reading frame.
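One way such a barplot could be tabulated is sketched below. It assumes, as a hypothetical simplification not stated in the caption, that a sequence's reading-frame category is its length modulo 3 (0 meaning in frame); the function name is also hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def length_by_frame(seqs: Iterable[str]) -> Dict[Tuple[int, int], int]:
    """Count sequences per (length, frame) bar, with frame = length % 3 (assumption)."""
    return dict(sorted(Counter((len(s), len(s) % 3) for s in seqs).items()))


if __name__ == "__main__":
    print(length_by_frame(["ACGTTT", "ACGTT", "GGGCCC", "ACG"]))
```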

Grant support

Funded by European Union Seventh Framework Program [grant No. 604102 A.C.] (Human Brain Project). https://www.humanbrainproject.eu/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.