Immunoglobulins (Ig) are produced by B lymphocytes, either as secreted antibodies or as part of the B-cell receptor. The repertoire of potential Ig transcripts is enormously diverse (>1 × 10^12) owing to hundreds of germ-line gene segments, random nucleotide incorporation when gene segments are joined into a complete transcript, and somatic hypermutation at individual nucleotides. This recombination and mutation process takes place in the maturing B cell and is responsible for the diversity of potential epitope recognition. Cancers arising from mature B cells are characterized by clonal production of Ig heavy (IGH@) and light chain transcripts, although whether the sequence has undergone somatic hypermutation depends on the maturation stage at which the neoplastic clone arose. Chronic lymphocytic leukemia (CLL) is the most common leukemia in adults and arises from a mature B cell with either mutated or unmutated IGH@ transcripts. The unmutated form carries a worse prognosis, and mutation status is routinely assessed in the clinic. Currently, IGHV mutation status is assessed by Sanger sequencing of the transcript and comparison with known germ-line genes. In this paper, we demonstrate that complete IGH@ V-D-J sequences can be computed from unselected RNA-seq reads with results equal or superior to the clinical procedure: in the only discordant case, the clinical transcript was out-of-frame. Therefore, a single RNA-seq assay can simultaneously yield a gene expression profile, SNP and mutation information, and IGHV mutation status, and may one day be performed as a general test to capture multidimensional clinically relevant data in CLL.
Keywords: B cells; CLL; RNA sequencing; immunoglobulin; somatic hypermutation.
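
The comparison of a transcript to its closest germ-line gene, as described above, can be illustrated with a minimal sketch. It is not the pipeline used in this paper. It assumes the IGHV region of the transcript has already been aligned to a germ-line sequence (equal-length strings, with `-` marking gaps), and it assumes a 98% identity cutoff for calling a sequence unmutated, a convention that is not stated in the text above. The function names `percent_identity` and `ighv_status` are hypothetical.

```python
def percent_identity(query: str, germline: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped rather than counted as mismatches.
    """
    if len(query) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, g in zip(query.upper(), germline.upper()):
        if q in "-N" or g in "-N":
            continue
        compared += 1
        if q == g:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * matches / compared


def ighv_status(identity: float, cutoff: float = 98.0) -> str:
    """Classify IGHV status from percent identity to germ line.

    The 98% default cutoff is an assumption made for illustration.
    """
    return "unmutated" if identity >= cutoff else "mutated"


if __name__ == "__main__":
    # Toy 100-nt germ-line segment and a transcript with three substitutions.
    germline = "ACGT" * 25
    transcript = list(germline)
    for pos in (0, 4, 8):
        transcript[pos] = "C"
    identity = percent_identity("".join(transcript), germline)
    print(f"{identity:.1f}% identity: {ighv_status(identity)}")
```

In practice the alignment step itself, and the choice of the closest germ-line gene, would be handled by a dedicated immunoglobulin alignment tool; the sketch covers only the final identity calculation and classification.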