BMC Bioinformatics. 2013 Jun 25;14:205.
doi: 10.1186/1471-2105-14-205.

Automatic Workflow for the Classification of Local DNA Conformations

Free PMC article

Petr Čech et al. BMC Bioinformatics. 2013.

Abstract

Background: A growing number of crystal and NMR structures reveal considerable structural polymorphism in DNA architecture, going well beyond the usual image of a double-helical molecule. DNA is highly variable, with dinucleotide steps exhibiting substantial flexibility in a sequence-dependent manner. Analyzing the conformational space of the DNA backbone and improving our understanding of conformational dependencies in DNA are therefore important for a full comprehension of DNA structural polymorphism.

Results: A detailed classification of local DNA conformations based on the technique of Fourier averaging was published in our previous work. However, that procedure requires a considerable amount of manual work. To overcome this limitation, we developed an automatic classification method that combines supervised and unsupervised approaches. The proposed workflow consists of the k-NN method followed by a non-hierarchical single-pass clustering algorithm. We applied this workflow to analyze 816 X-ray and 664 NMR DNA structures released up to February 2013. We identified and annotated six new conformers, and we assigned four of these conformers to two structurally important DNA families: guanine quadruplexes and Holliday (four-way) junctions. We also compared the populations of the assigned conformers in the datasets of X-ray and NMR structures.

Conclusions: In the present work, we developed a machine learning workflow for the automatic classification of dinucleotide conformations. Dinucleotides with unassigned conformations can either be classified into one of the 24 already known classes or be flagged as unclassifiable. The proposed workflow also permits the identification of new classes among data that have so far been unclassifiable, and we identified and annotated six new conformations in the X-ray structures released since our previous analysis. The results illustrate the utility of machine learning approaches in the classification of local DNA conformations.
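
The supervised step of the workflow (assign a dinucleotide to a known class, or flag it as unclassifiable) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy two-torsion data, and the `max_distance` cutoff are all assumptions made for illustration; in particular, `max_distance` is a hypothetical stand-in for the rejection criterion defined in the paper's Methods, which is not reproduced here. Only the default k = 11 is taken from the text.

```python
import math
from collections import Counter


def circ_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def distance(p, q):
    """Euclidean distance built from per-torsion circular differences."""
    return math.sqrt(sum(circ_diff(a, b) ** 2 for a, b in zip(p, q)))


def knn_classify(query, labelled, k=11, max_distance=None):
    """Majority vote among the k nearest labelled points.

    Returns None (unclassifiable) when even the nearest neighbour is farther
    away than `max_distance`. That cutoff is a hypothetical rejection rule,
    not the threshold used in the paper.
    """
    ranked = sorted(
        ((distance(query, point), label) for point, label in labelled),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]
    if max_distance is not None and nearest[0][0] > max_distance:
        return None
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set with two made-up classes in a two-torsion space.
labelled = [
    ((60, 180), "A"), ((62, 178), "A"), ((58, 183), "A"),
    ((300, 180), "B"), ((302, 181), "B"), ((298, 179), "B"),
]

assigned = knn_classify((61, 181), labelled, k=3)
rejected = knn_classify((180, 0), labelled, k=3, max_distance=30)
```

Here `assigned` is "A" because the three nearest neighbours all carry that label, while `rejected` is `None` because no labelled point lies within the assumed cutoff. Points rejected in this way are the ones that would be passed on to the unsupervised clustering step.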

Figures

Figure 1
A workflow of the classification of local DNA conformations. k-NN uses 11 neighbors (parameter k). A threshold vcrit = 0.001 (see the explanation in the Methods section of the manuscript) was used to distinguish between data points that can be assigned to one of the existing classes and those that cannot be assigned at all. Cluster analysis uses a modified version of the single-pass nonhierarchical leader algorithm [89].
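
For readers unfamiliar with the leader algorithm, the following is a minimal sketch of its basic textbook form: each point joins the first existing leader within a fixed radius, otherwise it becomes a new leader. The paper uses a modified version whose details are not given here, so this sketch should not be read as the authors' method. The `radius` parameter, the function names, and the toy two-angle data are assumptions for illustration.

```python
import math


def circ_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def distance(p, q):
    """Euclidean distance built from per-angle circular differences."""
    return math.sqrt(sum(circ_diff(a, b) ** 2 for a, b in zip(p, q)))


def leader_clustering(points, radius):
    """Single-pass, non-hierarchical clustering (basic leader algorithm).

    Each point is visited once, in input order. It is assigned to the first
    leader lying within `radius`; if there is none, the point itself becomes
    the leader of a new cluster. `radius` is a hypothetical parameter.
    """
    leaders = []
    assignments = []
    for point in points:
        for index, leader in enumerate(leaders):
            if distance(point, leader) <= radius:
                assignments.append(index)
                break
        else:
            # No existing leader is close enough: start a new cluster.
            leaders.append(point)
            assignments.append(len(leaders) - 1)
    return leaders, assignments


# Made-up points in a two-angle space (degrees).
points = [(10, 10), (12, 11), (200, 200), (11, 9), (203, 198), (355, 5)]
leaders, assignments = leader_clustering(points, radius=10)
```

With these toy values the pass yields three leaders, (10, 10), (200, 200) and (355, 5), and the assignments [0, 0, 1, 0, 1, 2]. Because the data are scanned only once, the result depends on the input order, which is a known property of this family of algorithms.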
Figure 2
Two repeating units in a DNA dinucleotide chain. One residue (nucleotide) is defined from phosphate to phosphate. The conformation of each residue is given by six backbone torsion angles α, …, ζ, and by the glycosidic torsion angle χ. A “suite” goes from δ to δ+1 and consists of the following torsions: δ, ϵ, ζ, α+1, β+1, γ+1, δ+1.
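
The suite definition above can be made concrete with a small sketch that stores the seven torsions as a vector and compares two suites while respecting the periodicity of angles. The torsion order follows the caption; the distance function and the numeric values are assumptions chosen for illustration and are not taken from the paper.

```python
import math

# Order of torsions in a suite, as listed in the caption.
SUITE_TORSIONS = ("delta", "epsilon", "zeta", "alpha+1", "beta+1", "gamma+1", "delta+1")


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def suite_distance(s1, s2):
    """Euclidean distance over per-torsion circular differences.

    This metric is an illustrative assumption, not necessarily the one used
    by the authors.
    """
    if len(s1) != len(SUITE_TORSIONS) or len(s2) != len(SUITE_TORSIONS):
        raise ValueError("a suite must contain exactly seven torsion angles")
    return math.sqrt(sum(angular_difference(a, b) ** 2 for a, b in zip(s1, s2)))


# Two made-up suites (degrees) that differ by 3 in delta and 4 in gamma+1.
suite_a = [140, 190, 270, 300, 180, 45, 140]
suite_b = [143, 190, 270, 300, 180, 49, 140]
d = suite_distance(suite_a, suite_b)
```

For these made-up values `d` is 5.0. The circular difference matters near the 0°/360° boundary: `angular_difference(350, 10)` is 20.0, not 340.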
Figure 3
Oxytricha nova guanine quadruplex. (a) A schematic diagram of a double-stranded (bimolecular) guanine quadruplex from Oxytricha nova telomeric sequence (G4T4G4)2. A solid line represents a sugar-phosphate backbone. O. nova G-quadruplex has four G-quartets formed from nucleotides in which syn and anti conformations of the glycosidic angle alternate along each strand [105]. Shaded rectangles indicate guanine residues in syn conformation (typically χ ~ 60°-70°), clear rectangles indicate guanine residues in anti conformation (typically χ ~ 250°-260°). (b) A crystal structure of a bimolecular O. nova G-quadruplex 1JPQ [104]. Overall topology is indicated by the orange ribbon. Bases are represented by green sticks, potassium ions stabilizing the whole structure are shown as yellow spheres. (c) A crystal structure of a complex of O. nova G-quadruplex with a drug acridine 3EUM [106]. Acridine affecting the conformation of a T4 loop in chain A is shown in blue. (d) Consensus conformational map of the O. nova G-quadruplex. By convention, chains are numbered in the 5′-to-3′ direction. Conformational classes of individual dinucleotide steps are indicated by red numbers, their size is proportional to the frequency of their occurrence in investigated structures. A description of individual conformations is given in Tables 1 and 3. The T5T6 step adopts either a canonical BI conformation 54 if the G4T5 step is also in a canonical BI conformation, or an A-to-B conformation 41 if the G4T5 step is in a conformation 32. (e) Consensus conformational map of the O. nova G-quadruplex complexed with a drug acridine. Individual conformations shown as red numbers are characterized in Tables 1 and 3.
Figure 4
Structure of a four-way (Holliday) junction in an inverted repeat sequence 1DCW [115]. The backbone between residues A6 and C7 in chains B and D (shown in red) adopts an unusual BI-like conformation 115 with high ϵ (~ 275°) and A-like χ+1 (~ 208°).
Figure 5
Comparison of the fractions of individual conformational classes (Tables 1 and 3) identified in structures resolved by X-ray (816 structures) and NMR (664 structures) techniques.

References

    1. Watson JD, Crick FHC. Molecular structure of nucleic acids - a structure for deoxyribose nucleic acid. Nature. 1953;171(4356):737–738. doi: 10.1038/171737a0.
    2. Drew HR, Wing RM, Takano T, Broka C, Tanaka S, Itakura K, Dickerson RE. Structure of a B-DNA dodecamer: conformation and dynamics. Proc Natl Acad Sci USA. 1981;78(4):2179–2183. doi: 10.1073/pnas.78.4.2179.
    3. Wang AH, Fujii S, van Boom JH, Rich A. Molecular structure of the octamer d(G-G-C-C-G-G-C-C): modified A-DNA. Proc Natl Acad Sci USA. 1982;79(13):3968–3972. doi: 10.1073/pnas.79.13.3968.
    4. McCall M, Brown T, Kennard O. The crystal structure of d(G-G-G-G-C-C-C-C). A model for poly(dG).poly(dC). J Mol Biol. 1985;183(3):385–396. doi: 10.1016/0022-2836(85)90009-9.
    5. Wang AHJ, Quigley GJ, Kolpak FJ, Crawford JL, Vanboom JH, Vandermarel G, Rich A. Molecular-structure of a left-handed double helical DNA fragment at atomic resolution. Nature. 1979;282(5740):680–686. doi: 10.1038/282680a0.
