BMC Med Genomics, 8 Suppl 1 (Suppl 1), S7

Performance of Case-Control Rare Copy Number Variation Annotation in Classification of Autism

Worrawat Engchuan et al. BMC Med Genomics.


Background: A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, case-control studies have detected a specific burden of rare disruptive CNVs in neuronal and neurodevelopmental pathways. Here, we used machine learning methods to classify ASD subjects and controls, based on rare CNV data and comprehensive gene annotations. We investigated the performance of different methods and estimated the percentage of ASD subjects that could be reliably classified based on the presumed etiologic CNVs they carry.

Results: We analyzed 1,892 Caucasian ASD subjects and 2,342 matched controls. Rare CNVs (frequency 1% or less) were detected using Illumina 1M and 1M-Duo BeadChips. Conditional Inference Forest (CF) typically performed as well as or better than other classification methods. We found a maximum AUC (area under the ROC curve) of 0.533 when considering all ASD subjects with rare genic CNVs, corresponding to 7.9% correctly classified ASD subjects and less than 3% incorrectly classified controls; performance was significantly higher when considering only subjects harboring de novo or pathogenic CNVs. We also found rare losses to be more predictive than gains and that curated neurally-relevant annotations (brain expression, synaptic components and neurodevelopmental phenotypes) outperform Gene Ontology and pathway-based annotations.
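To illustrate how an AUC close to 0.5 can coexist with a small fraction of confidently classified cases, the following is a minimal sketch, not the authors' code. It computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control (ties count as one half), and reports the fraction of cases and controls above a score threshold. The function names and the toy scores are assumptions made for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random case > score of random control); ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def rates_at_threshold(labels, scores, threshold):
    """Fraction of cases and fraction of controls scoring at or above threshold."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    case_rate = sum(s >= threshold for s in cases) / len(cases)
    control_rate = sum(s >= threshold for s in controls) / len(controls)
    return case_rate, control_rate


# Toy data (hypothetical): most subjects get an uninformative score of 0.5,
# and only one case carries a strongly predictive signal.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.4]

print(roc_auc(labels, scores))
print(rates_at_threshold(labels, scores, 0.8))
```

In this toy example a single case is separated cleanly from all controls while the remaining subjects are nearly indistinguishable, which is the kind of situation in which the overall AUC stays modest even though a subset of cases is classified correctly with few false positives.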

Conclusions: CF is an optimal classification approach for case-control rare CNV data and it can be used to prioritize subjects with variants potentially contributing to ASD risk not yet recognized. The neurally-relevant annotations used in this study could be successfully applied to rare CNV case-control data-sets for other neuropsychiatric disorders.


Figure 1
Cross-validation strategy. The data-set is divided into three equal subsets, each with the same proportion of ASD and control subjects. Two of the three subsets are used as the training set for the model, whereas the other subset is used as the validation set for performance quantification; this is iterated three times, so that each subset is used twice for training and once for validation. Feature selection is performed only for GO and pathway-based features. The remaining set is used as the test set to assess classification performance. The cross-validation procedure is repeated multiple times to estimate the mean performance and its standard deviation.
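The stratified three-fold split described in the caption can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names, the toy label vector, and the number of repetitions are assumptions, and the feature selection and test-set steps are not modeled.

```python
import random


def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds that preserve the case/control proportion."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validation_splits(labels, k=3, seed=0):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_folds(labels, k, seed)
    for j in range(k):
        train = [i for f, fold in enumerate(folds) if f != j for i in fold]
        yield sorted(train), sorted(folds[j])


# Toy labels (hypothetical): 6 ASD subjects (1) and 9 controls (0).
labels = [1] * 6 + [0] * 9

# Repeat the whole procedure with different shuffles; 5 is an arbitrary choice.
for repetition in range(5):
    for train, validation in cross_validation_splits(labels, k=3, seed=repetition):
        n_cases = sum(labels[i] for i in validation)
        print(repetition, len(train), len(validation), n_cases)
```

Each subject lands in exactly one validation fold and in the training set of the other two iterations, and re-running with a new seed gives the repeated estimates from which a mean and standard deviation can be computed.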
Figure 2
Random Forest (RF) and CF feature relevance for the 20 curated neurally-relevant features. Boxplots show feature relevance for loss-based features (red) and gain-based features (blue). Mean decrease Gini (MDG) and mean decrease accuracy (MDA) were used for RF. MDA, with and without correlation adjustment, was used for CF. For all relevance metrics, higher values correspond to more relevant features.
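The idea behind a mean-decrease-accuracy score can be sketched with a generic permutation test: shuffle one feature at a time and record how much classification accuracy drops. This is a simplified illustration under stated assumptions, not the RF or CF procedure itself (forest implementations typically compute the drop on out-of-bag samples per tree, and the correlation-adjusted CF variant permutes conditionally on correlated features). The function names, the toy classifier, and the toy data are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the classifier's prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by its shuffled values.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy setup (hypothetical): feature 0 is a count of rare losses in a gene set,
# feature 1 is an unrelated count that the classifier ignores.
rows = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
labels = [1, 1, 0, 0, 1, 0]


def predict(row):
    return 1 if row[0] > 0 else 0


print(permutation_importance(predict, rows, labels))
```

A feature the classifier relies on shows a positive mean drop when shuffled, whereas an ignored feature scores zero, which matches the reading of the boxplots: higher values indicate more relevant features.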



