Autism spectrum disorder (ASD) is a neurodevelopmental condition that affects nearly 3% of children and has a strong genetic component. While hundreds of ASD risk genes have been identified through sequencing studies, the genetic heterogeneity of ASD makes it challenging to identify additional risk genes with these methods. To predict candidate ASD risk genes, we developed a simple machine learning model, ASiDentify (ASiD), using human genomic, RNA- and protein-based features. ASiD identified over 1,300 candidate ASD risk genes, over 300 of which have not been previously predicted. ASiD predicted ASD risk genes accurately using 6 features, including mutational constraint, synapse localization and gene expression in neurons, astrocytes and non-brain tissues. Functional groups of proteins found to be strongly implicated in ASD include RNA-binding proteins (RBPs) and chromatin regulators. We constructed additional logistic regression models to make predictions and to assess informative features specific to RBPs, for which informative features included mutational constraint, and to chromatin regulators, for which both expression level in excitatory neurons and mutational constraint were informative. That the informative features for RBPs and chromatin regulators differed from those for all protein-coding genes suggests that specific biological pathways connect risk genes with different molecular functions to ASD.
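As a rough illustration of the kind of model described above, the following is a minimal sketch of a logistic regression classifier fit by gradient descent. It is not the ASiD implementation: the feature names are hypothetical labels loosely following the features named in the abstract (which names only some of the 6), the data are synthetic, and the weights, learning rate and epoch count are arbitrary choices made for illustration.

```python
import math
import random

# Hypothetical feature names loosely following the abstract. The actual
# ASiD feature definitions and encodings are not given in this text.
FEATURES = [
    "mutational_constraint",
    "synapse_localization",
    "expr_neurons",
    "expr_astrocytes",
    "expr_non_brain",
]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_genes(n, rng):
    """Generate toy feature vectors and labels (NOT real genomic data)."""
    true_w = [1.5, 1.0, 0.8, 0.5, -0.7]  # arbitrary illustrative weights
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        p = sigmoid(sum(w * v for w, v in zip(true_w, x)) - 1.0)
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a gene with features x is a risk gene."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rng = random.Random(0)
X, y = make_synthetic_genes(500, rng)
weights, bias = fit_logistic(X, y)

# With standardized features, each fitted weight gives a rough sense of
# how informative that feature is and in which direction it acts.
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.2f}")
```

Fitting the same kind of model separately on a subset of genes, such as only RBPs or only chromatin regulators, and comparing the fitted weights is one simple way to ask whether different features are informative for different functional groups.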
Keywords: LOEUF; RNA-binding proteins; SFARI; astrocytes; autism spectrum disorder; chromatin regulators; excitatory neurons; inhibitory neurons; machine learning; synapse.
© The Author(s) 2025. Published by Oxford University Press on behalf of The Genetics Society of America. All rights reserved.