Exon level machine learning analyses elucidate novel candidate miRNA targets in an avian model of fetal alcohol spectrum disorder

PLoS Comput Biol. 2019 Apr 11;15(4):e1006937. doi: 10.1371/journal.pcbi.1006937. eCollection 2019 Apr.


Gestational alcohol exposure causes fetal alcohol spectrum disorder (FASD) and is a prominent cause of neurodevelopmental disability. Whole transcriptome sequencing (RNA-Seq) offers insights into mechanisms underlying FASD, but gene-level analysis provides limited information regarding complex transcriptional processes such as alternative splicing and non-coding RNAs. Moreover, traditional analytical approaches that use multiple hypothesis testing with a false discovery rate adjustment prioritize genes based on an adjusted p-value, which is not always biologically relevant. We addressed these limitations with a novel approach: we implemented an unsupervised machine learning model and applied it to an exon-level analysis to reduce data complexity to the exons most likely to be functionally relevant, without loss of novel information. This was performed on an RNA-Seq paired-end dataset derived from alcohol-exposed neural fold-stage chick crania, wherein alcohol causes facial deficits recapitulating those of FASD. A principal component analysis along with k-means clustering was used to extract exons that deviated from baseline expression. This identified 6857 differentially expressed exons representing 1251 geneIDs; 391 of these genes were identified in a prior gene-level analysis of this dataset. It also identified exons encoding 23 microRNAs (miRNAs) with significantly differential expression profiles in response to alcohol. We developed an RDAVID pipeline to identify KEGG pathways represented by these exons, and separately identified predicted KEGG pathways targeted by these miRNAs. Several of these (ribosome biogenesis, oxidative phosphorylation) were identified in our prior gene-level analysis. Other pathways are crucial to facial morphogenesis and represent both novel (focal adhesion, FoxO signaling, insulin signaling) and known (Wnt signaling) alcohol targets.
Importantly, there was substantial overlap between the exomes themselves and the predicted miRNA targets, suggesting these miRNAs contribute to the gene-level expression changes. Our novel application of unsupervised machine learning in conjunction with statistical analyses facilitated the discovery of signaling pathways and miRNAs that inform mechanisms underlying FASD.
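
The abstract does not describe how the principal component analysis and k-means steps were implemented, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It assumes a hypothetical input matrix of per-exon log expression ratios (alcohol versus control, one column per replicate), projects the exons onto the leading principal components, clusters the scores with k-means, and treats the largest cluster as "baseline". The function names, the choice of two components and two clusters, the deterministic farthest-point initialization, and the largest-cluster rule are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization.

    The initialization is an assumption chosen for reproducibility; it is
    not taken from the source.
    """
    # First center: the point nearest the overall centroid.
    first = np.linalg.norm(points - points.mean(axis=0), axis=1).argmin()
    centers = [points[first]]
    # Remaining centers: the point farthest from all centers chosen so far.
    while len(centers) < k:
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)             # assign to nearest center
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):    # converged
            break
        centers = new_centers
    return labels


def deviating_exons(log_ratios, k=2, n_components=2):
    """Return row indices of exons that fall outside the largest cluster.

    Assumption: most exons are unchanged, so the largest cluster is baseline.
    """
    scores = pca_scores(log_ratios, n_components)
    labels = kmeans(scores, k)
    baseline = np.bincount(labels, minlength=k).argmax()
    return np.flatnonzero(labels != baseline)


# Synthetic demonstration data (not from the study): 200 unchanged exons
# and 10 exons shifted upward, each measured in 4 replicates.
rng = np.random.default_rng(0)
demo = np.vstack([
    rng.normal(0.0, 0.1, size=(200, 4)),
    rng.normal(3.0, 0.1, size=(10, 4)),
])
flagged = deviating_exons(demo)
print(flagged)
```

With real data, both up- and down-regulated exons would likely be present, so more than two clusters and a principled choice of the number of components would be needed; the two-cluster setting here only keeps the illustration small.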

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Big Data
  • Chick Embryo
  • Cluster Analysis
  • Computational Biology
  • Databases, Nucleic Acid / statistics & numerical data
  • Disease Models, Animal
  • Ethanol / toxicity
  • Exons / genetics*
  • Female
  • Fetal Alcohol Spectrum Disorders / genetics*
  • Gene Expression Profiling / statistics & numerical data
  • Humans
  • MicroRNAs / genetics*
  • Pregnancy
  • Principal Component Analysis
  • Unsupervised Machine Learning* / statistics & numerical data


Substances

  • MicroRNAs
  • Ethanol