Computational identification of harmful mutation regions to the activity of transposable elements

BMC Genomics. 2017 Nov 17;18(Suppl 9):862. doi: 10.1186/s12864-017-4227-z.

Abstract

Background: Transposable elements (TEs) are interspersed DNA sequences that can move or copy to new positions within a genome. TEs are believed to promote speciation, and their activity plays a significant role in human disease. In the human genome, the 22 AluY and 6 AluS TE subfamilies have been the most recently active, and their transposition has been implicated in many inherited human diseases and in various forms of cancer. Understanding their transpositional activity, and identifying the factors that affect it, is therefore of great interest. Recent work has quantified the activity levels of active Alu TEs based on variation in their sequences. Using this activity data, we analyze how TE activity depends on the position of mutations.

Results: A method and simulation are developed to computationally predict so-called harmful mutation regions in the consensus sequence of a TE, that is, regions in which mutations dramatically decrease transpositional activity. The method is applied to the most active subfamily, AluY, and seven harmful regions are identified within the AluY consensus with q-values less than 0.05. A supplementary simulation also shows that the overlap between the identified harmful regions and the AluYa5 RNA functional regions does not occur by chance. The method is then applied to two additional TE families, the Alu family and the L1 family, to computationally detect the harmful regions in these elements.

Conclusions: We use a computational method to identify a set of harmful mutation regions. Mutations within the identified harmful regions decrease the transpositional activity of active elements. The correlation between the mutations within these regions and the transpositional activity of TEs is shown to be statistically significant. Verifications are presented using the activity of AluY elements and the secondary structure of the AluYa5 RNA, providing evidence that the method successfully identifies harmful mutation regions.

Keywords: Harmful mutation regions; Multiple testing correction; Pearson’s coefficient of correlation; Statistical significance test; The human genome; Transposable elements.
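
The abstract names the ingredients of the analysis (Pearson's coefficient of correlation, a statistical significance test, multiple testing correction, q-values below 0.05) but not the exact procedure. The following Python sketch shows one way these pieces could fit together and should not be read as the authors' implementation. Everything specific in it is an assumption made for illustration: the fixed-width sliding windows, the permutation test used to obtain p-values, the Benjamini-Hochberg adjustment used as a stand-in for q-values, the 0-based mutation positions, and all function and parameter names.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def window_counts(mutation_sets, start, width):
    """Per element, count mutated positions falling in [start, start + width)."""
    return [sum(1 for p in muts if start <= p < start + width)
            for muts in mutation_sets]


def permutation_p(counts, activity, n_perm, rng):
    """One-sided p-value: how often shuffled activity gives an r as low as observed."""
    observed = pearson(counts, activity)
    shuffled = list(activity)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(counts, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def harmful_windows(mutation_sets, activity, length,
                    width=10, step=5, n_perm=999, alpha=0.05, seed=0):
    """Return (start, end, r, q) for windows with negative r and q < alpha."""
    rng = random.Random(seed)
    starts = list(range(0, length - width + 1, step))
    stats = [permutation_p(window_counts(mutation_sets, s, width),
                           activity, n_perm, rng) for s in starts]
    qvals = benjamini_hochberg([p for _, p in stats])
    return [(s, s + width, r, q)
            for s, (r, _), q in zip(starts, stats, qvals)
            if r < 0 and q < alpha]
```

Here `mutation_sets` holds, for each element, the set of consensus positions at which it differs from the consensus, and `activity` holds the matching measured activity levels. A window is reported only when more mutations in it go together with lower activity and the adjusted p-value clears the threshold, mirroring the "q-values less than 0.05" criterion in the Results.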

MeSH terms

  • Alu Elements*
  • Computational Biology / methods*
  • DNA Transposable Elements*
  • Evolution, Molecular
  • Genome, Human*
  • Humans
  • Models, Genetic
  • Mutation*

Substances

  • DNA Transposable Elements