The 3' untranslated region (3' UTR) of mRNA contains elements that play regulatory roles in polyadenylation, localization, translation efficiency, and mRNA stability. Despite the significance of the 3' UTR, no widely adopted method exists for annotating 3' UTRs and profiling their isoforms. Recently, poly(A)-position profiling by sequencing (3P-seq) and similar methods have been used successfully to annotate 3' UTRs; however, they involve complex RNA-biochemical experimental steps that result in a low yield of products. In this paper, we propose heuristic and regression methods to estimate and quantify the usage of 3' UTRs from widely available RNA sequencing (RNA-seq) data. With this approach, the 3' UTR usage estimated from RNA-seq was found to be highly correlated with that estimated from 3P-seq, and poly(A) cleavage signals of 3' UTRs were detected upstream of the predicted poly(A) cleavage sites. Our methods predicted a greater number of 3' UTRs than 3P-seq, which allows the profiling of the 3' UTRs of most expressed genes in diverse cell types, stages, and species. Hence, the computational RNA-seq method for estimating the 3' UTR landscape should be useful as a tool for studying not only the functional roles of the 3' UTR but also gene regulation by the 3' UTR in a cell type-specific context. The method is implemented in open-source code, which is available at http://big.hanyang.ac.kr/GETUTR.
Keywords: 3′ UTR; 3′ UTR landscape; Isotonic regression; RNA-seq.
Copyright © 2015 Elsevier Inc. All rights reserved.