Background: MicroRNAs (miRNAs) are small RNAs of 18-25 nt that play critical roles in many biological processes. The majority of known miRNAs were discovered by conventional cloning and Sanger sequencing. Next-generation sequencing (NGS) technologies enable in-depth characterization of the global repertoire of miRNAs, and different protocols for miRNA library construction have been developed. However, the possible bias in relative expression levels and sequences introduced by different library preparation protocols has rarely been explored.
Results: We assessed three miRNA library preparation protocols (SOLiD and Illumina versions 1 and 1.5), using cloning or SBS sequencing of total RNA samples extracted from skeletal muscle of Hu sheep and Dorper sheep, and then validated 9 miRNAs by qRT-PCR. Our results show that SBS sequencing data correlate highly with Illumina cloning data. Compared with the Illumina data, the SOLiD data show a more dispersed length distribution, a higher frequency of variation at nucleotides near the 3'- and 5'-ends, a higher frequency of reads containing end secondary structure (ESS), and a higher frequency of reads that do not map to known miRNAs. The qRT-PCR results correlated best with the SOLiD cloning data. The fold differences between Hu sheep and Dorper sheep measured by qRT-PCR and by SBS sequencing correlated well (r = 0.937), and the fold differences of miR-1 and miR-206 were similar across the SOLiD cloning, qRT-PCR, and SBS sequencing data.
Conclusions: Sequencing depth can influence the quantitative measurement of miRNA abundance, but the discrepancy it caused was not statistically significant, as a high correlation was observed between the Illumina cloning and SBS sequencing data. Bias in length distribution, sequence variation, and ESS was observed between data obtained with the different protocols. SOLiD cloning data differ from Illumina cloning data mainly because of their distinct methods of adapter ligation. The good correlation between the qRT-PCR results and the SOLiD data might be due to the similarities of the hybridization-based methods. The fold difference analysis indicated that methods based on hybridization may be superior for quantitative measurement of miRNA abundance. Because the sheep genome sequence is not available, our data may not fully explain the bias affecting natural miRNAs in sheep or miRNA expression in other mammals; unbiased, artificially synthesized miRNAs will help in evaluating the methodology of miRNA library preparation.
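
The fold-difference comparison reported in the Results (r = 0.937 between qRT-PCR and SBS sequencing) can be illustrated with a minimal sketch. It assumes a log2 fold change between the two breeds and a Pearson correlation coefficient; the abstract does not state which correlation statistic or transformation was used, so both are assumptions. The function names, the pseudocount, and all numeric values are hypothetical and serve only to show the calculation.

```python
import math


def log2_fold_change(level_a, level_b, pseudocount=1.0):
    """Log2 ratio of two expression levels (e.g. Hu vs. Dorper sheep).

    The pseudocount is an assumption added here to avoid division by zero;
    it is not taken from the paper.
    """
    return math.log2((level_a + pseudocount) / (level_b + pseudocount))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up (Hu, Dorper) expression levels for four hypothetical miRNAs,
    # NOT data from the study.
    sequencing = [(1200, 300), (50, 45), (800, 1600), (10, 90)]
    qpcr = [(900, 250), (40, 42), (500, 1300), (8, 60)]

    seq_fc = [log2_fold_change(hu, dorper) for hu, dorper in sequencing]
    pcr_fc = [log2_fold_change(hu, dorper) for hu, dorper in qpcr]
    print("r =", round(pearson_r(seq_fc, pcr_fc), 3))
```

Comparing fold changes rather than raw abundances removes platform-specific scaling, which is why two methods with different absolute read-outs can still agree on the direction and size of a difference between breeds.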