Towards long double-stranded chains and robust DNA-based data storage using the random code system

Front Genet. 2023 Jun 13;14:1179867. doi: 10.3389/fgene.2023.1179867. eCollection 2023.

Abstract

DNA has become a popular choice for next-generation storage media because of its high storage density and stability. As the medium that stores the genetic information of living organisms, DNA offers substantial storage capacity along with low-cost, low-power replication and transcription. However, storing data in long double-stranded DNA can introduce sources of instability that make it difficult to satisfy the constraints of biological systems. To address this challenge, we designed a highly robust coding scheme, the "random code system," inspired by fountain codes. The random code system comprises three components: construction of a random matrix, Gaussian preprocessing, and random equilibrium. Compared with Luby transform (LT) codes, the random code (RC) is more robust and better at recovering lost information. In biological experiments, we successfully stored 29,390 bits of data in 25,700 bp chains, achieving a storage density of 1.78 bits per nucleotide. These results demonstrate the potential of long double-stranded DNA and the random code system for robust DNA-based data storage.
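The abstract names the building blocks of the random code system (a random matrix, Gaussian preprocessing) without giving details. As a hedged illustration of the general fountain-code idea it draws on, the sketch below implements a plain random linear fountain code over GF(2): each "droplet" is the XOR of a randomly chosen subset of data segments, and the receiver recovers the segments by Gaussian elimination once it holds enough linearly independent droplets, even if some droplets were lost. This is not the authors' RC scheme; the functions `encode` and `decode`, the use of Python integers as data segments, and the seeding are hypothetical choices for illustration, and the "random equilibrium" step is not modeled at all.

```python
import random


def encode(segments, n_droplets, seed=0):
    """Hypothetical random linear fountain encoder over GF(2).

    Each droplet is (coefficient_row, payload): a random 0/1 row of the
    random matrix and the XOR of the segments selected by that row.
    """
    rng = random.Random(seed)
    k = len(segments)
    droplets = []
    for _ in range(n_droplets):
        row = [rng.randint(0, 1) for _ in range(k)]
        if not any(row):  # an all-zero row carries no information
            row[rng.randrange(k)] = 1
        payload = 0
        for j, bit in enumerate(row):
            if bit:
                payload ^= segments[j]
        droplets.append((row, payload))
    return droplets


def decode(droplets, k):
    """Recover k segments by Gauss-Jordan elimination over GF(2).

    Returns None when the surviving droplets have rank < k.
    """
    rows = [(row[:], payload) for row, payload in droplets]
    for col in range(k):
        # Pick a pivot among the not-yet-used rows with a 1 in this column.
        pivot = next((i for i in range(col, len(rows)) if rows[i][0][col]), None)
        if pivot is None:
            return None  # not enough independent droplets
        rows[col], rows[pivot] = rows[pivot], rows[col]
        prow, ppay = rows[col]
        # Clear this column from every other row (XOR is addition in GF(2)).
        for i in range(len(rows)):
            if i != col and rows[i][0][col]:
                r, p = rows[i]
                rows[i] = ([a ^ b for a, b in zip(r, prow)], p ^ ppay)
    # Rows 0..k-1 now form an identity matrix, so payloads are the segments.
    return [rows[i][1] for i in range(k)]


if __name__ == "__main__":
    data = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]  # four 16-bit segments
    droplets = encode(data, n_droplets=12, seed=1)
    survivors = droplets[3:]  # simulate losing the first three droplets
    recovered = decode(survivors, k=len(data))
    print("recovered:", recovered, "matches:", recovered == data)
```

Generating more droplets than segments is what provides the redundancy: decoding succeeds from any subset of droplets whose coefficient rows reach full rank, which is why a fountain-style code tolerates lost sequences.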

Keywords: DNA-based data storage; highly robust; long double-stranded chains; random code system; random equilibrium; random matrix.

Grants and funding

This work was supported by the National Key R&D Program of China (Grant 2019YFA0706338402) and the National Natural Science Foundation of China under Grants 62272009, 62072129, and 62172302.