In an article recently published in the journal Science, Yaniv Erlich and Dina Zielinski showed that it is possible to store high-density digital information in DNA molecules and reliably retrieve it. As they report, they stored a complete operating system, a movie, and other files totaling more than 2 MB, and managed to retrieve all the information with zero errors.
One critical factor for success is the choice of an appropriate coding method: “Biochemical constraints dictate that DNA sequences with high GC content or long homopolymer runs (e.g., AAAAAA…) are undesirable, as they are difficult to synthesize and prone to sequencing errors.”
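The two constraints named in the quote can be expressed as a simple screening function. The following is a minimal sketch, not the authors' code: the function names are made up for illustration, and the thresholds (GC content between 45% and 55%, runs of at most 3 identical bases) are assumptions chosen for the example.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (e.g. 'AAAA' -> 4)."""
    return max(len(match.group(0)) for match in re.finditer(r"(.)\1*", seq))


def passes_screen(seq: str, gc_min: float = 0.45, gc_max: float = 0.55,
                  max_run: int = 3) -> bool:
    """True if a non-empty sequence meets both (assumed) constraints."""
    balanced = gc_min <= gc_fraction(seq) <= gc_max
    return balanced and longest_homopolymer(seq) <= max_run


print(passes_screen("ACGTACGTAC"))  # balanced GC content, no long runs
print(passes_screen("AAAAAACGCG"))  # contains a run of six A's
```

A sequence that fails the screen is simply not synthesized; the coding scheme has to supply an alternative that carries the same information.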
Using the so-called DNA Fountain strategy, they managed to overcome the limitations that arise from biochemical constraints and recovery errors. As they report in the Science article, “We devised a strategy for DNA storage, called DNA Fountain, that approaches the Shannon capacity while providing robustness against data corruption. Our strategy harnesses fountain codes, which have been developed for reliable and effective unicasting of information over channels that are subject to dropouts, such as mobile TV (20). In our design, we carefully adapted the power of fountain codes to overcome both oligo dropouts and the biochemical constraints of DNA storage.”
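The general idea of combining a fountain code with a biochemical screen can be illustrated with a toy encoder. This is a simplified sketch under stated assumptions, not the published implementation: every function name is hypothetical, the degree is drawn uniformly (real fountain codes use a carefully designed degree distribution), the screening thresholds are illustrative, and the decoder is omitted.

```python
import random

BASES = "ACGT"  # assumed mapping, 2 bits per base: 00->A, 01->C, 10->G, 11->T


def to_dna(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))


def is_valid(seq, gc_min=0.45, gc_max=0.55, max_run=3):
    """Screen for GC content and homopolymer runs (thresholds are assumptions)."""
    gc = sum(base in "GC" for base in seq) / len(seq)
    has_long_run = any(base * (max_run + 1) in seq for base in BASES)
    return gc_min <= gc <= gc_max and not has_long_run


def split_segments(data: bytes, segment_size: int):
    """Zero-pad non-empty data and cut it into equal-sized segments."""
    padded = data + bytes(-len(data) % segment_size)
    return [padded[i:i + segment_size]
            for i in range(0, len(padded), segment_size)]


def make_droplet(segments, seed: int) -> bytes:
    """XOR a seed-determined subset of segments; prefix the 4-byte seed."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))  # toy choice: uniform degree
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytes(len(segments[0]))
    for index in chosen:
        payload = bytes(a ^ b for a, b in zip(payload, segments[index]))
    return seed.to_bytes(4, "big") + payload


def encode(data: bytes, segment_size=16, n_oligos=20, max_attempts=100_000):
    """Generate droplets until n_oligos of them pass the screen."""
    segments = split_segments(data, segment_size)
    seed_source = random.Random(0)  # fixed so the example is reproducible
    oligos = []
    for _ in range(max_attempts):
        if len(oligos) == n_oligos:
            break
        seed = seed_source.getrandbits(32)
        seq = to_dna(make_droplet(segments, seed))
        if is_valid(seq):  # rejected droplets are discarded, not repaired
            oligos.append(seq)
    if len(oligos) < n_oligos:
        raise RuntimeError("too few valid droplets; relax the screen")
    return oligos


data = bytes(random.Random(1).randrange(256) for _ in range(64))
for oligo in encode(data, segment_size=16, n_oligos=3):
    print(oligo)
```

Because a fountain code can produce a practically unlimited stream of droplets, discarding the ones that violate the constraints costs only encoding time. A decoder would read the seed from each recovered oligo, regenerate the same subset of segments, and solve for the original data; losing some oligos is tolerable as long as enough droplets survive.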
The encoded data was written using DNA synthesis, and the information was retrieved by performing PCR and then sequencing the resulting DNA on Illumina sequencers.
Earlier studies, including the pioneering 2012 study by Church, predicted that DNA storage could theoretically achieve a maximum information density of 680 petabytes per gram of DNA. The authors managed to perfectly retrieve the information from a physical density of 215 petabytes per gram. For comparison, a flash memory device weighing about one gram can currently hold up to 128 GB, a density roughly six orders of magnitude lower.
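The density comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above and assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a flash device of exactly one gram.

```python
import math

PB = 10**15  # bytes per petabyte (decimal units assumed)
GB = 10**9   # bytes per gigabyte (decimal units assumed)

theoretical_density = 680 * PB  # bytes per gram, predicted maximum
achieved_density = 215 * PB     # bytes per gram, reported by the authors
flash_density = 128 * GB        # bytes per gram, assuming a one-gram device

ratio = achieved_density / flash_density
orders_of_magnitude = math.log10(ratio)
fraction_of_maximum = achieved_density / theoretical_density

print(f"DNA vs. flash: {ratio:,.0f}x ({orders_of_magnitude:.1f} orders of magnitude)")
print(f"Share of theoretical maximum: {fraction_of_maximum:.0%}")
```

The ratio comes out at about 1.7 million, and the achieved density is a little under a third of the predicted maximum.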
The authors report that the cost of storage and retrieval, about $3,500 per megabyte, remains a major bottleneck.