DNA as an efficient data storage medium

In an article recently published in the journal Science, Yaniv Erlich and Dina Zielinski showed that it is possible to store high-density digital information in DNA molecules and reliably retrieve it. As they report, they stored a complete operating system, a movie, and other files, totalling more than 2 MB, and managed to retrieve all the information with zero errors.

One of the critical factors for success is the use of appropriate coding methods: “Biochemical constraints dictate that DNA sequences with high GC content or long homopolymer runs (e.g., AAAAAA…) are undesirable, as they are difficult to synthesize and prone to sequencing errors.”
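Both constraints are easy to check programmatically. The sketch below screens a candidate oligo for GC content and homopolymer runs; the thresholds are illustrative choices, not necessarily the exact parameters used in the paper.

```python
import re

def passes_constraints(seq: str, gc_low: float = 0.45, gc_high: float = 0.55,
                       max_run: int = 3) -> bool:
    """Check a candidate oligo against both constraints: GC content within
    [gc_low, gc_high] and no homopolymer run longer than max_run bases.
    (Thresholds here are illustrative, not the paper's exact parameters.)"""
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if not gc_low <= gc <= gc_high:
        return False
    # Reject any run of max_run + 1 or more identical nucleotides.
    if re.search(r"(.)\1{%d,}" % max_run, seq):
        return False
    return True

print(passes_constraints("ACGTGCATCGTA"))  # balanced GC, short runs -> True
print(passes_constraints("GCGTAAAAGCTC"))  # homopolymer run of four A's -> False
```

An encoder can generate many candidate sequences and simply discard those that fail this screen, which is essentially what makes a fountain-code approach attractive.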

Using the so-called DNA fountain strategy, they managed to overcome the limitations that arise from biochemical constraints and recovery errors. As they report in the Science article: “We devised a strategy for DNA storage, called DNA Fountain, that approaches the Shannon capacity while providing robustness against data corruption. Our strategy harnesses fountain codes, which have been developed for reliable and effective unicasting of information over channels that are subject to dropouts, such as mobile TV (20). In our design, we carefully adapted the power of fountain codes to overcome both oligo dropouts and the biochemical constraints of DNA storage.”
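The core idea of a fountain code can be sketched in a few lines: each “droplet” is the XOR of a pseudo-random subset of the source segments, tagged with the seed that identifies the subset, so a decoder that collects enough droplets can regenerate the subsets and peel the segments off one by one. This is only an illustrative sketch, not the paper's implementation, which uses a carefully tuned soliton degree distribution and screens every droplet against the biochemical constraints.

```python
import random

def make_droplet(segments: list[bytes], seed: int) -> bytes:
    """Create one fountain-code 'droplet': the XOR of a pseudo-random
    subset of the source segments, identified entirely by the seed.
    (Illustrative sketch; not the degree distribution used in the paper.)"""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))       # how many segments to mix
    chosen = rng.sample(range(len(segments)), degree)
    droplet = bytes(len(segments[0]))            # all-zero accumulator
    for i in chosen:
        droplet = bytes(a ^ b for a, b in zip(droplet, segments[i]))
    return droplet

# Any number of droplets can be generated, each from a different seed;
# the decoder only needs slightly more droplets than source segments.
segments = [b"seg0", b"seg1", b"seg2", b"seg3"]
droplets = [make_droplet(segments, seed) for seed in range(6)]
```

Because the supply of droplets is effectively unlimited, droplets that violate the GC or homopolymer constraints can simply be thrown away and replaced, which is how the strategy reconciles near-Shannon-capacity coding with the biochemistry.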

The encoded data was written using DNA synthesis and the information was retrieved by performing PCR and sequencing the resulting DNA using Illumina sequencers.

Other studies, including the pioneering one by Church and colleagues in 2012, predicted that DNA storage could theoretically achieve a maximum information density of 680 petabytes per gram of DNA. The authors managed to perfectly retrieve the information from a physical density of 215 petabytes per gram. For comparison, about one gram of flash memory can currently hold up to 128 GB, a density roughly six orders of magnitude lower.
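A quick back-of-the-envelope check of the figures quoted above:

```python
import math

# Storage densities in bytes per gram, as quoted above.
dna_density = 215e15      # 215 petabytes per gram, achieved in the study
flash_density = 128e9     # ~128 GB in roughly one gram of flash memory

ratio = dna_density / flash_density
print(f"DNA is ~{ratio:,.0f} times denser "
      f"(~{math.log10(ratio):.0f} orders of magnitude)")
```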

The authors report that the cost of storage and retrieval, about $3500 per megabyte, still represents a major bottleneck.

Arrival of the Fittest: why are biological systems so robust?

In his 2014 book, Arrival of the Fittest, Andreas Wagner addresses important open questions in evolution: how are useful innovations created in biological systems, enabling natural selection to perform its magic of creating ever more complex organisms? Why is it that changes in these complex systems do not lead only to non-working systems? What is the origin of variation upon which natural selection acts?

Wagner’s main point is that “Natural selection can preserve innovations, but it cannot create them. Nature’s many innovations—some uncannily perfect—call for natural principles that accelerate life’s ability to innovate, its innovability.”

In fact, natural selection can apply selective pressure, selecting organisms that have useful phenotypic variations, caused by the underlying genetic variations. However, for this to happen, genetic mutations and variations have to occur and, with high enough frequency, they have to lead to viable and more fit organisms.

In most man-made systems, almost any change to the original design leads to a system that does not work, or that performs much worse than the original. Make almost any random change to a plane, a computer, or a program, and the result either performs worse than the original or fails catastrophically. Biological systems seem much more resilient, though. In this book, Wagner explores several types of (conceptual) biological networks: metabolic networks, protein interaction networks, and gene regulatory networks.

Each of these networks corresponds to one specific biological function: in a metabolic network, chemical entities interact; in a protein interaction network, proteins interact to create complex functions; and in a gene regulatory network, genes regulate the expression of other genes. Two nodes in such networks are neighbors if the genotypes that encode them differ in only one DNA position.
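This neighborhood structure is easy to make concrete. A minimal sketch, assuming genotypes are plain DNA strings, enumerates everything one mutation away:

```python
def neighbors(genotype: str, alphabet: str = "ACGT"):
    """Yield every genotype that differs from the given one in exactly
    one DNA position -- its neighbors in a genotype network."""
    for i, base in enumerate(genotype):
        for other in alphabet:
            if other != base:
                yield genotype[:i] + other + genotype[i + 1:]

# A sequence of length L over a 4-letter alphabet has 3 * L neighbors.
print(len(list(neighbors("ACGT"))))  # -> 12
```

Real genotypes are of course vastly longer, so each one sits in an enormous network of single-mutation neighbors, which is what makes Wagner's traversal argument possible.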

He concludes that these networks are robust to mutations, and that this robustness is what enables innovation. In particular, he shows that you can traverse these networks, from node to neighboring node, while keeping the biological function unchanged, only slightly degraded, or even improved. Unlike man-made systems, biological systems are robust to change, and nature can experiment by tweaking them, in the process creating innovation and increasingly complex systems. This is how the amazingly complex richness of life has been created in a mere four billion years.


Writing a Human Genome from scratch: the Genome Project-write

The Genome Project-write has released a white paper, with a clear proposal of the steps and timeline that will be required to design and assemble a human genome from scratch.


It is a large-scale project, involving a significant number of institutions and many well-known researchers, including George Church and Jef Boeke. According to the project web page:

“Writing DNA is the future of science and medicine, and holds the promise of pulling us forward into a better future. While reading DNA code has continued to advance, our capability to write DNA code remains limited, which in turn restricts our ability to understand and manipulate biological systems. GP-write will enable scientists to move beyond observation to action, and facilitate the use of biological engineering to address many of the global problems facing humanity.”

The idea is to use existing DNA synthesis technologies to accelerate research across a wide spectrum of the life sciences. The synthesis of human genomes may make it possible to understand the phenotypic consequences of specific genome sequences, and will help improve the quality of synthetic biology tools.

Special attention will be paid to the complex ethical, legal and social issues that are a consequence of the project.

The project has received wide coverage, in a number of news sources, including popular science sites such as Statnews and the journal Science.

Reaching “longevity escape velocity”…

The concept that we may one day reach “longevity escape velocity”, a point in time when life expectancy increases by more than one year every year, is not new. Many people believe that advances in the medical and biological sciences will one day make it possible for humans to live, if not forever, at least for millennia.

An interesting and very informative article in The Economist surveys some of the many ongoing efforts towards extending human longevity.


The “low tech” approach is based on the idea that calorie restriction (CR), the consistent ingestion of significantly fewer calories than normal, will significantly prolong life. Although the evidence that CR is effective in humans is scant, there is some evidence that other animals (and unicellular organisms) tend to live longer under this regimen. The idea is that even a life extension of a few years may take you past the threshold where medical science can extend your life for centuries. So a Pascal’s wager makes sense: a few decades of sacrifice in exchange for centuries of happy life.

More high-tech approaches include genetic manipulation and the development of special drugs that may delay ageing, such as metformin, resveratrol, or rapamycin. Clinical trials are at present very limited because ageing is not considered a disease and, as such, anti-ageing drugs cannot get regulatory approval. Self-experimentation seems to be very common in the field, though.

Interest in this type of research is likely to increase as the population of developed countries ages and the prospect of a significant increase in life expectancy becomes more real. Believers in the singularity have one more incentive: after all, you only need to live long enough to get to the singularity.

Next challenge: a synthetic human?

A group of researchers is calling for the next challenge in genetics: create an entirely synthetic human genome. The Human Genome Project Write (HGP-write) aims at creating a human genome from scratch, using the information available from thousands of sequenced human genomes.

Creating a DNA sequence that corresponds to a viable human being is quite an achievable challenge with existing technology. The large number of sequenced human genomes provides an excellent blueprint for what such a genome could be. Poorly understood or hard-to-sequence regions pose considerable challenges, but they should not be impossible to tackle. More difficult will be creating viable cell lines from the synthesized DNA, let alone viable embryos.


As IEEE Spectrum reports, the subject has received considerable attention in the media, including the NY Times. The authors of the proposal have already said that they do not intend to create synthetic humans, but only to advance the state of the art in genetics research. Their objective is to better understand the human genome by building human (and other) genomes from scratch. However, one never knows where a road leads, only where it starts.


A new and improved tree of life brings some surprising results

In a recent article, published in the journal Nature Microbiology, a group of researchers from UC Berkeley, in collaboration with other universities and institutes, proposed a new version of the tree of life, which dramatically changes our view of the relationships between the species inhabiting planet Earth.

Many depictions of the tree of life tend to focus on the enormous and well known diversity of eukaryotes, a group of organisms composed of complex cells that includes all animals, plants and fungi.

This version of the tree of life, now published, uses metagenomic analysis of genomic data from many previously little-known organisms, together with published genomic sequences, to infer a significantly different tree. This new view reveals the dominance of bacterial diversification. A full-scale version of the proposed tree lets you find our own ancestors, in the extreme bottom right of the figure, in the Opisthokont group of organisms. The Opisthokonts include both the animal and fungus kingdoms, together with other eukaryotic microorganisms. Opisthokont flagellate cells, such as the sperm of most animals and the spores of the chytrid fungi, propel themselves with a single posterior flagellum, the feature that gives the group its name. At the level of resolution used in the study, humans and mushrooms are so close that they cannot be told apart.


This version of the tree of life maintains the three great trunks that Carl Woese and his colleagues published in the first “universal tree of life”, in the seventies.

Our own trunk, the eukaryotes, includes animals, plants, fungi and protozoans. A second trunk includes many familiar bacteria, like Escherichia coli. The third trunk, the Archaea, includes little-known microbes that live in extreme places like hot springs and oxygen-free wetlands.



However, this more extensive and detailed analysis, based on extensive genomic data, provides a more global view of the evolutionary process that has shaped life on Earth for the last four billion years.

Images from the article in Nature Microbiology by Hug et al., and from the work of Woese et al.