A few years ago, Harvard professor and bio-engineering rockstar George Church was a guest on The Colbert Report. As many of the talk show's guests did, he presented host Stephen Colbert with a copy of his book, which he'd co-authored with Ed Regis. Well, sort of. He actually handed Colbert 20 million copies of his book -- and they all fit in his front pocket. 

How was that possible? Church had programmed 20 million copies of his book into DNA, which is known, rightly so, as the "information molecule" in biology. The incredible density of information that can be stored in DNA, and the potential to use it much like a traditional hard drive, has not been lost on technology companies scrambling to keep up with society's exponentially growing need for data storage.

Perhaps due to its having let the future pass it by on one too many occasions, Microsoft (MSFT) is going all-in on the idea. It plans to deploy a proto-commercial, DNA-powered storage device about the size of a commercial Xerox copier by the end of the decade in one of its data centers. Even though it would cost billions using today's DNA synthesis technologies and only serve a niche application, the technology developed in the next three years will go a long way to ushering in digital-bio hybrid computing machines. It could be big for investors, too -- if it works.

Strands of DNA connected with binary code.

Image source: Getty Images.

The cloud, powered by DNA?

Microsoft and the rest of the information technology industry have spent billions of dollars on data centers to date in an endless battle to scale with the needs of consumers and growing demand for cloud computing. The most expensive part of a data center is power consumption, since it takes a lot of power to keep arrays of servers cool. That has become especially painful as traditional storage media have begun hitting their limits, which means tech companies won't be able to wring any further cost reductions out of scaling storage capacity with today's technology.

That's what makes DNA so intriguing, on paper at least. Volume for volume, it can store 10 million times more information than the magnetic tape drives commonly used today, which would drastically reduce power consumption per TB or square foot of space in a data center. One copy of your genome, held in just one of your body's cells, holds approximately 1.5 GB of information. And since your body contains trillions of cells, all of the DNA in all of your body's cells stores trillions of GB of information -- more than all of the digital data storage capacity in the entire world (although it's getting close).  
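For readers who want to check the per-cell figure, here is a minimal back-of-the-envelope sketch. It assumes each base pair encodes 2 bits (four possible bases), that one cell carries roughly 6 billion base pairs, and that "trillions of cells" can be represented conservatively as 1 trillion. None of those three constants comes from the article; they are rounded assumptions chosen for illustration.

```python
# Rough information capacity of DNA, assuming 2 bits per base pair.
BITS_PER_BASE_PAIR = 2                # assumption: 4 bases -> log2(4) = 2 bits
GENOME_BP_PER_CELL = 6_000_000_000    # assumption: ~6 billion base pairs per cell
CELLS_IN_BODY = 1_000_000_000_000     # assumption: 1 trillion, a low-end "trillions"


def dna_capacity_gb(base_pairs: int) -> float:
    """Return the raw capacity of a DNA sequence in decimal gigabytes."""
    bits = base_pairs * BITS_PER_BASE_PAIR
    return bits / 8 / 1e9


genome_gb = dna_capacity_gb(GENOME_BP_PER_CELL)
body_gb = genome_gb * CELLS_IN_BODY

print(f"One cell's genome: about {genome_gb:.1f} GB")
print(f"All cells combined: about {body_gb:.1e} GB")
```

Under these assumptions one cell works out to about 1.5 GB, in line with the figure above. Keep in mind that the body's cells carry copies of the same genome, so the whole-body total describes raw capacity rather than unique information.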

There is one massive obstacle to commercializing this technology, however: cost. Consider that state-of-the-art technology today can produce synthetic DNA for genetic engineering applications at costs of about $0.05 per base pair, and experts I've spoken to attest it could be two orders of magnitude lower for DNA data storage applications. (The end use affects the cost of DNA because each application has different requirements for accuracy, length, and yield.) 

A data server.

Image source: Getty Images.

Either way, even with the best technology today, it would take several months and hundreds of thousands of dollars to synthesize the amount of DNA held in a single cell of E. coli -- something the bacterium does for free in about 20 minutes.
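The "hundreds of thousands of dollars" estimate can be roughly reproduced with a short sketch. The $0.05 per base pair comes from the figures above; the E. coli genome size of about 4.6 million base pairs is an outside assumption added here for illustration.

```python
# Back-of-the-envelope cost of synthesizing one E. coli genome's worth of DNA.
COST_PER_BP_USD = 0.05         # from the article: state-of-the-art synthesis cost
ECOLI_GENOME_BP = 4_600_000    # assumption: approximate E. coli genome size


def synthesis_cost_usd(base_pairs: int, cost_per_bp: float) -> float:
    """Return the total synthesis cost for a given number of base pairs."""
    return base_pairs * cost_per_bp


ecoli_cost = synthesis_cost_usd(ECOLI_GENOME_BP, COST_PER_BP_USD)
print(f"Synthesizing one E. coli genome: about ${ecoli_cost:,.0f}")
```

With those inputs the total lands near $230,000, which is consistent with the estimate in the text.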

The good news is the cost has fallen quickly -- it was $1 per base pair not long ago. The not-so-good news is Microsoft estimates costs would need to fall by a factor of 10,000 before DNA data storage could really take off. 

The company will likely be powerless to drive down the cost of DNA synthesis without outside help from biotech companies (supply) and fellow tech peers (demand). That's why Microsoft has partnered with the University of Washington and DNA synthesis leader Twist Bioscience, which has received investments from Illumina and Applied Materials, among others.

Earlier this year the trio made significant progress developing the basic technology required for DNA data storage, such as error-free read and write capabilities. The early-stage work also shows which areas need drastic improvement:  

  • The team used about 13.5 million pieces of DNA to store 200 MB of data. On one hand, that's equivalent to the amount of DNA in one-third of a human genome. On the other hand, it's equivalent to the storage capacity of a cutting-edge Zip floppy disk from the late 1990s.
  • The speed of writing information to DNA was about 400 bytes per second. It needs to increase to 100 MB per second, or by a factor of about 250,000, to be commercially viable.
  • The synthetic DNA used in the system cost as much as $800,000, though possibly only half that. Reducing the price by a factor of 10,000 would buy 200 MB of storage capacity for roughly $40 to $80.
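A few derived numbers help put the experiment in perspective. The sketch below uses only the figures from the list above, treats MB as decimal megabytes (an assumption), and takes the $400,000 to $800,000 range as the cost of the synthetic DNA.

```python
# Derived figures from the Microsoft / University of Washington / Twist experiment.
PIECES_OF_DNA = 13_500_000        # from the article
DATA_BYTES = 200 * 10**6          # 200 MB, assuming decimal megabytes
WRITE_RATE_BYTES_PER_SEC = 400    # from the article
COST_LOW_USD = 400_000            # "perhaps half" of the upper estimate
COST_HIGH_USD = 800_000           # upper estimate from the article
REDUCTION_FACTOR = 10_000         # cost reduction Microsoft says is needed

SECONDS_PER_DAY = 86_400

bytes_per_piece = DATA_BYTES / PIECES_OF_DNA
write_days = DATA_BYTES / WRITE_RATE_BYTES_PER_SEC / SECONDS_PER_DAY
reduced_low = COST_LOW_USD / REDUCTION_FACTOR
reduced_high = COST_HIGH_USD / REDUCTION_FACTOR

print(f"Data per piece of DNA: about {bytes_per_piece:.1f} bytes")
print(f"Time to write 200 MB at 400 bytes/s: about {write_days:.1f} days")
print(f"Cost of 200 MB after reduction: ${reduced_low:,.0f} to ${reduced_high:,.0f}")
```

Each piece of DNA ends up carrying only about 15 bytes, and writing the full 200 MB at the reported rate would take nearly six days, which shows how far the technology has to go.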

The biggest cost reductions will come from synthesizing DNA as close to free as possible (nature does this pretty efficiently), although improvements in technologies allowing us to more fully tap into the awesome storage density of DNA will greatly improve the cost-benefit ratio as well.

Put it all together and today's DNA synthesis and DNA data storage technologies would enable digital data storage devices at a cost of between $2 billion and $4 billion per TB. That's awfully expensive, but tremendous cost reductions -- more than 99.9% -- are possible with the proper effort and investments.  
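The per-TB range follows from scaling the 200 MB experiment. Here is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000,000 MB) and the $400,000 to $800,000 cost range quoted for the synthetic DNA.

```python
# Scale the cost of the 200 MB experiment up to 1 TB.
EXPERIMENT_MB = 200
MB_PER_TB = 1_000_000             # assumption: decimal units
COST_LOW_USD = 400_000
COST_HIGH_USD = 800_000
REDUCTION_FACTOR = 10_000         # reduction Microsoft estimates is needed

scale = MB_PER_TB / EXPERIMENT_MB             # experiments per TB
per_tb_low = COST_LOW_USD * scale
per_tb_high = COST_HIGH_USD * scale

# A 10,000-fold reduction expressed as a percentage.
reduction_pct = (1 - 1 / REDUCTION_FACTOR) * 100

print(f"Today: ${per_tb_low:,.0f} to ${per_tb_high:,.0f} per TB")
print(f"A {REDUCTION_FACTOR:,}x reduction is a {reduction_pct:.2f}% drop")
print(f"After reduction: ${per_tb_low / REDUCTION_FACTOR:,.0f} "
      f"to ${per_tb_high / REDUCTION_FACTOR:,.0f} per TB")
```

The sketch reproduces the $2 billion to $4 billion range. It also shows that a 10,000-fold reduction alone would still leave a TB costing hundreds of thousands of dollars, so further gains would be needed.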

Are Microsoft's dreams realistic?

It may seem ridiculous, but investors should know there is precedent for biotechnology costs dropping by more than 99.9%. The amazing success of the Human Genome Project, which was initiated to spur innovation in DNA sequencing ("reading genes"), serves as a great example of what's possible. The cost of sequencing a human genome fell from $3 billion at the start of the project to just $1,000 today. Illumina thinks it can reduce that to $100 in the near future.

The recently announced Genome Write Project, which aims to spur innovation in DNA synthesis and construction ("writing genes"), is the logical follow-up to the Human Genome Project. Catalyzing a cost reduction similar to that of its scientific predecessor would bring DNA data storage costs down from today's range of $2 billion to $4 billion per TB to market-ready prices in the next 15 to 20 years, perhaps much sooner for data-center applications when power consumption and footprint costs are factored in.
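To see what "a similar cost reduction" might mean in dollars, the sketch below applies the sequencing cost decline ($3 billion to $1,000) to the $2 billion to $4 billion per TB range. It is an illustration only: nothing guarantees that synthesis costs will follow the same curve as sequencing costs did.

```python
# Apply the Human Genome Project's cost-reduction factor to DNA storage costs.
SEQUENCING_START_USD = 3_000_000_000   # cost at the start of the project
SEQUENCING_TODAY_USD = 1_000           # cost cited for today
STORAGE_LOW_USD_PER_TB = 2_000_000_000
STORAGE_HIGH_USD_PER_TB = 4_000_000_000

reduction_factor = SEQUENCING_START_USD / SEQUENCING_TODAY_USD
reduction_pct = (1 - 1 / reduction_factor) * 100

projected_low = STORAGE_LOW_USD_PER_TB / reduction_factor
projected_high = STORAGE_HIGH_USD_PER_TB / reduction_factor

print(f"Sequencing cost fell by a factor of {reduction_factor:,.0f} "
      f"({reduction_pct:.5f}%)")
print(f"Same factor applied to storage: ${projected_low:,.0f} "
      f"to ${projected_high:,.0f} per TB")
```

A 3-million-fold reduction would put DNA storage somewhere around $700 to $1,300 per TB, which is still far above the price of conventional media but within range of niche archival uses.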

However, the Genome Write Project's main problem is a lack of funding: Other than a $250,000 grant from Autodesk, there isn't much funding to speak of, despite interest in DNA data storage for data centers from across the industry. If Microsoft is serious about delivering DNA data storage technology to the market, it may want to consider funding the public research project in addition to its in-house R&D. Otherwise, it may be difficult to drum up support from tech peers -- who represent future synthetic DNA demand.