News
Writing the Book in DNA
August 16, 2012
Although George Church’s next book doesn’t hit the shelves until Oct. 2, it has already passed an enviable benchmark: 70 billion copies—roughly triple the sum of the top 100 books of all time.
And they fit on your thumbnail.
That’s because Church, the Robert Winthrop Professor of Genetics at Harvard Medical School and a founding core faculty member of the Wyss Institute for Biomedical Engineering at Harvard University, and his team encoded the book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, in DNA, which they then read and copied.
Biology’s databank, DNA has long tantalized researchers with its potential as a storage medium: fantastically dense, stable, energy efficient and proven to work over a timespan of some 3.5 billion years. While not the first project to demonstrate the potential of DNA storage, Church’s team married next-generation sequencing technology with a novel strategy to encode 1,000 times the largest amount of data previously stored in DNA.
The team reports its results in the Aug. 17 issue of the journal Science.
The researchers used binary code to preserve the text, images and formatting of the book. While the scale is roughly what a 5 ¼-inch floppy disk once held, the density of the bits is nearly off the charts: 5.5 petabits, or 5.5 million gigabits, per cubic millimeter. “The information density and scale compare favorably with other experimental storage methods from biology and physics,” said Sri Kosuri, a senior scientist at the Wyss Institute and senior author on the paper. The team also included Yuan Gao, a former Wyss postdoc who is now an associate professor of biomedical engineering at Johns Hopkins University.
And where some experimental media—like quantum holography—require incredibly cold temperatures and tremendous energy, DNA is stable at room temperature. “You can drop it wherever you want, in the desert or your backyard, and it will be there 400,000 years later,” Church said.
Reading and writing in DNA is slower than in other media, however, which makes it better suited for archival storage of massive amounts of data, rather than for quick retrieval or data processing. “Imagine that you had really cheap video recorders everywhere,” Church said. “Just paint walls with video recorders. And for the most part they just record and no one ever goes to them. But if something really good or really bad happens you want to go and scrape the wall and see what you got. So something that’s molecular is so much more energy efficient and compact that you can consider applications that were impossible before.”
About four grams of DNA theoretically could store the digital data humankind creates in one year.
Although other projects have encoded data in the DNA of living bacteria, the Church team used commercial DNA microchips to create standalone DNA. “We purposefully avoided living cells,” Church said. “In an organism, your message is a tiny fraction of the whole cell, so there’s a lot of wasted space. But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn’t earn its keep, if it isn’t evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it.”
In another departure, the team rejected so-called “shotgun sequencing,” which reassembles long DNA sequences by identifying overlaps in short strands. Instead, they took their cue from information technology and encoded the book in 96-bit data blocks, each with a 19-bit address to guide reassembly. Including JPEG images and HTML formatting, the code for the book required 54,898 of these data blocks, each a unique DNA sequence. “We wanted to illustrate how the modern world is really full of zeroes and ones, not As through Zs alone,” Kosuri said.
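The addressing idea can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual pipeline: it assumes the 19-bit address is stored in front of the 96-bit payload (115 bits per block in total) and that the final short block is padded with zeros. The function names `to_bits`, `split_into_blocks` and `reassemble` are hypothetical.

```python
import random

DATA_BITS = 96      # payload bits per block (figure from the article)
ADDRESS_BITS = 19   # address bits per block (figure from the article)


def to_bits(data):
    """Turn bytes into a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)


def split_into_blocks(bits):
    """Cut a bit string into blocks of 19 address bits + 96 payload bits."""
    blocks = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("more blocks than a 19-bit address can number")
        # Assumption: the last block is padded with zeros to full length.
        payload = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        # Assumption: the address is written in front of the payload.
        address = format(index, "0{}b".format(ADDRESS_BITS))
        blocks.append(address + payload)
    return blocks


def reassemble(blocks, total_bits):
    """Sort blocks by address, drop the addresses and padding, return bytes."""
    ordered = sorted(blocks, key=lambda block: int(block[:ADDRESS_BITS], 2))
    bits = "".join(block[ADDRESS_BITS:] for block in ordered)[:total_bits]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    message = b"Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves"
    bits = to_bits(message)
    blocks = split_into_blocks(bits)
    random.shuffle(blocks)  # blocks can be read back in any order
    print(reassemble(blocks, len(bits)) == message)
```

Because every block carries its own position, the blocks can be read in any order and sorted afterward, with no need to find overlaps between neighboring fragments as shotgun sequencing does.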
The team discussed including a DNA copy with each print edition of Regenesis. But in the book, Church and his co-author, the science writer Ed Regis, argue for careful supervision of synthetic biology and the policing of its products and tools. Practicing what they preach, the authors decided against a DNA insert—at least until there has been far more discussion of the safety, security and ethics of using DNA this way. “Maybe the next book,” Church said.
This work was supported by the U.S. Office of Naval Research (N000141010144), Agilent Technologies and the Wyss Institute.
Comments
1.
17 Aug 2012
08:23 am
Amazing work, congrats.
I wonder if the existing DNA of living things (humans, plants, etc.) already has text or some other sort of information embedded in it, beyond the definition of the chemical and physical qualities of the body. Has anybody worked on deciphering DNA in that sense? Do we have a reference manual of ourselves and everything else in the universe stored in our DNA?
2.
17 Aug 2012
04:01 pm
One question regarding this procedure:
If the storage method for this technique is A or T = 0, etc., why not use a different method to double the storage capacity?
A=00
C=01
G=10
T=11?
3.
17 Aug 2012
05:29 pm
I wish I could read.
4.
19 Aug 2012
12:57 pm
Although the 'work was supported by the U.S. Office of Naval Research' (meaning our tax dollars), I bet there is no URL where you can freely read the entire text. If there is, please post it here.
5.
21 Aug 2012
09:17 am
Excellent, especially for people like me who believe in science.
6.
21 Aug 2012
12:50 pm
w00t, I'm wondering what the write and read speeds are.
7.
21 Aug 2012
01:33 pm
MJ - Very good point! A quaternary bit scheme (A=0, T=1, G=2, C=3) is more efficient in theory than the redundant scheme in this paper; the four-value bit has the advantage of greater data density, as other folks have demonstrated on smaller scales.
But as I gathered from George Church, that approach has drawbacks as well. Homopolymers -- long strings of the same letter, like TTTTTTTTT -- are notoriously difficult to sequence accurately. But with two letters for 0, and two letters for 1, Church's team could design an algorithm that avoids creating homopolymers.
The homopolymer problem may be temporary, and as sequencing technology continues to improve, a quaternary bit scheme may become more appealing.
Thanks for reading!
-A
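The trade-off raised in comments 2 and 7 can be sketched in Python. This is a rough illustration only: the two-bit table is the one proposed in comment 2, the choice of A/C for 0 and G/T for 1 is an assumption (the thread does not settle which letters stand for which bit), and the "never repeat the previous base" rule is a simple stand-in for whatever algorithm the team used. All function names are hypothetical.

```python
from itertools import groupby

# Two bits per base, as proposed in comment 2.
TWO_BIT = {"00": "A", "01": "C", "10": "G", "11": "T"}

# One bit per base with two interchangeable letters per bit value.
# Assumption: which two letters stand for 0 and which for 1.
ZERO_BASES = "AC"
ONE_BASES = "GT"


def encode_two_bits(bits):
    """Dense scheme: each base carries two bits; runs of one letter are possible."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return "".join(TWO_BIT[bits[i:i + 2]] for i in range(0, len(bits), 2))


def encode_one_bit(bits):
    """Redundant scheme: pick whichever letter differs from the previous base."""
    dna = []
    for bit in bits:
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        dna.append(second if dna and dna[-1] == first else first)
    return "".join(dna)


def decode_one_bit(dna):
    """Either letter of a pair decodes to the same bit."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)


def longest_run(dna):
    """Length of the longest homopolymer (run of one repeated letter)."""
    return max((len(list(group)) for _, group in groupby(dna)), default=0)


if __name__ == "__main__":
    bits = "0000000000"
    print(encode_two_bits(bits), longest_run(encode_two_bits(bits)))
    print(encode_one_bit(bits), longest_run(encode_one_bit(bits)))
```

With ten zero bits as input, the dense scheme produces a run of a single repeated letter at half the length, while the redundant scheme alternates between its two letters for 0 and never repeats a base.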
8.
21 Aug 2012
01:51 pm
taxpayer - There's a big debate on this one. Under NIH rules, most research funded on your dime (and mine) must be freely available within a year of publication. On one side of the debate, open-access advocates (and others) say that's not fast enough; on the other side, publishers (and others) say that's too onerous. Within the last two years, competing bills have been introduced in Congress to either shorten the wait, or to scrap free access altogether. None have yet passed.
But to your point, there is no URL where you can freely read the entire text. You can, however, see the supplementary materials here: http://goo.gl/wfW8i.
9.
21 Aug 2012
10:02 pm
Very cool, can't wait to see this technology in production. You could store everything forever. Imagine historians in 500 years' time being able to see everything that happened in 2012.
10.
25 Aug 2012
09:45 am
Where can I find the full mechanism?
11.
26 Aug 2012
07:24 am
I'm also writing in response to MJ's suggestion. If it were done the way you suggested, then reading one template strand would certainly give a different result from reading the other. But if A and T code for the same thing, then we can take either strand and it will give us the real code. I don't know quite how to explain this, but it would actually bring out the essence of the double strand. That's to my little knowledge.
12.
27 Aug 2012
08:42 am
I'm with MJ Welland and R. Alan Leo.
Why not use base 4 to encode?
You can (much like an amino acid table) have a sequence for 'start' at each end. It doesn't really matter how long it is; it's just a 'tag'.
CTAGCTAG would be enough, wouldn't it?
13.
27 Aug 2012
10:53 am
Rasit Serdengecti - I created a web program with which you can input real DNA code and it will return what text may be in it. You can also translate text into DNA code similar to how Mr. Church and his colleagues are doing it.
Here is that website:
http://dulbrich.is2.byuh.edu/dna/
14.
05 Sep 2012
09:18 am
Is that 96 bits plus 19 address bits per block? Or including 19 address bits?
15.
10 Sep 2012
04:34 am
I suppose if your DNA got wet it could be attacked by bacteria and fungi, yes? Or can it be protected in a non-degradable(?) sheath and still remain readable?
16.
12 Sep 2012
10:47 pm
In 2009 I was in class 12. I told my brother that the two of us should make a live antivirus for computers based on DNA. My brother laughed, and today it is all going to happen.
17.
18 Sep 2012
02:27 pm
Really excellent work. Engineering and Biotechnology must go hand in hand now.
18.
21 Sep 2012
04:45 am
1. What is the real advantage compared with coding in a bi-molecule chain such as ch2-ch2-no-ch2...?
2. Can you imagine integrating the coded DNA fragment of the book into human DNA, with possible transmission from father to son?
19.
03 Oct 2012
01:00 pm
The values we are assigning to A, T, C and G also have to be stored somewhere. What about that?
20.
29 Dec 2012
04:08 pm
Very interesting. Congratulations. My question is: how fast is the encoding/decoding process in comparison with actual storage technologies?

