MJ - Very good point! In theory, a quaternary bit scheme (A=0, T=1, G=2, C=3) is more efficient than the redundant scheme in this paper: each base carries two bits instead of one, which gives greater data density, as other folks have demonstrated on smaller scales.
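For anyone curious what "two bits per base" looks like in practice, here's a rough sketch using the A=0, T=1, G=2, C=3 mapping above. The function names are my own, made up for illustration, and packing each byte most-significant pair first is just one arbitrary choice:

```python
# Mapping from the comment above: A=0, T=1, G=2, C=3 (two bits per base).
QUATERNARY = "ATGC"  # position in this string = base-4 digit value


def encode_quaternary(data: bytes) -> str:
    """Encode bytes as DNA, four bases per byte (most significant bit pair first)."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(QUATERNARY[(byte >> shift) & 0b11])
    return "".join(bases)


def decode_quaternary(strand: str) -> bytes:
    """Reverse of encode_quaternary: every four bases become one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | QUATERNARY.index(base)
        out.append(value)
    return bytes(out)


print(encode_quaternary(b"A"))     # 0x41 = 01 00 00 01 -> "TAAT"
print(encode_quaternary(b"\xff"))  # all ones -> "CCCC"
```

Note the second example: a plain byte like 0xFF turns straight into a run of the same letter, which is exactly the weakness described next.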
But as I gathered from George Church, that approach has drawbacks as well. Homopolymers -- long strings of the same letter, like TTTTTTTTT -- are notoriously difficult to sequence accurately. By using two letters for 0 and two letters for 1, Church's team could design an algorithm that avoids creating homopolymers.
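Here's a toy version of how a two-letters-per-bit code can sidestep homopolymers. I'm assuming the pairing 0 → A or C and 1 → G or T, and the "pick whichever letter differs from the previous base" rule is just one simple approach, not necessarily the algorithm Church's team actually used:

```python
# Assumed pairing for this sketch: 0 -> A or C, 1 -> G or T.
ZERO_BASES = ("A", "C")
ONE_BASES = ("G", "T")


def encode_redundant(bits: str) -> str:
    """Encode a string of '0'/'1' characters, one bit per base."""
    strand = []
    for bit in bits:
        choices = ZERO_BASES if bit == "0" else ONE_BASES
        prev = strand[-1] if strand else None
        # Use the first letter unless it would repeat the previous base;
        # this way the same base never appears twice in a row.
        base = choices[0] if choices[0] != prev else choices[1]
        strand.append(base)
    return "".join(strand)


def decode_redundant(strand: str) -> str:
    """Either letter of a pair decodes to the same bit."""
    return "".join("0" if base in ZERO_BASES else "1" for base in strand)


print(encode_redundant("0000"))  # "ACAC" instead of "AAAA"
print(encode_redundant("1111"))  # "GTGT" instead of "TTTT"
```

The price is density: each base now carries only one bit, half of what the quaternary scheme manages.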
The homopolymer problem may be temporary, though; as sequencing technology continues to improve, a quaternary bit scheme may become more appealing.
Thanks for reading!