Milenkovic looks for Big Data storage solution in DNA
While the cost of data storage has gone down in recent years, the difficulty of finding a storage medium that is nonvolatile, durable and large enough to meet today’s Big Data needs has remained.
To address the problem, Associate Professor Olgica Milenkovic is looking to use DNA to store Big Data, with the aim of replacing today’s traditional devices, such as flash memory, hard disks and other magnetic recording devices. DNA has proven to be durable and can achieve extremely high storage densities: according to a recent paper from the Birney and Goldman labs at the European Bioinformatics Institute, a single gram of DNA can store about two petabytes.
“We are still finding bones from over 10,000 years ago from which we can extract DNA,” Milenkovic said. “No other storage medium is that durable.”
Together with fellow Illinois faculty Jian Ma of Bioengineering and Huimin Zhao of Chemistry, Milenkovic will develop models for coding Big Data onto DNA. To support this effort, she was recently awarded a three-year, $384,000 grant by the CIA for a project titled “Coding Techniques for Rewritable DNA-Based Storage.” Milenkovic’s collaboration with Ma and Zhao was previously funded through a Strategic Research Initiative (SRI) award.
While methods exist for archiving information in DNA, there is currently no way to retrieve or rewrite selected pieces of the stored data. Both capabilities are needed whether the data is archived or read out frequently. Milenkovic is working with her colleagues in Bioengineering and Chemistry, along with ECE graduate student Hussein Tabatabaei Yazdi and postdocs Yongbo Yuan, Hanmao Kia, and Greg Pulleo, to develop a novel DNA storage architecture that would allow for random access and in which the stored information would be rewritable.
“We are interested in going beyond the straightforward archival approach and allowing for random access of the information,” Milenkovic said. “The solution to the problem relies on a new coding technique that caters to the properties of the DNA storage media.”
Milenkovic’s approach also tackles protection of data against sequencing errors, ensuring that errors in stored messages are kept to a minimum. The earlier SRI-sponsored project on DNA-based storage produced a prototype solution that is currently being tested in the Ma and Zhao labs.
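To make the ideas of addressable, error-checked DNA storage more concrete, the short Python sketch below shows one way they could fit together. It is a toy illustration only and is not the coding scheme developed in the project: the two-bits-per-base mapping, the block layout (a one-byte address, the payload, and a one-byte checksum) and all function names are assumptions made for this example. A simple checksum can only detect an error, not correct it, and a real system would select strands chemically rather than by looking them up in a dictionary.

```python
from typing import Tuple

# Toy mapping (an assumption for illustration): each base carries two bits.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna: every group of four bases becomes one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def make_block(address: int, payload: bytes) -> str:
    """Hypothetical block layout: 1-byte address, payload, 1-byte checksum."""
    body = bytes([address]) + payload
    checksum = sum(body) % 256
    return bytes_to_dna(body + bytes([checksum]))


def read_block(strand: str) -> Tuple[int, bytes]:
    """Decode one block and reject it if the checksum does not match."""
    raw = dna_to_bytes(strand)
    body, checksum = raw[:-1], raw[-1]
    if sum(body) % 256 != checksum:
        raise ValueError("checksum mismatch: possible sequencing error")
    return body[0], body[1:]


# "Random access" in this toy model: fetch one block by address
# without decoding the others.
pool = {addr: make_block(addr, chunk) for addr, chunk in enumerate([b"big ", b"data"])}
address, payload = read_block(pool[1])
```

In this sketch, rewriting a piece of the data amounts to replacing a single addressed block, which is the kind of selective access that a purely archival scheme does not offer.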
To reflect the CIA’s interests in data storage, Milenkovic suggests a hybrid DNA-based storage solution in which the rarely edited bulk of the data is stored in DNA, whereas smaller amounts of data that need to be edited more often are stored via classical methods. The solution protects against potential errors that naturally arise in large-scale DNA storage, while also being less costly. Using combinatorial coding theory, Milenkovic and her group will work to develop the coding schemes needed to make this hybrid storage solution possible.
“We need to find the best coding schemes that facilitate ease of access and help eliminate reading and writing errors,” she said. “DNA as a storage mechanism is so different and the medium is nontraditional, so you have to come up with new models and use that model to develop new coding schemes.”
While other DNA-based applications, such as DNA computing, haven’t grown as rapidly as some people predicted, Milenkovic says she sees great potential in the use of DNA as a storage medium, though she expects it will take time.
“It’s a slowly developing technology, which is why people have previously mainly been interested in archival storage,” she said. “We’re still not there in terms of time and efficiency of processing, but we are making surprisingly steady progress.”
And while the pursuit of DNA data storage is new and still a costly endeavor, Milenkovic sees a future in which videos, images and other data types could be stored in a device that weighs only a few grams and is available for personal use.
“All the traditional storage companies are looking into this as an option now because of today’s big data,” Milenkovic said. “We have good storage options that are cheap right now, but we’re getting more and more data, so we need better alternatives.”