Metagenomic research studies effects of microbial populations


Katie Carr, Coordinated Science Laboratory

The medical research community increasingly recognizes that health depends heavily on the balance of the microbial population inhabiting the human body, particularly the intestines. Intestinal microbes can strongly influence conditions as varied as obesity and food allergies.


ECE professors Olgica Milenkovic and Venugopal Varadachari Veeravalli have taken a personal interest in this issue and are applying their expertise in the analysis and design of algorithms to metagenomics, studying what kinds of microbial communities exist in our bodies and in our environment.


ECE faculty members Olgica Milenkovic and Venu Veeravalli are using metagenomics to determine microbial populations in various environments.
Metagenomics is an emerging field of molecular biology concerned with determining the microbial population of an environment. 


“We want to know about what happens in situations when there are so many microbes in one given location, such as the human gut,” Milenkovic said. “Can you find out through DNA sequencing experiments how they interact, how they coevolve and what effect it has on your body?”


In addition to microbial populations in the human gut, Milenkovic and Veeravalli, along with graduate students Minji Kim and Jonathan Ligo, are applying metagenomics to other areas, such as analyzing samples from coal mines. By running high-throughput screening experiments on coal slices, they can discover which organisms live there. Knowing which organisms are present could help, for example, in identifying causes of methane explosions, as methane is involved in bacterial metabolism.


“You can learn a lot about any sort of organism by looking at its genomics,” Ligo said. “There may be things you discover about people’s gut bacteria, which allows you to infer properties about diseases. There are medical applications, as well as environmental applications, such as determining where certain organisms live to get a better idea of the biology.”


Olgica Milenkovic
One emerging problem in this area is sequencing the genome of each bacterium individually to determine which strains are present in a particular location. The difficulty is that researchers typically cannot separate out individual microbes efficiently and sometimes have access only to fragments of the bacteria’s genomes.


“The task is to see if you can reassemble the genomes that are present or line them up to references when you’re only looking at fragments of DNA sequences,” Milenkovic said.


In addition, analyzing an environment such as the human gut generates so much data that storing, processing, analyzing, assembling, and transmitting it is extremely difficult. Even basic operations can demand the most powerful computers available.


“It’s a difficult problem because the amount of information we’re collecting is a lot more than we can process and is growing at a rate faster than we can keep up processing,” Ligo said.


Kim and Ligo have developed MCUIUC (Metagenomic Compression at UIUC), the first known specialized approach for compressing metagenomic reads. Additionally, instead of purchasing an expensive, powerful computer or cluster, the pair are working to implement their algorithm as parallel tasks that can run on hardware closer to commodity machines.
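
The general idea of breaking compression into parallel tasks can be illustrated with a minimal Python sketch. This is not MCUIUC, whose internals are not described here; it assumes a general-purpose compressor (zlib) stands in for the specialized method, and the function names, chunk size, and worker count are all hypothetical choices made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def chunk_reads(reads, chunk_size):
    """Split a list of read strings into consecutive chunks."""
    return [reads[i:i + chunk_size] for i in range(0, len(reads), chunk_size)]


def compress_chunk(chunk):
    """Compress one chunk of reads (newline-joined) with zlib.

    zlib is only a stand-in here, not the specialized method
    described in the article.
    """
    return zlib.compress("\n".join(chunk).encode("ascii"), 9)


def decompress_chunk(blob):
    """Recover the list of reads stored in one compressed chunk."""
    return zlib.decompress(blob).decode("ascii").split("\n")


def compress_parallel(reads, chunk_size=1000, workers=4):
    """Compress chunks independently so each one can be its own task."""
    chunks = chunk_reads(reads, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so the reads can be restored in order.
        return list(pool.map(compress_chunk, chunks))


def compression_ratio(reads, blobs):
    """Compressed size divided by original size (one newline per read)."""
    original = sum(len(r) + 1 for r in reads)
    compressed = sum(len(b) for b in blobs)
    return compressed / original


if __name__ == "__main__":
    # Toy, highly repetitive reads; real metagenomic data is far noisier.
    reads = ["ACGT" * 25] * 50 + ["TTGACA" * 10] * 30
    blobs = compress_parallel(reads, chunk_size=16, workers=2)
    print(f"ratio: {compression_ratio(reads, blobs):.3f}")
```

Because each chunk is compressed independently, the tasks need no shared state and could in principle be spread across several modest machines rather than one large one. The trade-off is that redundancy between chunks goes unexploited.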


Venugopal Varadachari Veeravalli
The team initially tested their system on synthetic data in order to control the samples under ideal circumstances, but they’re now running it on large data sets involving real samples from the human gut and a coal mine.


“Synthetic data is smaller and cleaner than real data,” Ligo said. “In real data, very small changes or amounts of noise can lead to significant results that you need to care a lot about and the boundaries between the details you’re looking at become a lot more fine.”


While this research hasn’t been funded yet, the team already has promising results, such as compressing files to as little as 1/20th of their original size on synthetic data.


“We are taking our time with our algorithms because we want to make them work really well on big data sets,” Milenkovic said. “It may take a lot of time to create a method so you can do the assembly on less powerful computers. We’re trying to make it work on computers that ordinary labs could afford.”


The group looks forward to expanding the project by bringing together researchers with different areas of expertise, such as Illinois Bioengineering Professor Jian Ma, Animal Sciences Professor Bryan White, ECE Professor Wen-mei Hwu and ECE graduate student Xiao-Long Wu.


Milenkovic, Veeravalli and their team presented a paper at the Information Theory Workshop in September and plan to present another at GlobalSIP in December. They’re also preparing two biological journal papers aimed at the bioinformatics community, in the hope of getting more people interested in this developing research area.


“The nice thing about this project is that it fits well into the theme of The Grainger Foundation $100 million pledge and fits well with big data research,” Milenkovic said. “On top of that, it’s also biology-related, but involves engineering through compression algorithms and information theory. This research combines all of those areas in a really neat way.”


Milenkovic and Veeravalli are also faculty members at the Coordinated Science Laboratory.