10/30/2014 Ashish Valentine, ECE ILLINOIS
A team led by ECE Professor Yoram Bresler has won an almost $950,000 grant from the National Science Foundation to develop more efficient algorithms and computational methods for analyzing Big Data and extracting useful information. Bresler is working with ECE Assistant Professor Yihong Wu, Mathematics Professor Marius Junge, and Statistics Visiting Assistant Professor Kiryung Lee.
Bresler’s team focuses mainly on analyzing data objects called tensors. Tensors are multi-dimensional arrays of data organized by certain categories, or dimensions. Though they can seem abstract, tensors can be observed anywhere we go. For instance, walk out into a park, look at the surface of a pond, and visualize gridlines crisscrossing the pond's surface into the ordered squares of an x-y axis. Each point on the pond can be represented in x-y coordinates.
Create a mathematical function that plots the temperature at any position on the pond, and you'll get a list of coordinates with temperature values, sorted into an array of data with a category for temperature and one for position. Another name for this kind of array is a tensor, and a tensor with two categories, or dimensions, is called an order-2 tensor. But ponds aren't just flat; they're deeper in some areas than others. Create a function that gives the temperature throughout the pond as a function of both position and depth, and you'll get a data structure holding all three dimensions: an order-3 tensor. Add a category for temperature changing over time, and the structure becomes an order-4 tensor.
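The pond example can be sketched directly as a NumPy array, where each axis of the array is one of the categories described above. The specific positions, depths, and temperature values here are illustrative, not from the research:

```python
import numpy as np

# Order-2 tensor: temperature at each (x, y) grid point on the pond's surface.
surface_temps = np.array([
    [14.2, 14.5, 14.1],
    [14.8, 15.0, 14.6],
])                               # shape (2, 3): 2 x-positions, 3 y-positions
print(surface_temps.ndim)        # 2 -> an order-2 tensor

# Order-3 tensor: add a depth axis (here, a second layer 1.5 degrees cooler).
depth_temps = np.stack([surface_temps, surface_temps - 1.5])
print(depth_temps.ndim)          # 3 -> an order-3 tensor

# Order-4 tensor: add a time axis (a second snapshot, slightly warmer).
temps_over_time = np.stack([depth_temps, depth_temps + 0.2])
print(temps_over_time.ndim)      # 4 -> an order-4 tensor
print(temps_over_time.shape)     # (2, 2, 2, 3): time, depth, x, y
```

Each added category simply becomes another axis, which is why the number of entries grows multiplicatively as dimensions are added.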
Tensors don’t sound immediately intimidating when we’re talking about a pond, but replace the pond with the Atlantic Ocean and you'll get an idea of the difficulty of wrangling tensors in the field of Big Data. In one of the Bresler team’s planned trials, the team will collaborate with researchers at the Beckman Institute studying the brain using various combinations of neuro-imaging modalities, such as Functional Magnetic Resonance Imaging, or fMRI, under various conditions of rest, stimulus, and training.
The brain is immensely complicated, containing more neurons than there are stars in our galaxy, and data must be kept on each area of the brain’s activity over thousands of frames of neural imaging footage. The resulting amount of tensor data is enormous, and researchers who want to make educated guesses about how thinking processes work would need to sift through unbelievable amounts of seemingly random information before drawing any conclusions at all.
Bresler’s team aims to solve that by developing efficient algorithms and computational tools that dive through uncharted depths of complex data, returning with the information that researchers need. The algorithms accomplish this by systematically going through the data and finding regular patterns or structures, which enable a concise representation of the data, such as similar elements that can be combined.
For example, during Beckman researchers’ brain scans, while different neurons are firing throughout the brain, most of the brain stays silent at any given time. An efficient algorithm knows how to disregard all of this silent data and focus only on the parts that are firing. The brain doesn’t change much from one frame to another, so the algorithm can combine similar data. As a result, it will have less information to analyze.
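Both ideas above can be illustrated with a toy sketch, using hypothetical activity values rather than real fMRI data: store only the nonzero (firing) entries of a frame, and represent the next frame as a difference from the previous one, since little changes between frames:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "brain activity" frame: most of its 10,000 entries are
# zero (silent regions), with 50 randomly chosen active entries.
frame1 = np.zeros(10_000)
active = rng.choice(10_000, size=50, replace=False)
frame1[active] = rng.random(50)

# Sparse representation: keep only the indices and values of active entries.
idx = np.flatnonzero(frame1)
vals = frame1[idx]
print(len(idx))                  # 50 stored values instead of 10,000

# The next frame barely changes, so store only the difference from frame1.
frame2 = frame1.copy()
frame2[active[:5]] += 0.01       # a handful of regions shift slightly
delta = frame2 - frame1
print(np.count_nonzero(delta))   # 5 changed entries instead of 10,000
```

This is only a cartoon of the principle; the team's actual algorithms find such structure systematically and come with mathematical guarantees, which is precisely the hard part described below.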
The term Big Data is a buzzword ubiquitous in technical parlance these days, but the combination of the exponential increase in the amount and complexity of data on the one hand, and the limitations of current tools and theory on the other, makes the field a hotbed of new research and interest. This is one of the reasons, Bresler said, that the NSF established the Big Data Research Initiative. The grant went to Bresler’s team because, according to him, the reviewers found their proposal compelling and their research impactful.
Work on the precursors to the project hasn’t always been smooth sailing. Lee and Wu noted that it has often consisted of weeks, if not months, of headache and frustration before a single eureka moment made it all worth the effort. Lee and Wu had developed a working algorithm to comb through certain tensor data, but the hardest part was coming up with a way to prove mathematically that their algorithm was as efficient as it could possibly be. They do this because, while algorithms already exist to solve some of the problems they’re tackling, those algorithms aren’t always efficient enough to be used on huge data sets or by teams without sufficiently powerful computers. By coming up with more efficient algorithms, they could help teams with less funding reach better-informed conclusions. As they learned through experience, coming up with an algorithm to solve a problem is only half the battle.
“Although our algorithm empirically worked very well, we would require more than that to really claim a success,” Wu said. “A gap existed between what seems to work and what you can point to mathematically and really claim this as optimal as is theoretically possible. We have been modifying procedures and analyses countless times trying to make our system provably perfect and, finally we have one you can prove that is optimal.”
Now that the team has won a substantial amount of grant money from the NSF, it plans to hire graduate research assistants to help with the research. These graduate students will benefit from the research in their PhD theses, and will also be needed to work through the proofs that these algorithms require and to tailor the algorithms to individual contexts.
For instance, the methods might work beautifully on fMRI data but, when applied to genetic data, a certain amount of tweaking might be needed to make the algorithm work on the new kind of information, and a student with a background in genetics and in big data could help with this while also gaining expertise through hands-on experimentation.
The team is thrilled to have received the grant, Junge said. Lee is a former PhD student jointly advised by Bresler and Junge, and it was Lee who brought Junge on board to construct the mathematical foundation that would determine whether certain algorithms were feasible to develop.
“It’s not every day that your former student seduces you into this kind of groundbreaking research,” Junge said. “Even though Kiryung was my student, I ended up learning much more from him than he could have possibly learned from me.”
The enthusiasm of these professors for this project comes through as much more than just professional interest. For Bresler, tensors aren’t just abstract constructions of the mind, but living, breathing objects to be found throughout the physical world.
“Look around the room,” he said, gesturing around his sunlit office. “3-D data is all around you. The position of arbitrary points in the air around you can be represented in x, y, and z-coordinates. If you took the air temperature at each point in the room, that would add a dimension to the tensor around us. Each arbitrary point in the air around us has a temperature, humidity, a ratio of gas elements composing it, and so we see that this data surrounds every aspect of our lives, even if we don’t normally think about it that way.”