“Big Data — that’s a hot topic,” said Qifeng Hu, a junior in electrical engineering, referring to a new course that he and other students are finishing this semester, Making Sense of Big Data (ECE 398BD). It’s the first-ever departmental course focused specifically on the topic, and a hot topic it is.
These days, nearly everything can be quantified, measured, subdivided, and analyzed. Our chromosomes are packed with billions of unique units of information. Our social media presence can be viewed as a collection of clicks and likes that, examined together, reveal something about our preferences and predilections. Global climate patterns can be simulated from millions of atmospheric measurements, aggregated from around the world and taken over the course of weeks or years. All of this means data, and vast amounts of it.
It also means that data can stem from diverse lines of inquiry, transcending traditional disciplinary divisions. Even within a single department like ECE ILLINOIS, the applications are myriad.
“We thought that we should use the strengths of the department to show that Big Data can come from many different physical concerns,” said Professor Pramod Viswanath, one of the two course coordinators and one of seven professors involved with the course. “It’s really a team effort. It was brainstormed over many colleagues. So that’s key.”
The course is project-oriented and focuses on four discrete applications of Big Data, starting with a section on audio and video analytics, taught by the second course coordinator, Associate Professor Minh N. Do, and Professor Pierre Moulin. One of the two projects in this section guides students through designing software that can identify snippets of songs, like the smartphone app Shazam. Start a student’s app and play a song for a few seconds: the algorithms compare the sample against a library of at least 100 songs, and the song’s identity is revealed.
“The fact that we actually got to build Shazam, it was just a lot of fun,” said Esteban Saavedra, a junior computer engineering major.
“And they can do it in three weeks,” Viswanath said. “That’s the amazing thing.”
A section on genomic data was taught by Professor Olgica Milenkovic and Professor Wen-mei Hwu. For the project, students had a set of about 600 base pairs (the As, Gs, Cs, and Ts that compose the classic DNA double helix), which they learned to sequence. Sequencing is the basis for genetic diagnostics and biological systematics, and to perform these tests rapidly in real-world scenarios, computer systems must be able to handle billions of these data points.
“When you’re dealing with a huge amount of data, you really cannot just process every single line of the information,” Hu said. “You need another method to debug, to detect the errors. ... It’s a pretty new experience.”
Associate Professor Jonathan Makela and Professor Farzad Kamalabadi taught another section, where the students applied Big Data techniques to wave propagation and earthquakes. GPS receiver data from smartphones was used to determine the epicenter of the devastating Tohoku earthquake, which caused the tsunami and nuclear meltdown in Japan in 2011. The GPS data also contained information about simultaneous changes in the upper atmosphere, allowing the students to draw correlations between the earthquake and the atmospheric response.
The last section of the course, taught by Viswanath, focuses on the spread of rumors and news on social media networks. The students design algorithms that can detect the source of the rumor and, using probabilistic models, analyze the time it takes for the rumor to completely spread. Other applications of these models include the localization of infectious diseases.
“It’s a great class to get hands-on experience on actual techniques,” Saavedra said.
“[And] it’s kind of nice to explore the different topics,” said electrical-engineering junior Alyssa Romeo. “Like bioinformatics: I would have never known anything about DNA sequencing without this class. ... It makes me think I want to go into something along this path.”
The class exposes students to a complete cycle of Big Data processing, beginning with data collection. Collection reveals fundamental limitations of the data, namely the biases and noise levels it introduces, and once the data is represented, analyzed, and visualized, the students can develop plans for collecting it again. Once students are comfortable with this cycle, they can transfer the same methods between Big Data applications.
“There is a sandbox for them to come in and play and learn and spend a lot of time working with it,” Do said. “You have to get your feet wet, hands dirty, but after that, you become extremely valuable for later research and industrial work. ... I was very energized when, after two weeks of teaching the class, someone told me that they ... got a job interview, and the potential employer was asking about Big Data.”
Across the engineering campus, there has been significant interest in Big Data, with the $100-million Grainger Engineering Breakthroughs Initiative bringing in a dozen new faculty members with Big Data expertise. This course is therefore a bellwether of more opportunities for students to gain hands-on experience working with Big Data.
“To me that is precisely why they come to Illinois,” Do said of students. “They have a chance to work on cutting-edge research right here. That is a unique experience.”
“If you think about it, with all of the online course development, lab courses take on even more prominence,” Viswanath added. “That experience of a lab and working on a whole project, that will not get outsourced and cannot be made online.”
For the students, even this first class has been inspiring and eye-opening. “Honestly, I’ve gotten so much out of the class that I would recommend it to anyone,” Saavedra said.
“Yeah, I already did,” Hu said.
“One of my favorites,” Romeo agreed.