C3SR collaboration leverages data to integrate machine learning into our daily lives
Deb Aronson, ECE ILLINOIS
5/25/2018 1:11:53 PM
Ever since IBM’s Watson beat Jeopardy! champions Ken Jennings and Brad Rutter in 2011, the stage has been set for a renaissance of artificial intelligence.
“With Jeopardy we showed that a computing system that learns from unstructured natural languages could beat the best humans,” says Dr. Jinjun Xiong, IBM’s director of C3SR.
AI is when computers can make predictions and inferences based on data (natural languages, images and the like) that they have “learned.” This is known as machine learning. In these scenarios, computers will be able not simply to complete tasks but also to predict outcomes and advise humans, by analyzing vast data sets and recognizing patterns within them.
At its simplest, it’s when Amazon tells you you’d like a certain book based on your past book-buying history. But at its most complex, cognitive computing (AI) has the potential to help doctors treat patients, help banks detect fraud and manage risk, and help teachers and students learn more effectively and feel more engaged.
Successful AI requires enormous data sets. Thanks to both exponentially growing data sets (largely from social media and crowdsourcing) and increasing computing power, the time is ripe for AI.
C3SR is an opportunity for Illinois to play to its strengths. Illinois has a long history of building big systems, from ILLIAC to Blue Waters, as well as of developing compiler technologies and systems. IBM is counting on Illinois’s expertise, not only in software development but also in hardware and the ability to integrate the two.
“IBM is always looking for opportunities to advance the state-of-the-art computing through open collaboration,” Xiong says, in explaining the choice of Illinois. “They have expertise across the board and still have a focus on system integration. … You have to have both those capabilities and also the willingness to knock down barriers.”
The “motivating cognitive application,” as PhD student Carl William Pearson puts it, is called the Creative Experiential Learning Advisor, or CELA. The challenge is to harness all the data on the internet to help teachers and students with science-related topics.
For example, imagine a teacher has a hard time finding just the right project for their classroom. If a project like CELA could create a database of STEM education curriculum goals as well as all science experiments, organized by grade or ability, and map those curriculum goals to existing projects, teachers and students alike would benefit. That program also could take into account the students’ backgrounds and specific needs. The system also could make use of videos, which would help students stay engaged.
Illinois PhD students Hongyu Gong and Tarek J Sakakini, among others, have already made great inroads. Their work has involved defining scientific concepts, gathering a corpus of scientific experiments and then mapping between the concepts and the projects. Gong, together with her Illinois advisor, Suma Bhat, designed an algorithm specific to the nature of text in scientific concepts and scientific projects. The newly designed algorithm outperforms existing algorithms at this mapping.
“Hearing the stories of the IBM researchers was inspirational,” Sakakini says. There also were ample opportunities to socialize outside of work and have informal conversations.
“It really helped me see how research impacts industry and the market,” Sakakini says. “We got exposure and also feedback,” he adds.
Indeed, PhD student Cheng Li, after successfully completing a project on matrix factorization on GPUs while at IBM, was encouraged by both Xiong and Hwu to find a new project. Together with fellow PhD student Abdul Majed Dakkak, Li developed ML ModelScope, an open-source distributed platform to help developers in “model experimentation, deployment and evaluation across hardware infrastructures.” The project has been so successful that it has become the flagship project in C3SR. Li and Dakkak have demonstrated it at IBM booths at major conferences, including the Annual Conference on Neural Information Processing Systems (NIPS), the Consumer Electronics Show (CES), and the Association for the Advancement of Artificial Intelligence conference (AAAI). This is now Li’s major research project, and she continues to work closely with Xiong and others at IBM.
“There was an expert in every field, an entire building of experts … and we were treated as colleagues. Everyone was very supportive and we had lots of freedom,” says Pearson. “I feel like what I’m doing is more real.”
Because the C3SR project is so broad in scope, these students have the opportunity to learn from people working in many different domains. Their input, in turn, was greatly appreciated by IBM researchers. Pearson says that C3SR has also had a big impact on campus because the graduate students have become more engaged with the undergraduates in the course of the C3SR projects.
“We are all better for it,” says Pearson of the interactions between IBM and ECE.
Enabling traffic cameras to catch the smallest details
Traffic cameras are among the largest generators of vast data sets. The goal was for the computer to be able to distinguish among pedestrians, stationary objects and various types of vehicles, from small cars to trucks, and to discern the time of day and the color of the traffic light. This is a very difficult task because there are several hundred thousand images, and each team had to come up with a system to process them and identify all activities and objects.
The first part of the challenge was to annotate almost 1.5 million objects from more than 80 hours of video. The second phase involved building and activating a model to effectively and efficiently track objects from the videos.
These kinds of challenges, including the DARPA self-driving car challenge, provide an efficient way to move a real-world problem forward in a way that grant-making organizations do not, says Hwu.