Speeding up 3-D video for computers

2/18/2014 Jonathan Damery, ECE ILLINOIS

ECE graduate student Jungwook Choi and Computer Science Department Head Rob Rutenbar have demonstrated one of the fastest implementations of 3-D computer vision.

Most mammals have binocular vision. It helps squirrels, in the trees on campus, determine the distance between branches and make the leap. It helps basketball players toss buzzer-beaters from half court.

For computers, the same stereoscopic vision—taken with two or more offset cameras—can provide equally valuable 3-D information. Even static images can help with object recognition or, in the case of Google’s aerial maps, with creating 3-D cityscapes, complete with topography and correctly proportioned and shaded trees.

Now, the rate at which computers can extract that 3-D information is speeding up. ECE graduate student Jungwook Choi and Computer Science Department Head Rob A. Rutenbar, also an Abel Bliss Professor, have demonstrated one of the fastest video-rate implementations of this 3-D computer vision.

Jungwook Choi presenting at MEMOCODE last fall. Photo courtesy of MEMOCODE.
Last fall, their design earned them top honors for the best accuracy-adjusted performance at the MEMOCODE design competition, held in Portland by IEEE and the Association for Computing Machinery (ACM). 

With video-rate stereo matching, computers could recognize gestures more readily, and the technology could play an important role in the move toward driverless vehicles. Already automakers like Mercedes-Benz and Volvo have added pedestrian detection to some models, where stereo images, coupled with radar, are used to warn the driver of nearing pedestrians and—if necessary—apply the brakes.

“In such a case, speed of stereo matching is critical,” Choi said. “The faster stereo matching is done, the more chance the car can avoid the collision.”

In general though, Choi indicated that video-rate stereo matching, while highly important, is just one piece of a larger puzzle. The whole picture—the focus of his overall research—is developing customizable hardware that allows computers to interpret observations more quickly.

To do this, Choi and Rutenbar utilized a type of algorithm known as belief propagation, which, in the case of stereo matching, establishes probable guesses about the spatial depth of pixels in an image. Belief propagation is also widely used in artificial intelligence. Speech recognition, for example, often uses some form of belief propagation when choosing between homophones, interpreting accents, and so forth.
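
To make the idea of "probable guesses about depth" concrete, here is a minimal sketch of belief propagation on a single image row. It is not the team's design and it is not TRW-S: it is plain min-sum message passing along one scanline, and the function name, the absolute-difference matching cost, and the `smooth` penalty are all assumptions chosen for illustration. Each pixel collects messages from its left and right neighbors and then picks the disparity (horizontal shift between the two camera views) with the lowest combined cost.

```python
import numpy as np

def scanline_disparity(left_row, right_row, max_disp, smooth=1.0):
    """Illustrative min-sum belief propagation along one image row."""
    left_row = np.asarray(left_row, dtype=float)
    right_row = np.asarray(right_row, dtype=float)
    n = left_row.size
    labels = np.arange(max_disp + 1)  # candidate disparities 0..max_disp

    # Data cost: mismatch between left pixel x and right pixel x - d.
    # Positions where x - d falls outside the image keep a large cost.
    cost = np.full((n, labels.size), 1e3)
    for d in labels:
        cost[d:, d] = np.abs(left_row[d:] - right_row[:n - d])

    # Smoothness cost: neighboring pixels prefer similar disparities.
    pair = smooth * np.abs(labels[:, None] - labels[None, :])

    # fwd[i] is the message arriving at pixel i from its left neighbor,
    # bwd[i] is the message arriving from its right neighbor.
    fwd = np.zeros_like(cost)
    bwd = np.zeros_like(cost)
    for i in range(1, n):
        fwd[i] = np.min((cost[i - 1] + fwd[i - 1])[:, None] + pair, axis=0)
    for i in range(n - 2, -1, -1):
        bwd[i] = np.min((cost[i + 1] + bwd[i + 1])[:, None] + pair, axis=0)

    # Belief: local evidence plus both incoming messages; lowest cost wins.
    belief = cost + fwd + bwd
    return belief.argmin(axis=1)
```

On a full image the pixels form a grid rather than a chain, so messages have to be passed repeatedly in several directions, which is where most of the computation and memory traffic comes from.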

“Belief propagation methods have been researched intensively [over the past decade] and achieved huge success in practice,” Choi said. “But still, there has been a missing step between algorithmic solutions and their realization in the real world applications…mainly due to slow speed.”

Often there’s a trade-off between speed and accuracy, but Choi and Rutenbar were able to achieve both. They employed a belief propagation algorithm known as sequential tree-reweighted inference (TRW-S), which, reportedly, had never been demonstrated at video rates. These algorithms traditionally begin in one section of an image and, as the name implies, move sequentially, pixel by pixel, through the rest. It’s an inherently slow but reliable process. 

To achieve video rates, the team turned to customizable hardware.

“Jungwook devised some very clever architectural tricks to expose lots of useful parallelism,” Rutenbar said.

Rob A. Rutenbar. Photo by L. Brian Stauffer.
“We can be doing lots of work on different parts of the image concurrently.”

Their experimental results achieve a rate of 12 frames per second, which is significantly faster than other recently demonstrated belief-propagation approaches.

The team used a Convey HC-1 computer system, which includes customizable integrated circuits known as field-programmable gate arrays. “Stereo matching requires a huge amount of computation memory bandwidth,” Choi explained. “That’s why people have tried to implement stereo matching algorithms on multi-cores…or graphic processors for real time execution, but they are fundamentally restricted in the way of allocating computing power and memory bandwidth.”

Instead, using the field-programmable gate arrays, Choi could fully optimize the system.

Part of the team’s success, therefore, depended on this interdisciplinary approach: the algorithms and machine learning expertise came from the realm of computer science, while the hardware customization stemmed from electrical engineering.

Now, as these algorithms and hardware implementations continue to improve, there’s no doubt that consumers will begin to enjoy the benefits. Already, their system could be translated into real-world applications such as pedestrian detection in cars.

“One could take the hardware designs into a more custom silicon form, reduce cost and power, and make something with real practical relevance, in a pretty straightforward way,” Rutenbar said. The only question now is just how fast that will happen. 


This story was published February 18, 2014.