Published in 2008, "Program Optimization Space Pruning for a Multithreaded GPU" proposed a simple methodology for significantly diminishing the workload involved in the optimization process.
Originally published in the CGO proceedings in April 2008, the paper was co-authored by several students, now alumni, including Shane Ryoo (BSEE '00, MSEE '04, PhD '08), Christopher Rodrigues (MS '08, PhD '14), Samuel S. Stone (MS '07), Sara Baghsorkhi (PhD '11), Sain-Zee Ueng (MSEE '04), and John A. Stratton (BSCompE '06, MS '09, PhD '13). The researchers revealed the complexity involved in optimizing applications for highly parallel systems and introduced a relatively simple methodology for reducing the workload involved in the optimization process by as much as 98 percent. Their work targeted one such highly parallel system, the GeForce 8800 GTX programmed with CUDA. To attack the complexity of optimizing code, they developed metrics for judging the performance of an optimization configuration.
With the rise of inexpensive, single-chip, massively parallel platforms, more developers will be creating highly parallel applications that need to be optimized for those platforms, which is why Hwu's research is so valuable. Hwu is also affiliated with the Coordinated Science Lab at Illinois.