ECE Ph.D. student named Machine Learning Commons Rising Star for breakthrough GPU research

10/24/2025 Cassandra Smith

University of Illinois Urbana-Champaign Ph.D. student Archit Patke has been named a 2025 Machine Learning Commons Rising Star for his groundbreaking work improving the performance and reliability of GPUs that power today's AI systems. 


Illinois Ph.D. student Archit Patke, a member of Prof. Ravishankar K. Iyer’s research group, has been named a 2025 Machine Learning Commons Rising Star for his groundbreaking work improving the performance and reliability of the GPUs that power today’s AI systems. His research boosts efficiency by teaching machines to work smarter, not harder. It also uncovers the causes of GPU failures in advanced hardware such as NVIDIA’s A100 and H100. Patke’s work has already been recognized by major tech companies and adopted in open-source platforms, and it highlights Illinois’ leadership in next-generation AI innovation.

Archit Patke

Machine Learning Commons (MLCommons) is a consortium of universities and companies, including Google, Meta and NVIDIA. For its Rising Stars program, members “come together to identify a bunch of top Ph.D. students in machine learning systems.” MLCommons’ website says, “Through our collective engineering efforts with industry and academia, we continually measure and improve the accuracy, safety, speed and efficiency of AI technologies—helping companies and universities around the world build better AI systems that will benefit society.”

The students selected for this honor are truly extraordinary, and Patke earned his spot on the list through exceptional work such as his GPU findings. According to Iyer, a researcher in the Coordinated Science Laboratory, Patke’s work would reduce the number of GPUs a system needs. “The improved multi-model performance would allow for 40% more work to get done on 100 GPUs than with current methods,” Iyer said.

Iyer also said that NCSA made significant contributions to the project. Greg Bauer and Brett Bode provided comprehensive access to NCSA’s advanced GPU nodes, along with valuable data that enabled the project to get off the ground. Phuong Cao, a specialist on the Integrated Cyberinfrastructure (ICI) team, provided critical domain knowledge. With initial support from Bill Kramer, Cao also curated operational data from the Delta and DeltaAI systems.

Patke’s work focuses on strategically placing workloads to fill gaps and avoid idle time, which improves efficiency. “Our idea was that not all requests have the same kind of latency requirements,” he said. “So, there’s an opportunity to merge multiple models and multiple request types so we can improve the overall system efficiency.” His work is jointly supported by the National Science Foundation, IBM, Meta and Google, and it addresses trustworthiness as well as scheduling. On that front, Patke collaborates with three groups. The first is Phuong Cao, a cybersecurity researcher at NCSA. The second is leading IBM researchers at the IBM-Illinois Discovery Accelerator Institute, particularly Daby Sow, Chandra Narayanaswami and Saurabh Jha. The third is fellow SCDS/ECE students Shengkun Cui, Hung Nguyen, Aditya Ranjan and Ziheng Chen.

The team also collaborated on a related GPU resiliency study of “why GPUs fail and what we can do about it.” Using data from NCSA’s DeltaAI system, the team examined the internals of the GPUs and built intelligent, data-driven failure models. As Patke puts it, that work “went viral.” The team built AI tools to jointly analyze performance and reliability data from the DeltaAI advanced computing system, including GPU utilization, temperatures and error rates. The highlight of this effort was a new generation of foundation models that can predict the onset of errors capable of significantly disrupting the training of large-scale AI applications. DeltaAI is one of the National Science Foundation (NSF)’s most requested GPU clusters through the ACCESS program and the NAIRR Pilot. Many companies contacted the team about the paper, which will be presented at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC25).

“We were one of the first papers to characterize how GPUs fail in AI systems, especially some of the modern GPUs like A100s and H100s,” Patke said. These innovations led to Patke’s selection for the Rising Star award.

The project’s success led to more than Patke’s award. It also led to two NSF awards, one in cybersecurity innovation for Phuong Cao and one in research security for Iyer. It opened job opportunities for other students as well. For example, Hung Nguyen landed an internship at Argonne National Laboratory analyzing massive longitudinal log datasets from more than 500 nodes of Polaris and 10,000 nodes of the Aurora exascale supercomputer.

According to Iyer, “from the beginning, Archit’s innate sense of curiosity and ability to think beyond what seems possible today has been a driving force in his success.” Major cloud and AI vendors are currently recruiting Patke, not only for his highly influential body of work but also for his potential for future breakthroughs.

Patke’s accomplishments speak to his curiosity, persistence, and drive to solve complex problems. The University of Illinois provided the community and tools to help him transform those qualities into groundbreaking results. His recognition as an ML Commons Rising Star shows what happens when individual talent meets the opportunities Illinois makes possible.

[1] MLCommons 2025 Rising Stars announcement: https://mlcommons.org/2025/06/2025-mlc-rising-stars/

[2] SC25 paper abstract and schedule: https://sc25.conference-program.com/presentation/?id=pap425&sess=sess178

[3] SC25 paper (PDF): https://pmcao.github.io/files/sc25-gpu.pdf

[4] NSF CICI award: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2530738&HistoricalAwards=false

[5] Hung Nguyen, Argonne internship:


Ravishankar K. Iyer is the George and Ann Fisher Distinguished Professor of Engineering at the University of Illinois at Urbana-Champaign. He holds joint appointments in the Departments of Electrical and Computer Engineering and Computer Science, in the Coordinated Science Laboratory (CSL), the National Center for Supercomputing Applications, the Carle Illinois College of Medicine, and the Carl R. Woese Institute for Genomic Biology. He is also a faculty research affiliate at the Mayo Clinic and the Yeoh Ghin Seng Distinguished Visiting Professor of the National University Health System, Singapore.



This story was published October 24, 2025.