Final Exam - Data-efficient and Fault-tolerant Exascale Computing

PhD Candidate: Yafan Huang

Abstract: Modern high-performance computing (HPC) systems operate at massive scales, comprising thousands of nodes equipped with high-end CPUs and GPUs to support complex workloads such as large language model training, quantum simulation, and high-resolution scientific simulations. As these systems continue to scale, two major challenges identified by the U.S. Department of Energy (DOE) become increasingly critical: managing the growing volume of data and ensuring robust error resilience.

My PhD research addresses both challenges by developing flexible, efficient, and broadly applicable software solutions. On the data-efficiency side, I design ultra-fast GPU-based compression frameworks, such as cuSZp, that achieve high compression ratios while preserving data fidelity for diverse applications. On the reliability side, I develop low-overhead fault-tolerance techniques that enable effective detection of complex faults with minimal performance impact. Together, these contributions provide scalable software solutions that improve data efficiency and reliability in next-generation HPC and AI systems.

Advisor: Guanpeng Li

Location: Zoom, https://uiowa.zoom.us/j/5812155189. Please contact Yafan at yafan-huang@uiowa.edu if you plan to attend.

Thursday, April 16, 2026 3:00pm to 4:00pm
Individuals with disabilities are encouraged to attend all University of Iowa–sponsored events. If you are a person with a disability who requires a reasonable accommodation in order to participate in this program, please contact Tina Kimbrell in advance at 319-335-1793 or tina-kimbrell@uiowa.edu.