[Colloquium] Fault-Tolerance for Extreme Scale Systems: A Systems Level Perspective
May 2, 2013
Watch Colloquium:
AVI file (910 MB)
- Date: Thursday, May 2, 2013
- Time: 11:00 am — 12:30 pm
- Place: Mechanical Engineering 218
Kurt Ferreira
Sandia National Laboratories
Achieving the next three orders of magnitude in performance, the move from petascale to exascale computing, will require significant advancements in several fundamental areas. Recent reports from the U.S. Department of Energy identify resilience as one of these challenges. The resilience challenge is cross-cutting and will likely require advancements in multiple layers of the system software stack of these extreme-scale systems, from the OS to the application. In this talk, I will summarize current work at Sandia National Laboratories to address this important challenge. I will characterize the challenge in the context of extreme-scale capability computing, outline current approaches and their benefits, and point out unexplored areas where more work is needed.
Bio: A senior member of Sandia’s technical staff, Kurt Ferreira is an expert on system software and resilience/fault-tolerance methods for large-scale, massively parallel, distributed-memory scientific computing systems. Kurt has designed and developed many innovative, high-performance, and resilient implementations of low-level system software for a number of HPC platforms at Sandia National Laboratories. His research interests include the design and construction of operating systems for massively parallel processing machines and innovative application- and system-level fault-tolerance mechanisms for HPC.