ECE 542 - Design of Fault-Tolerant Digital Systems

Spring 2024

Title | Rubric | Section | CRN | Type | Hours | Times | Days | Location | Instructor
Fault-Tolerant Dig Syst Design | CS536 | C | 33992 | LCD | 4 | 1230 - 1350 | T R | 2013 Electrical & Computer Eng Bldg | Ravishankar K Iyer
Fault-Tolerant Dig Syst Design | ECE542 | C | 33991 | LCD | 4 | 1230 - 1350 | T R | 2013 Electrical & Computer Eng Bldg | Ravishankar K Iyer

Official Description

Advanced concepts in hardware and software fault tolerance: fault models, coding in computer systems, module- and system-level fault detection mechanisms, reconfiguration techniques in multiprocessor systems and VLSI processor arrays, and software fault tolerance techniques such as recovery blocks, N-version programming, checkpointing, and recovery; survey of practical fault-tolerant systems. Course Information: Same as CS 536. Prerequisite: ECE 411.

Subject Area

  • Reliable and Secure Systems

Course Director

Description

Advanced concepts in hardware and software fault tolerance. Topics addressed include fault models, coding in computer systems, module- and system-level fault detection mechanisms, reconfiguration techniques in multiprocessor systems and VLSI processor arrays, and software fault tolerance techniques such as recovery blocks, N-version programming, checkpointing, and recovery, followed by a survey of practical fault-tolerant systems.

Notes

Same as CS 536.

Topics

  • Introduction to fault-tolerant computing
  • Demonstration of error detection and recovery
  • Evaluation: hardware and software reliability models
  • Experimental evaluation: simulation-based, fault-injection, operational
  • Fault-tolerant techniques: coding, checkpointing, and recovery
  • Software fault tolerance
  • Case studies of reliable system design
  • Reliable networked systems
  • Security
  • Class Projects

Detailed Description and Outline

Topics:

  • Introduction to fault tolerance and its applications
  • Error detection in hardware and software
  • Coding and its applications
  • Support for error detection and recovery in the OS
  • Checkpointing and recovery
  • Experimental evaluation: simulation-based, fault-injection, operational
  • Fault-tolerance techniques: checkpointing and recovery
  • Software fault tolerance
  • Case studies of reliable system design
  • Reliable networked systems
  • Security
  • Class Projects

Texts

D.K. Pradhan, Fault-Tolerant Computer System Design, Prentice-Hall, 1996.

Collateral Reading:

D. Siewiorek and R. Swarz, Reliable Computer Systems: Design and Evaluation, 2nd ed., Digital Press - Butterworth, 1992.
B. W. Johnson, Design and Analysis of Fault-Tolerant Digital Systems, Addison-Wesley, 1989.

Last updated

1/19/2015 by Ravishankar K. Iyer