ECE 542 - Design of Fault-Tolerant Digital Systems
Spring 2016
Title | Rubric | Section | CRN | Type | Hours | Times | Days | Location | Instructor |
---|---|---|---|---|---|---|---|---|---|
Fault-Tolerant Dig Syst Design | CS536 | C | 33992 | LCD | 4 | 0900 - 1020 | M W | 2013 Electrical & Computer Eng Bldg | Zbigniew T Kalbarczyk |
Fault-Tolerant Dig Syst Design | ECE542 | C | 33991 | LCD | 4 | 0900 - 1020 | M W | 2013 Electrical & Computer Eng Bldg | Zbigniew T Kalbarczyk |
Official Description
Advanced concepts in hardware and software fault tolerance: fault models, coding in computer systems, module- and system-level fault detection mechanisms, reconfiguration techniques in multiprocessor systems and VLSI processor arrays, and software fault tolerance techniques such as recovery blocks, N-version programming, checkpointing, and recovery; survey of practical fault-tolerant systems. Course Information: Same as CS 536. Prerequisite: ECE 411.
Subject Area
- Reliable and Secure Systems
Course Director
Description
Advanced concepts in hardware and software fault tolerance. Topics addressed include fault models, coding in computer systems, module- and system-level fault detection mechanisms, reconfiguration techniques in multiprocessor systems and VLSI processor arrays, software fault tolerance techniques such as recovery blocks, N-version programming, checkpointing, and recovery, and a survey of practical fault-tolerant systems.
Notes
Same as CS 536.
Topics
- Introduction to fault-tolerant computing
- Demonstration of error detection and recovery
- Evaluation: hardware and software reliability models
- Experimental evaluation: simulation based, fault-injection, operational
- Fault-tolerant techniques: coding, checkpointing, and recovery
- Software fault tolerance
- Case studies of reliable system design
- Reliable networked systems
- Security
- Class Projects
Detailed Description and Outline
Topics:
- Introduction to fault tolerance and its applications
- Error detection in hardware and software
- Coding and its applications
- Support for error detection and recovery in the OS
- Checkpointing and recovery
- Experimental evaluation: simulation based, fault-injection, operational
- Fault-tolerant techniques: checkpointing and recovery
- Software fault tolerance
- Case studies of reliable system design
- Reliable networked systems
- Security
- Class Projects
Texts
D.K. Pradhan, Fault-Tolerant Computer System Design, Prentice-Hall, 1996.
Collateral Reading:
D. Siewiorek and R. Swarz, Reliable Computer Systems: Design and Evaluation, 2nd ed., Digital Press - Butterworth, 1992.
B. W. Johnson, Design and Analysis of Fault-Tolerant Digital Systems, Addison-Wesley, 1989.
Last updated
1/19/2015 by Ravishankar K. Iyer