ECE 408

ECE 408 - Applied Parallel Programming

Spring 2025

Title | Rubric | Section | CRN | Type | Hours | Times | Days | Location | Instructor
Applied Parallel Programming | CS 483 | AB | 56564 | LAB | 0 | | | |
Applied Parallel Programming | CS 483 | AL | 56562 | LEC | 4 | 0930 - 1050 | T R | 1002 Electrical & Computer Eng Bldg | Volodymyr Kindratenko
Applied Parallel Programming | CS 483 | OLB | 68235 | OLB | 0 | | | | Volodymyr Kindratenko
Applied Parallel Programming | CS 483 | OLC | 68236 | OLC | 4 | 0930 - 1050 | T R | | Volodymyr Kindratenko
Applied Parallel Programming | CSE 408 | AB | 67939 | LAB | 0 | | | |
Applied Parallel Programming | CSE 408 | AL | 67940 | LEC | 4 | 0930 - 1050 | T R | 1002 Electrical & Computer Eng Bldg | Volodymyr Kindratenko
Applied Parallel Programming | CSE 408 | OLB | 68237 | OLB | 0 | | | | Volodymyr Kindratenko
Applied Parallel Programming | CSE 408 | OLC | 68238 | OLC | 4 | 0930 - 1050 | T R | | Volodymyr Kindratenko
Applied Parallel Programming | ECE 408 | AB | 56563 | LAB | 0 | | | |
Applied Parallel Programming | ECE 408 | AL | 56561 | LEC | 4 | 0930 - 1050 | T R | 1002 Electrical & Computer Eng Bldg | Volodymyr Kindratenko
Applied Parallel Programming | ECE 408 | CSP | 77405 | PKG | 4 | | | | Volodymyr Kindratenko
Applied Parallel Programming | ECE 408 | CSP | 77405 | PKG | 4 | 0930 - 1050 | T R | | Volodymyr Kindratenko
Applied Parallel Programming | ECE 408 | OLB | 68233 | OLB | 0 | | | | Volodymyr Kindratenko
Applied Parallel Programming | ECE 408 | OLC | 68234 | OLC | 4 | 0930 - 1050 | T R | | Volodymyr Kindratenko

Official Description

Parallel programming with emphasis on developing applications for processors with many computation cores. Computational thinking, forms of parallelism, programming models, mapping computations to parallel hardware, efficient data structures, paradigms for efficient parallel algorithms, and application case studies. Course Information: Same as CS 483 and CSE 408. 4 undergraduate hours. 4 graduate hours. Prerequisite: ECE 220.

Subject Area

  • Computer Engineering

Course Director

Detailed Description and Outline

Parallel programming with emphasis on developing applications for processors with many computation cores. Computational thinking, forms of parallelism, programming model features, mapping computations to parallel hardware, efficient data structures, paradigms for efficient parallel algorithms, hardware features and limitations, and application case studies. Same as CS 483.

Computer Usage

Extensive use for all programming assignments and the final project

Reports

A final project report is required

Lab Projects

  • Lab 0 - Installation and test of the programming environment
  • Lab 1 - Parallel Vector Addition
  • Lab 2 - Parallel Matrix Multiplication
  • Lab 3 - Tiled Parallel Matrix Multiplication
  • Lab 4 - Parallel Reduction
  • Lab 5 - Parallel Scan
  • Lab 6 - Tiled Parallel Convolution
  • Lab 7 - Sparse Matrix-Vector Multiplication
  • Final Project, which involves a Project Proposal, Project Workshop, Project Presentation, and Project Report
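Parallel vector addition (Lab 1) is a common first example of the data-parallel thread-indexing pattern that the later labs build on. The sketch below is a plain C emulation for illustration only and is not lab code. The ThreadCoords struct and the vec_add_kernel and vec_add_launch names are hypothetical. The loops run the logical threads one after another, so the indexing can be checked on any CPU. In real CUDA, the kernel would be declared __global__ and would compute its index from the built-in blockIdx.x, blockDim.x and threadIdx.x.

```c
#include <assert.h>

/* Hypothetical stand-ins for CUDA's built-in blockIdx.x, blockDim.x and
   threadIdx.x (x dimension only); real CUDA code gets these from the runtime. */
typedef struct {
    int block_idx;   /* plays the role of blockIdx.x  */
    int block_dim;   /* plays the role of blockDim.x  */
    int thread_idx;  /* plays the role of threadIdx.x */
} ThreadCoords;

/* Body of a vector-addition "kernel": each logical thread computes one
   output element from its global index, guarded so it never writes past n. */
static void vec_add_kernel(ThreadCoords t, const float *a, const float *b,
                           float *c, int n)
{
    int i = t.block_idx * t.block_dim + t.thread_idx;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

/* Host-side launch emulation: the number of blocks is ceil(n / block_dim),
   as in a CUDA launch of the form kernel<<<grid, block>>>(...).
   Threads run sequentially here, so this illustrates the indexing scheme
   only, not actual parallel execution. */
static void vec_add_launch(const float *a, const float *b, float *c, int n,
                           int block_dim)
{
    assert(n >= 0 && block_dim > 0);
    int grid_dim = (n + block_dim - 1) / block_dim;
    for (int blk = 0; blk < grid_dim; ++blk) {
        for (int thr = 0; thr < block_dim; ++thr) {
            ThreadCoords t = { blk, block_dim, thr };
            vec_add_kernel(t, a, b, c, n);
        }
    }
}
```

Consider n = 5 with a block size of 2. The launch creates 3 blocks, or 6 logical threads. The `i < n` guard keeps the sixth thread from writing past the end of the arrays, which is why the grid size can safely be rounded up.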

Lab Equipment

Linux-based cluster system

Lab Software

C programming language and CUDA Software Development Kit; WebGPU for labs; RAI for the final project

Topical Prerequisites

C programming, Basic data structures, Introduction to computer organization

Texts

D. Kirk and W. Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 3rd Edition.

Required, Elective, or Selected Elective

Elective

Course Goals

The aim of this course is to provide students with knowledge and hands-on experience in developing application software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it can complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already offer such levels of concurrency. Effectively programming these processors requires in-depth knowledge of parallel programming principles, as well as of the parallelism models, communication models, hardware organizations, and resource limitations of these processors. The target audience of the course is students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future designs for these processors.

Instructional Objectives

A. After the seven machine problems (approximately 20 seventy-five-minute lectures), the student should be able to:

1. Analyze and implement common parallel algorithm patterns in a parallel programming model such as CUDA. (1, 2)

2. Design experiments to analyze the performance bottlenecks in their parallel code. (6)

3. Apply common parallel techniques to improve performance given hardware constraints. (1, 2, 6)

4. Learn about the features of a parallel debugger and use them to identify and repair code defects. (6, 7)

5. Learn about the features of a parallel profiler and use them to identify performance bottlenecks in their code. (6, 7)

B. By Examination 2 (after approximately 29 seventy-five-minute lectures), the student should be able to:

6. Understand and apply common parallel algorithm patterns. (1, 7)

7. Understand the major types of hardware limitations that limit parallel program performance. (1, 6, 7)

8. Understand and apply common parallel programming interface features. (1, 6, 7)

9. Review a parallel code segment and identify its behavior and potential problems. (b, e)

C. By the end of the final project (with proposal, workshop discussions, presentation, and report) the student should be able to:

10. Identify and solve a computational problem with parallel algorithm design and program. (1, 2, 6, 7)

11. Learn the domain knowledge necessary to solve the identified problem. (7)

12. Work with domain experts and teammates from different disciplines to maximize the effectiveness of solutions. (3, 5)

13. Properly divide the responsibilities among teammates and support each other toward success. (3, 4, 5)

14. Identify design space and explore optimization opportunities for the solutions. (1, 2, 6, 7)

15. Motivate the problem and approach in a presentation. (3)

16. Properly explain the solutions experimented with and justify the final decision and outcome. (1, 2, 3, 4, 6)

17. Identify limitations of the solutions and future directions. (1, 2, 4, 6, 7)

Last updated

5/26/2019 by Wen-mei Hwu