ECE 408 - Applied Parallel Programming
Detailed Description and Outline
Parallel programming with emphasis on developing applications for processors with many computation cores. Computational thinking, forms of parallelism, programming model features, mapping computations to parallel hardware, efficient data structures, paradigms for efficient parallel algorithms, hardware features and limitations, and application case studies. Same as CS 483.
Extensive usage for all programming assignments and final project
A final project report is required
Lab 0 - Installation and test of the programming environment
Lab 1 - Parallel Vector Addition
Lab 2 - Parallel Matrix Multiplication
Lab 3 - Tiled Parallel Matrix Multiplication
Lab 4 - Parallel Reduction
Lab 5 - Parallel Scan
Lab 6 - Tiled Parallel Convolution
Lab 7 - Sparse Matrix-Vector Multiplication
Final Project - Project Proposal, Project Workshop, Project Presentation, and Project Report
Linux based cluster system
C programming language and CUDA Software Development Kit; WebGPU for labs; RAI for the final project
C programming, Basic data structures, Introduction to computer organization
D. Kirk and W. Hwu, Programming Massively Parallel Processors, Morgan Kaufmann, 3rd Edition.
The aim of this course is to provide students with knowledge and hands-on experience in developing application software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it can complete more than 64 arithmetic operations per clock cycle. Many commercial products from NVIDIA, AMD, and Intel already provide this level of concurrency. Programming these processors effectively requires in-depth knowledge of parallel programming principles, as well as of the parallelism models, communication models, hardware organizations, and resource limitations of these processors. The target audiences of the course are students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future designs for these processors.
A. After the seven machine problems (after approximately 20 seventy-five-minute lectures), the student should be able to:
2. Design experiments to analyze the performance bottlenecks in their parallel code. (6)
B. By Examination 2 (after approximately 29 seventy-five-minute lectures), the student should be able to:
9. Review a parallel code segment and identify its behavior and potential problems. (b, e)
C. By the end of the final project (with proposal, workshop discussions, presentation, and report) the student should be able to:
11. Learn the domain knowledge necessary to solve the identified problem. (7)
15. Motivate the problem and approach in a presentation. (3)