Lecture 1 - Introduction to Parallel Programming
Lecture 2 - Parallel Architectures and Programming Models
Lecture 3 - Pipelining
Lecture 4 - Superpipelining and VLIW
Lecture 5 - Memory Latency
Lecture 6 - Cache and Temporal Locality
Lecture 7 - Cache, Memory bandwidth and Spatial Locality
Lecture 8 - Intuition for Shared and Distributed Memory architectures
Lecture 9 - Shared and Distributed Memory architectures
Lecture 10 - Interconnection networks in Distributed Memory architectures
Lecture 11 - OpenMP: A parallel Hello World Program
Lecture 12 - Program with Single thread
Lecture 13 - Program Memory with Multiple threads and Multi-tasking
Lecture 14 - Context Switching
Lecture 15 - OpenMP: Basic thread functions
Lecture 16 - OpenMP: About OpenMP
Lecture 17 - Shared Memory Consistency Models and the Sequential Consistency Model
Lecture 18 - Race Conditions
Lecture 19 - OpenMP: Scoping variables and some race conditions
Lecture 20 - OpenMP: thread-private variables and more constructs
Lecture 21 - Computing sum: first attempt at parallelization
Lecture 22 - Manual distribution of work and critical sections
Lecture 23 - Distributing for loops and reduction
Lecture 24 - Vector-Vector operations (Dot product)
Lecture 25 - Matrix-Vector operations (Matrix-Vector Multiply)
Lecture 26 - Matrix-Matrix operations (Matrix-Matrix Multiply)
Lecture 27 - Introduction to tasks
Lecture 28 - Task queues and task execution
Lecture 29 - Accessing variables in tasks
Lecture 30 - Completion of tasks and scoping variables in tasks
Lecture 31 - Recursive task spawning and pitfalls
Lecture 32 - Understanding LU Factorization
Lecture 33 - Parallel LU Factorization
Lecture 34 - Locks
Lecture 35 - Advanced Task handling
Lecture 36 - Matrix Multiplication using tasks
Lecture 37 - The OpenMP Shared Memory Consistency Model
Lecture 38 - Applications: finite element method
Lecture 39 - Applications: deep learning
Lecture 40 - Introduction to MPI and basic calls
Lecture 41 - MPI calls to send and receive data
Lecture 42 - MPI calls for broadcasting data
Lecture 43 - MPI non-blocking calls
Lecture 44 - Application: distributed histogram update
Lecture 45 - MPI collectives and MPI broadcast
Lecture 46 - MPI gathering and scattering collectives
Lecture 47 - MPI reduction and alltoall collectives
Lecture 48 - Discussion on MPI collectives design
Lecture 49 - Characterization of interconnects
Lecture 50 - Linear arrays, 2D mesh, and torus
Lecture 51 - d-dimensional torus
Lecture 52 - Hypercube
Lecture 53 - Trees and cliques
Lecture 54 - Hockney model
Lecture 55 - Broadcast and Reduce with recursive doubling
Lecture 56 - Scatter and Gather with recursive doubling
Lecture 57 - Reduce scatter and All gather with recursive doubling
Lecture 58 - Discussion of message sizes in analysis
Lecture 59 - Revisiting Reduce scatter on 2D mesh
Lecture 60 - Reduce scatter and Allreduce on the Hypercube
Lecture 61 - Alltoall on the Hypercube
Lecture 62 - Lower bounds
Lecture 63 - Pipeline based algorithm for Allreduce
Lecture 64 - An improved algorithm for Alltoall on the Hypercube using E-cube routing
Lecture 65 - Pipeline based algorithm for Broadcast
Lecture 66 - Introduction to parallel graph algorithms
Lecture 67 - Breadth-First Search (BFS) using matrix algebra
Lecture 68 - BFS: shared memory parallelization using OpenMP
Lecture 69 - Distributed memory settings and data distribution
Lecture 70 - Distributed BFS algorithm
Lecture 71 - Performance considerations
Lecture 72 - Prim's Algorithm
Lecture 73 - OpenMP based shared memory parallelization for MST
Lecture 74 - MPI based distributed memory parallelization for MST
Lecture 75 - Sequential Algorithm Adaptation from Prim's algorithm
Lecture 76 - Parallelization Strategy for Prim's algorithm
Lecture 77 - Dry run with the parallel strategy
Lecture 78 - Johnson's algorithm with 1D data distribution
Lecture 79 - Speedup analysis on a grid graph
Lecture 80 - Floyd's algorithm for all-pairs shortest paths
Lecture 81 - Floyd's algorithm with 2D data distribution
Lecture 82 - Adaptation to transitive closures
Lecture 83 - Parallelization strategy for connected components
Lecture 84 - Analysis for parallel connected components