Birla Institute of Technology & Science, Pilani

Multicore and GPGPU Programming

Instructors: Kunal Kishore Korgaonkar, Prof. Gargi Prabhu

12 modules

Gain insight into a topic and learn the fundamentals.

Intermediate level

Recommended experience

8 weeks to complete

at 10 hours a week

Flexible schedule

Learn at your own pace

What you'll learn

Understand the fundamentals of multi-threaded programming and its applications in multicore systems.
Develop shared memory programs in OpenMP and distributed programming using MPI.
Gain a foundational understanding of GPGPU architecture and the CUDA programming model.

Tools you'll learn

C (Programming Language)

Details to know

Shareable certificate

Add to your LinkedIn profile

Assessments

124 assignments

Taught in English

There are 12 modules in this course

The course "Multicore and GPGPU Programming" provides a foundational understanding of parallel programming, focusing on developing high-performance, multi-threaded applications in both CPU and GPU environments. Beginning with a review of multicore processor architectures, caching mechanisms, and Non-Uniform Memory Access (NUMA) systems, students will learn the essentials of shared memory programming, synchronisation techniques, and the use of locks to ensure data integrity across threads.

The course delves into designing shared memory data structures and introduces advanced synchronisation concepts, including lazy synchronisation, crucial for scalable and efficient concurrent applications. Additionally, students will explore the architecture and programming model of General-Purpose Graphics Processing Units (GPGPUs) and learn CUDA programming to leverage GPU parallelism for compute-intensive tasks. By the end of the course, students will be adept at optimising multi-threaded and many-core applications, balancing workload across CPUs and GPUs to achieve high throughput and efficient resource utilisation. This course is essential for those aiming to develop expertise in high-performance computing and parallel programming for modern multicore and GPU-based systems.

In this module, the learners will be introduced to the course and its syllabus, setting the foundation for their learning journey. The course's introductory video will provide them with insights into the valuable skills and knowledge they can expect to gain throughout the duration of this course. Additionally, the syllabus reading will comprehensively outline essential course components, including course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that will enhance the learner’s understanding of the course concepts. Moreover, this module offers the learners the opportunity to connect with fellow learners as they participate in a discussion prompt designed to facilitate introductions and exchanges within the course community.

What's included

4 videos • 1 reading • 1 discussion prompt

4 videos•Total 51 minutes

Course Introductory Video•2 minutes
Meet Your Instructor - Dr. Gargi Prabhu •1 minute
Meet Your Instructor - Dr. Kunal Korgaonkar•1 minute
Recording of Multicore and GPGPU Programming: Week 1 - Live Session on 25-05-23 18:32:50 [47:25]•47 minutes

1 reading•Total 10 minutes

Course Overview•10 minutes

1 discussion prompt•Total 10 minutes

Meet Your Peers•10 minutes

In this module, students will gain foundational knowledge of parallel and multi-threaded programming, exploring the core principles that underlie the efficient utilisation of modern multi-core and many-core processors. Beginning with an overview of parallel programming concepts, this module covers different types of parallelism, including data parallelism, task parallelism, and pipeline parallelism. Students will also examine critical performance metrics like speedup, efficiency, and scalability, which help in evaluating the benefits and trade-offs of parallel approaches.

What's included

12 videos • 2 readings • 12 assignments • 1 discussion prompt

12 videos•Total 73 minutes

Need for Ever-Increasing Performance•8 minutes
Parallel Systems and Parallel Programs•8 minutes
Concurrent, Parallel, Distributed Systems•5 minutes
Types of Parallelism: Data, Task and Pipeline Parallelism•8 minutes
Speedup and Efficiency•5 minutes
Amdahl’s Law •5 minutes
Gustafson’s Law •5 minutes
Scalability in Parallel Systems•5 minutes
Cost of Parallelisation•7 minutes
Sources of Overhead in Parallel Programs •5 minutes
Timing Parallel Programs: Methods and Best Practices•7 minutes
GPU Performance•5 minutes

2 readings•Total 120 minutes

Recommended Reading: Fundamentals of Parallel Computing•60 minutes
Recommended Reading: Introduction to Performance Metrics in Parallel Computing•60 minutes

12 assignments•Total 36 minutes

Need for Ever-Increasing Performance•3 minutes
Parallel Systems and Parallel Programs•3 minutes
Concurrent, Parallel, Distributed Systems•3 minutes
Types of Parallelism: Data, Task and Pipeline Parallelism•3 minutes
Speedup and Efficiency•3 minutes
Amdahl’s Law •3 minutes
Gustafson’s Law •3 minutes
Scalability in MIMD Systems•3 minutes
Cost of Parallelisation•3 minutes
Sources of Overhead in Parallel Programs•3 minutes
Taking Timings of Parallel Programs•3 minutes
GPU Performance•3 minutes

1 discussion prompt•Total 30 minutes

Why Parallelism? Revisiting the Roots of Multicore Programming•30 minutes

This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.

What's included

15 videos • 2 readings • 15 assignments • 1 discussion prompt

15 videos•Total 160 minutes

The Von Neumann Architecture•7 minutes
Processes, Multitasking, and Threads•5 minutes
The Basics of Caching•7 minutes
Virtual Memory•7 minutes
Instruction-Level Parallelism•9 minutes
Hardware Multithreading•6 minutes
Classifications of Parallel Computers•6 minutes
SIMD and MIMD Systems•7 minutes
Interconnection Networks: Shared Memory Systems•6 minutes
Interconnection Networks: Distributed Memory Systems•8 minutes
Cache Coherence•8 minutes
Shared-Memory vs. Distributed-Memory•4 minutes
Parallel Software: Coordinating Process and Threads•11 minutes
Distributed Memory Software•7 minutes
Recording of Multicore and GPGPU Programming: Week 2 - Live Session on 25-05-30 18:35:08 [02:05]•62 minutes

2 readings•Total 100 minutes

Recommended Reading: Architecture Background•40 minutes
Recommended Reading: Parallel Hardware and Software•60 minutes

15 assignments•Total 114 minutes

Graded Quiz - Modules 1 and 2 •60 minutes
The Von Neumann Architecture•3 minutes
Processes, Multitasking, and Threads•3 minutes
The Basics of Caching•3 minutes
Virtual Memory•3 minutes
Instruction-Level Parallelism•3 minutes
Hardware Multithreading•3 minutes
Classifications of Parallel Computers•3 minutes
SIMD and MIMD Systems•3 minutes
Interconnection Networks: Shared Memory Systems•3 minutes
Interconnection Networks: Distributed Memory Systems•6 minutes
Cache Coherence•3 minutes
Shared-Memory vs. Distributed-Memory•3 minutes
Parallel Software: Coordinating Process and Threads•12 minutes
Distributed Memory Software•3 minutes

1 discussion prompt•Total 30 minutes

From Von Neumann to Multicore: Evolving Architectures and Memory Realities•30 minutes

This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and memory hierarchy, which form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, grid, and block organisation. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.

What's included

15 videos • 2 readings • 14 assignments • 1 discussion prompt

15 videos•Total 127 minutes

GPUs and GPGPU•5 minutes
GPU Architecture•5 minutes
Heterogeneous Computing•4 minutes
Paradigm of Heterogeneous Computing•5 minutes
Introduction to CUDA•5 minutes
Structure of a CUDA Program•8 minutes
Threads, Blocks, and Grid•9 minutes
Managing Memory•7 minutes
Writing and Verifying Your Kernel•6 minutes
Compiling and Running CUDA Program•4 minutes
Nvidia Compute Capabilities and Device Architecture•6 minutes
Timing Your Kernel•7 minutes
Organising Parallel Threads•5 minutes
Managing Devices•4 minutes
Recording of Multicore and GPGPU Programming: Week 3 - Live Session on 25-06-06 18:31:21 [44:50]•45 minutes

2 readings•Total 75 minutes

Recommended Reading: GPGPU Architecture and CUDA•15 minutes
Recommended Reading: Programming Model Overview•60 minutes

14 assignments•Total 48 minutes

GPUs and GPGPU•6 minutes
GPU Architecture•3 minutes
Heterogeneous Computing•3 minutes
Paradigm of Heterogeneous Computing•3 minutes
Introduction to CUDA•3 minutes
Structure of a CUDA Program•3 minutes
Threads, Blocks, and Grid•6 minutes
Managing Memory•3 minutes
Writing and Verifying Your Kernel•3 minutes
Compiling and Running CUDA Program•3 minutes
Nvidia Compute Capabilities and Device Architecture•3 minutes
Timing Your Kernel•3 minutes
Organising Parallel Threads•3 minutes
Managing Devices•3 minutes

1 discussion prompt•Total 30 minutes

Harnessing GPU Power: Exploring CUDA and the Architecture of Parallelism•30 minutes

This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.

What's included

15 videos • 2 readings • 15 assignments • 1 discussion prompt

15 videos•Total 135 minutes

Introduction to CUDA Execution Model•7 minutes
Warps and Thread Blocks•4 minutes
Warp Divergence•9 minutes
Resource Partitioning•6 minutes
Latency Hiding•10 minutes
Occupancy•5 minutes
Synchronization•4 minutes
Scalability•5 minutes
Exposing Parallelism•10 minutes
Checking Active Warps with Nvprof•6 minutes
Checking Memory Operations with Nvprof•7 minutes
Avoiding Branch Divergence•3 minutes
The Parallel Reduction Problem and Thread Divergence•7 minutes
Improving Divergence in Parallel Reduction•6 minutes
Recording of Multicore and GPGPU Programming: Week 4 - Live Session on 25-06-13 18:32:39 [49:37]•45 minutes

2 readings•Total 120 minutes

Recommended Reading: Structure of a CUDA Program•60 minutes
Recommended Reading: Exposing Parallelism and Avoiding Branch Divergence•60 minutes

15 assignments•Total 105 minutes

Graded Quiz - Modules 3 and 4 •60 minutes
Introduction to CUDA Execution Model•3 minutes
Warps and Thread Blocks •3 minutes
Warp Divergence•3 minutes
Resource Partitioning•6 minutes
Latency Hiding•3 minutes
Occupancy•3 minutes
Synchronization•3 minutes
Scalability•3 minutes
Exposing Parallelism•3 minutes
Checking Active Warps with Nvprof•3 minutes
Checking Memory Operations with Nvprof•3 minutes
Avoiding Branch Divergence•3 minutes
The Parallel Reduction Problem and Thread Divergence•3 minutes
Improving Divergence in Parallel Reduction•3 minutes

1 discussion prompt•Total 30 minutes

Under the Hood: Warps, Divergence, and CUDA Execution Dynamics•30 minutes

The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of memory hierarchy in CUDA, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns to optimise performance on GPUs. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to enhance parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.

What's included

14 videos • 2 readings • 14 assignments • 1 discussion prompt • 1 ungraded lab

14 videos•Total 126 minutes

Introduction to CUDA Memory Model•8 minutes
Memory Allocation and Deallocation•6 minutes
Zero Copy Memory•4 minutes
Unified Virtual Addressing and Unified Memory •3 minutes
Aligned and Coalesced Access•6 minutes
CUDA Shared Memory•6 minutes
Shared Memory Banks and Access Mode •7 minutes
Configuring the Amount of Shared Memory•5 minutes
Synchronisation•9 minutes
CUDA Streams•7 minutes
Stream Scheduling and Priorities•6 minutes
CUDA Events•6 minutes
Concurrent Kernel Execution•6 minutes
Recording of Multicore and GPGPU Programming: Week 5 - Live Session on 25-06-20 18:31:59 [47:36]•48 minutes

2 readings•Total 120 minutes

Recommended Reading: CUDA Memory Model•60 minutes
Recommended Reading: Streams and Concurrency•60 minutes

14 assignments•Total 342 minutes

SGA-1: CUDA Programming and Performance Optimisation•300 minutes
Introduction to CUDA Memory Model•3 minutes
Memory Allocation and Deallocation•3 minutes
Zero Copy Memory•3 minutes
Unified Virtual Addressing and Unified Memory •3 minutes
Aligned and Coalesced Access•3 minutes
CUDA Shared Memory•6 minutes
Shared Memory Banks and Access Mode •3 minutes
Configuring the Amount of Shared Memory•3 minutes
Synchronisation•3 minutes
CUDA Streams•3 minutes
Stream Scheduling and Priorities•3 minutes
CUDA Events•3 minutes
Concurrent Kernel Execution•3 minutes

1 discussion prompt•Total 30 minutes

Smart Memory and Seamless Concurrency: CUDA Memory and Streams•30 minutes

1 ungraded lab•Total 60 minutes

Hands on lab: Parallel Matrix Addition Using CUDA•60 minutes

This module explains in depth the difference between processes and threads and introduces multithreaded programming with the Pthreads library. Students learn the main functions in the Pthreads library and apply them to solve real-world problems with a multithreaded approach. The module also discusses the precautions to take when designing an algorithm that uses multithreading.

What's included

10 videos • 11 readings • 10 assignments • 1 discussion prompt

10 videos•Total 116 minutes

Processes, Threads and Pthreads•4 minutes
Hello World!!•9 minutes
Matrix-Vector Multiplication•13 minutes
Critical Sections•5 minutes
Busy Waiting•6 minutes
Mutexes•5 minutes
Semaphores•7 minutes
Barriers and Condition Variables•13 minutes
Caches, Cache-Coherence and False Sharing•9 minutes
Recording of Multicore and GPGPU Programming: Week 6 - Live Session on 25-06-27 18:38:36 [43:53]•44 minutes

11 readings•Total 295 minutes

Recommended Reading: Processes, Threads and Pthreads•10 minutes
Recommended Reading: Hello World!!•60 minutes
Recommended Reading: Matrix-Vector Multiplication•15 minutes
Recommended Reading: Critical Sections•30 minutes
Recommended Reading: Busy Waiting•20 minutes
Recommended Reading: Mutexes•15 minutes
Recommended Reading: Semaphores•30 minutes
Recommended Reading: Barriers and Condition Variables•30 minutes
Recommended Reading: Read-Write Locks•60 minutes
Recommended Reading: Caches, Cache-Coherence and False Sharing•15 minutes
Lab Instruction Document•10 minutes

10 assignments•Total 135 minutes

Graded Quiz - Modules 5 and 6 •60 minutes
Processes, Threads and Pthreads•9 minutes
Hello World!!•9 minutes
Matrix-Vector Multiplication•9 minutes
Critical Sections•9 minutes
Busy Waiting•9 minutes
Mutexes•9 minutes
Semaphores•6 minutes
Barriers and Condition Variables•6 minutes
Caches, Cache-Coherence and False Sharing•9 minutes

1 discussion prompt•Total 10 minutes

Thread Synchronization and Shared Memory: Building Reliable Parallel Programs with Pthreads•10 minutes

This module introduces students to distributed-memory programming using the Message Passing Interface (MPI). Students will learn about the functions provided by the MPI library and what each one does. This will enable them to write parallel programs and to convert serial code into parallel code using MPI functions.

What's included

7 videos • 9 readings • 7 assignments • 1 discussion prompt

7 videos•Total 70 minutes

Introduction to MPI•4 minutes
MPI Setup and Communicator Functions•6 minutes
SPMD and Communication•10 minutes
Potential Pitfalls•4 minutes
Simple Serial Sorting Algorithm•20 minutes
Parallel Odd-Even Transposition Sort•19 minutes
Safety in MPI Programs•7 minutes

9 readings•Total 125 minutes

Recommended Reading: Introduction to MPI•15 minutes
Recommended Reading: MPI Setup and Communicator Functions•15 minutes
Recommended Reading: SPMD and Communication•15 minutes
Recommended Reading: Potential Pitfalls•15 minutes
Recommended Reading: Simple Serial Sorting Algorithm•15 minutes
Recommended Reading: Parallel Odd-Even Transposition Sort•15 minutes
Recommended Reading: Safety in MPI Programs •15 minutes
Lab: Practice Code•10 minutes
Lab: Practice Solution•10 minutes

7 assignments•Total 63 minutes

Introduction to MPI•9 minutes
MPI Setup and Communicator Functions•9 minutes
SPMD and Communication•9 minutes
Potential Pitfalls•9 minutes
Simple Serial Sorting Algorithm•9 minutes
Parallel Odd-Even Transposition Sort•9 minutes
Safety in MPI Programs•9 minutes

1 discussion prompt•Total 30 minutes

MPI in Action: Understanding Setup, Communication, and Parallel Sorting•30 minutes

This module introduces the shared-memory programming model through OpenMP. Students will gain exposure to OpenMP's directives and library functions and learn how to use them in code to express shared-memory parallelism. Students will explore the foundational concepts of OpenMP through videos and readings, starting with the basics of the library and progressing to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Through worked examples like the Trapezoidal Rule and sorting functions, learners will understand how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems like producers and consumers.

What's included

12 videos • 12 readings • 13 assignments • 1 discussion prompt

12 videos•Total 94 minutes

Introduction to OpenMP•5 minutes
Programming in OpenMP•10 minutes
Trapezoidal Rule•10 minutes
Scope of Variables•4 minutes
Reduction Clause•7 minutes
Parallel-For Directive and Caveats in Them•8 minutes
Sorting Functions•20 minutes
Scheduling•6 minutes
Producers and Consumers•6 minutes
Termination, Startup and Atomic Directive•7 minutes
Critical Sections and Locks•6 minutes
Tasking•5 minutes

12 readings•Total 152 minutes

Recommended Reading: Introduction to OpenMP•15 minutes
Recommended Reading: Programming in OpenMP•15 minutes
Recommended Reading: Trapezoidal Rule•15 minutes
Recommended Reading: Scope of Variables•15 minutes
Recommended Reading: Reduction Clause•15 minutes
Recommended Reading: Parallel-For Directive and Caveats in Them•15 minutes
Recommended Reading: Sorting Functions•15 minutes
Recommended Reading: Scheduling •15 minutes
Recommended Reading: Producers and Consumers•15 minutes
Recommended Reading: Termination, Startup and Atomic Directive•1 minute
Recommended Reading: Critical Sections and Locks•1 minute
Recommended Reading: Tasking•15 minutes

13 assignments•Total 168 minutes

Graded Quiz - Modules 7 and 8•60 minutes
Introduction to OpenMP•9 minutes
Programming in OpenMP•9 minutes
Trapezoidal Rule•9 minutes
Scope of Variables•9 minutes
Reduction Clause•9 minutes
Parallel-For Directive and Caveats in Them•9 minutes
Sorting Functions•9 minutes
Scheduling•9 minutes
Producers and Consumers•9 minutes
Termination, Startup and Atomic Directive•9 minutes
Critical Sections and Locks•9 minutes
Tasking•9 minutes

1 discussion prompt•Total 30 minutes

Mastering OpenMP: From Parallel Patterns to Synchronisation•30 minutes

This module will introduce the n-body problem in physics, examining its significance in simulating gravitational interactions among multiple particles. It will explore classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion on their computational complexity. Emphasis will be placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the programming languages and parallel computing directives covered in the course.

What's included

13 videos • 13 readings • 13 assignments • 1 discussion prompt

13 videos•Total 107 minutes

Introduction to N-body Problem•8 minutes
Serial Solutions to the N-body Problem•16 minutes
Parallelising Strategy•13 minutes
Parallelising Basic Solver Using OpenMP•9 minutes
Parallelising Reduced Solver Using OpenMP •11 minutes
Evaluating OpenMP Performance•5 minutes
Parallelising Basic Solver Using Pthreads •4 minutes
Parallelising Basic Solver Using MPI •9 minutes
Parallelising Reduced Solver Using MPI•9 minutes
Evaluating MPI Performance•6 minutes
Parallelising Basic Solver Using CUDA•7 minutes
Evaluating CUDA Solver and Improving Performance•4 minutes
Using Shared Memory for Solvers•7 minutes

13 readings•Total 195 minutes

Recommended Reading: Introduction to N-body Problem•15 minutes
Recommended Reading: Serial Solutions to the N-body Problem•15 minutes
Recommended Reading: Parallelising Strategy•15 minutes
Recommended Reading: Parallelising Basic Solver Using OpenMP•15 minutes
Recommended Reading: Parallelising Reduced Solver Using OpenMP•15 minutes
Recommended Reading: Evaluating OpenMP performance•15 minutes
Recommended Reading: Parallelising Basic Solver Using Pthreads•15 minutes
Recommended Reading: Parallelising Basic Solver Using MPI•15 minutes
Recommended Reading: Parallelising Reduced Solver Using MPI•15 minutes
Recommended Reading: Evaluating MPI Performance•15 minutes
Recommended Reading: Parallelising Basic Solver Using CUDA•15 minutes
Recommended Reading: Evaluating CUDA Solver and Improving Performance•15 minutes
Recommended Reading: Using Shared Memory for Solvers•15 minutes

13 assignments•Total 138 minutes

Introduction to N-body Problem•9 minutes
Serial Solutions to the N-body Problem•9 minutes
Parallelising Strategy•9 minutes
Parallelising Basic Solver Using OpenMP•9 minutes
Parallelising Reduced Solver Using OpenMP•9 minutes
Evaluating OpenMP Performance•9 minutes
Parallelising Basic Solver Using Pthreads•9 minutes
Parallelising Basic Solver Using MPI•30 minutes
Parallelising Reduced Solver Using MPI•9 minutes
Evaluating MPI Performance•9 minutes
Parallelising Basic Solver Using CUDA•9 minutes
Evaluating CUDA Solver and Improving Performance•9 minutes
Using Shared Memory for Solvers•9 minutes

1 discussion prompt•Total 30 minutes

The N-Body Solver: Exploring Parallelism Across Models•30 minutes

This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.

What's included

8 videos • 9 readings • 10 assignments • 1 discussion prompt

8 videos•Total 61 minutes

Sample Sort and Bucket Sort•10 minutes
Map•17 minutes
Implementing Sample Sort Using OpenMP: First Implementation•5 minutes
Implementing Sample Sort Using OpenMP: Second Implementation•7 minutes
Implementing Sample Sort Using Pthreads •4 minutes
Implementing Sample Sort Using MPI•6 minutes
Implementing Sample Sort Using MPI: Example•5 minutes
Implementing Sample Sort Using CUDA •7 minutes

9 readings•Total 115 minutes

Recommended Reading: Sample Sort and Bucket Sort•15 minutes
Recommended Reading: Map•10 minutes
Recommended Reading: Implementing Sample Sort Using OpenMP: First Implementation•15 minutes
Recommended Reading: Implementing Sample Sort Using OpenMP: Second Implementation•15 minutes
Recommended Reading: Implementing Sample Sort Using Pthreads•10 minutes
Recommended Reading: Implementing Sample Sort Using MPI•15 minutes
Recommended Reading: Implementing Sample Sort Using MPI: Example•15 minutes
Recommended Reading: Implementing Sample Sort Using CUDA•10 minutes
Recommended Reading: Which API?•10 minutes

10 assignments•Total 432 minutes

Graded Quiz - Modules 9 and 10•60 minutes
SGA-2: Odd-Even Transposition Sort Parallelisation •300 minutes
Sample Sort and Bucket Sort•9 minutes
Map (Quiz)•9 minutes
Implementing Sample Sort Using OpenMP: First Implementation•9 minutes
Implementing Sample Sort Using OpenMP: Second Implementation•9 minutes
Implementing Sample Sort Using Pthreads•9 minutes
Implementing Sample Sort Using MPI•9 minutes
Implementing Sample Sort Using MPI: Example•9 minutes
Implementing Sample Sort Using CUDA•9 minutes

1 discussion prompt•Total 30 minutes

Parallel Sample Sort Across Platforms•30 minutes

Final Comprehensive Examination

What's included

1 assignment

1 assignment•Total 30 minutes

Final Comprehensive Examination •30 minutes

Instructors

Kunal Kishore Korgaonkar

Birla Institute of Technology & Science, Pilani

2 Courses•1,943 learners

Prof. Gargi Prabhu

Birla Institute of Technology & Science, Pilani

1 Course•61 learners

Offered by

Birla Institute of Technology & Science, Pilani

URL: https://www.coursera.org/learn/multicore-and-gpgpu-programming