URL: https://www.coursera.org/learn/multicore-and-gpgpu-programming

Multicore and GPGPU Programming | Coursera

Gain insight into a topic and learn the fundamentals.
Intermediate level

8 weeks to complete at 10 hours a week
Flexible schedule: learn at your own pace

What you'll learn

  • Understand the fundamentals of multi-threaded programming and its applications in multicore systems.

  • Develop shared memory programs in OpenMP and distributed programming using MPI.

  • Gain a foundational understanding of GPGPU architecture and the CUDA programming model.

Details to know

Shareable certificate (can be added to your LinkedIn profile)

Assessments: 124 assignments

Taught in English

There are 12 modules in this course

The course "Multicore and GPGPU Programming" provides a foundational understanding of parallel programming, focusing on developing high-performance, multi-threaded applications in both CPU and GPU environments. Beginning with a review of multicore processor architectures, caching mechanisms, and Non-Uniform Memory Access (NUMA) systems, students will learn the essentials of shared memory programming, synchronisation techniques, and the use of locks to ensure data integrity across threads.

The course covers the design of shared memory data structures and introduces advanced synchronisation concepts, including lazy synchronisation, which is crucial for scalable and efficient concurrent applications. Students will also explore the architecture and programming model of General-Purpose Graphics Processing Units (GPGPUs) and learn CUDA programming to exploit GPU parallelism in compute-intensive tasks. By the end of the course, students will be adept at optimising multi-threaded and many-core applications and at balancing workloads across CPUs and GPUs to achieve high throughput and efficient resource utilisation. The course is aimed at those who want to build expertise in high-performance computing and parallel programming for modern multicore and GPU-based systems.

In this module, learners are introduced to the course and its syllabus. The introductory video outlines the skills and knowledge they can expect to gain over the course. The syllabus reading describes the essential course components: course values, assessment criteria, grading system, schedule, live-session details, and a recommended reading list. A discussion prompt lets learners introduce themselves and connect with others in the course community.

What's included

4 videos, 1 reading, 1 discussion prompt

4 videos (total 51 minutes)
  • Course Introductory Video (2 minutes)
  • Meet Your Instructor - Dr. Gargi Prabhu (1 minute)
  • Meet Your Instructor - Dr. Kunal Korgaonkar (1 minute)
  • Recording of Multicore and GPGPU Programming: Week 1 - Live Session on 25-05-23 18:32:50 [47:25] (47 minutes)
1 reading (total 10 minutes)
  • Course Overview (10 minutes)
1 discussion prompt (total 10 minutes)
  • Meet Your Peers (10 minutes)

In this module, students will gain foundational knowledge of parallel and multi-threaded programming and the core principles behind the efficient use of modern multicore and many-core processors. The module opens with an overview of parallel programming concepts and then covers data parallelism, task parallelism, and pipeline parallelism. Students will also examine key performance metrics, namely speedup, efficiency, and scalability, along with Amdahl's and Gustafson's laws. These help evaluate the benefits and trade-offs of parallel approaches.
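The metrics above reduce to a few formulas. The C sketch below illustrates them and is not course material. Speedup is T_serial / T_parallel. Efficiency is speedup divided by the number of cores p. Amdahl's law bounds the speedup of a fixed-size problem whose serial fraction is f, and Gustafson's law gives the scaled speedup when the problem grows with p. The function names are hypothetical.

```c
/* Speedup and efficiency from measured run times on p cores. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

double efficiency(double t_serial, double t_parallel, int p) {
    return speedup(t_serial, t_parallel) / p;
}

/* Amdahl's law: fixed problem size; a fraction f of the serial run time
   cannot be parallelised. S(p) = 1 / (f + (1 - f) / p). */
double amdahl_speedup(double f, int p) {
    return 1.0 / (f + (1.0 - f) / p);
}

/* Gustafson's law: problem size grows with p; f is the serial fraction
   of the parallel run time. S(p) = p - f * (p - 1). */
double gustafson_speedup(double f, int p) {
    return p - f * (p - 1);
}
```

With f = 0.1, Amdahl's law caps the speedup at 1/f = 10 however many cores are added (about 4.7 on 8 cores). Gustafson's scaled speedup on 8 cores is 8 - 0.1 × 7 = 7.3.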

What's included

12 videos, 2 readings, 12 assignments, 1 discussion prompt

12 videos (total 73 minutes)
  • Need for Ever-Increasing Performance (8 minutes)
  • Parallel Systems and Parallel Programs (8 minutes)
  • Concurrent, Parallel, Distributed Systems (5 minutes)
  • Types of Parallelism: Data, Task and Pipeline Parallelism (8 minutes)
  • Speedup and Efficiency (5 minutes)
  • Amdahl’s Law (5 minutes)
  • Gustafson’s Law (5 minutes)
  • Scalability in Parallel Systems (5 minutes)
  • Cost of Parallelisation (7 minutes)
  • Sources of Overhead in Parallel Programs (5 minutes)
  • Timing Parallel Programs: Methods and Best Practices (7 minutes)
  • GPU Performance (5 minutes)
2 readings (total 120 minutes)
  • Recommended Reading: Fundamentals of Parallel Computing (60 minutes)
  • Recommended Reading: Introduction to Performance Metrics in Parallel Computing (60 minutes)
12 assignments (total 36 minutes)
  • Need for Ever-Increasing Performance (3 minutes)
  • Parallel Systems and Parallel Programs (3 minutes)
  • Concurrent, Parallel, Distributed Systems (3 minutes)
  • Types of Parallelism: Data, Task and Pipeline Parallelism (3 minutes)
  • Speedup and Efficiency (3 minutes)
  • Amdahl’s Law (3 minutes)
  • Gustafson’s Law (3 minutes)
  • Scalability in MIMD Systems (3 minutes)
  • Cost of Parallelisation (3 minutes)
  • Sources of Overhead in Parallel Programs (3 minutes)
  • Taking Timings of Parallel Programs (3 minutes)
  • GPU Performance (3 minutes)
1 discussion prompt (total 30 minutes)
  • Why Parallelism? Revisiting the Roots of Multicore Programming (30 minutes)

This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.
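The C sketch below makes the effect of the memory hierarchy concrete with a toy direct-mapped cache. The 64-byte line and 64-line capacity are assumptions, much smaller than real caches, and the names are hypothetical. It counts misses when a 64 × 64 row-major matrix of doubles is traversed row by row and then column by column.

```c
#include <stdbool.h>
#include <stddef.h>

#define LINE_BYTES 64   /* assumed cache-line size */
#define NUM_LINES  64   /* assumed capacity: 64 lines = 4 KiB, direct-mapped */

typedef struct {
    size_t line_in_slot[NUM_LINES];  /* which memory line each slot holds */
    bool valid[NUM_LINES];
    size_t misses;
} toy_cache;

/* Direct-mapped lookup: memory line L can live only in slot L % NUM_LINES. */
static void cache_access(toy_cache *c, size_t addr) {
    size_t line = addr / LINE_BYTES;
    size_t slot = line % NUM_LINES;
    if (!c->valid[slot] || c->line_in_slot[slot] != line) {
        c->valid[slot] = true;          /* miss: load the line, evicting the old one */
        c->line_in_slot[slot] = line;
        c->misses++;
    }
}

/* Misses when an n x n row-major matrix of doubles (base address 0) is
   visited row by row (row_wise = true) or column by column. */
size_t traversal_misses(size_t n, bool row_wise) {
    toy_cache c = {0};
    for (size_t outer = 0; outer < n; outer++)
        for (size_t inner = 0; inner < n; inner++) {
            size_t row = row_wise ? outer : inner;
            size_t col = row_wise ? inner : outer;
            cache_access(&c, (row * n + col) * sizeof(double));
        }
    return c.misses;
}
```

With these parameters, the row-wise traversal misses once per 64-byte line, for 512 misses. The column-wise traversal misses on all 4,096 accesses. The lines of a column map to only 8 of the 64 cache slots, so they evict each other before they can be reused.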

What's included

15 videos, 2 readings, 15 assignments, 1 discussion prompt

15 videos (total 160 minutes)
  • The Von Neumann Architecture (7 minutes)
  • Processes, Multitasking, and Threads (5 minutes)
  • The Basics of Caching (7 minutes)
  • Virtual Memory (7 minutes)
  • Instruction-Level Parallelism (9 minutes)
  • Hardware Multithreading (6 minutes)
  • Classifications of Parallel Computers (6 minutes)
  • SIMD and MIMD Systems (7 minutes)
  • Interconnection Networks: Shared Memory Systems (6 minutes)
  • Interconnection Networks: Distributed Memory Systems (8 minutes)
  • Cache Coherence (8 minutes)
  • Shared-Memory vs. Distributed-Memory (4 minutes)
  • Parallel Software: Coordinating Processes and Threads (11 minutes)
  • Distributed Memory Software (7 minutes)
  • Recording of Multicore and GPGPU Programming: Week 2 - Live Session on 25-05-30 18:35:08 [02:05] (62 minutes)
2 readings (total 100 minutes)
  • Recommended Reading: Architecture Background (40 minutes)
  • Recommended Reading: Parallel Hardware and Software (60 minutes)
15 assignments (total 114 minutes)
  • Graded Quiz - Modules 1 and 2 (60 minutes)
  • The Von Neumann Architecture (3 minutes)
  • Processes, Multitasking, and Threads (3 minutes)
  • The Basics of Caching (3 minutes)
  • Virtual Memory (3 minutes)
  • Instruction-Level Parallelism (3 minutes)
  • Hardware Multithreading (3 minutes)
  • Classifications of Parallel Computers (3 minutes)
  • SIMD and MIMD Systems (3 minutes)
  • Interconnection Networks: Shared Memory Systems (3 minutes)
  • Interconnection Networks: Distributed Memory Systems (6 minutes)
  • Cache Coherence (3 minutes)
  • Shared-Memory vs. Distributed-Memory (3 minutes)
  • Parallel Software: Coordinating Processes and Threads (12 minutes)
  • Distributed Memory Software (3 minutes)
1 discussion prompt (total 30 minutes)
  • From Von Neumann to Multicore: Evolving Architectures and Memory Realities (30 minutes)

This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and memory hierarchy, which form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, grid, and block organisation. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.
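The thread hierarchy determines which data element each thread handles. In a 1-D launch, a thread's global index is blockIdx.x * blockDim.x + threadIdx.x, and the grid size is n / blockDim.x rounded up. The sketch below emulates this mapping in plain C so that it runs without a GPU. The function names are hypothetical, and the corresponding CUDA kernel appears only in a comment.

```c
/* Number of blocks needed to cover n elements: ceil(n / block_dim). */
unsigned grid_size(unsigned n, unsigned block_dim) {
    return (n + block_dim - 1) / block_dim;
}

/* Global index of a thread: blockIdx.x * blockDim.x + threadIdx.x in CUDA. */
unsigned global_index(unsigned block, unsigned block_dim, unsigned thread) {
    return block * block_dim + thread;
}

/* Sequential emulation of the CUDA kernel
     __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i < n) c[i] = a[i] + b[i];
     }
   launched as vecAdd<<<grid_size(n, block_dim), block_dim>>>(a, b, c, n).
   On a GPU the (block, thread) pairs run in parallel; here they run in a loop. */
void emulate_vec_add(const float *a, const float *b, float *c,
                     unsigned n, unsigned block_dim) {
    unsigned grid = grid_size(n, block_dim);
    for (unsigned block = 0; block < grid; block++)
        for (unsigned thread = 0; thread < block_dim; thread++) {
            unsigned i = global_index(block, block_dim, thread);
            if (i < n)              /* surplus threads in the last block do nothing */
                c[i] = a[i] + b[i];
        }
}
```

For n = 1000 and 256 threads per block, the grid has 4 blocks (1,024 threads), and the bounds check leaves the last 24 threads idle.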

What's included

15 videos, 2 readings, 14 assignments, 1 discussion prompt

15 videos (total 127 minutes)
  • GPUs and GPGPU (5 minutes)
  • GPU Architecture (5 minutes)
  • Heterogeneous Computing (4 minutes)
  • Paradigm of Heterogeneous Computing (5 minutes)
  • Introduction to CUDA (5 minutes)
  • Structure of a CUDA Program (8 minutes)
  • Threads, Blocks, and Grid (9 minutes)
  • Managing Memory (7 minutes)
  • Writing and Verifying Your Kernel (6 minutes)
  • Compiling and Running CUDA Program (4 minutes)
  • Nvidia Compute Capabilities and Device Architecture (6 minutes)
  • Timing Your Kernel (7 minutes)
  • Organising Parallel Threads (5 minutes)
  • Managing Devices (4 minutes)
  • Recording of Multicore and GPGPU Programming: Week 3 - Live Session on 25-06-06 18:31:21 [44:50] (45 minutes)
2 readings (total 75 minutes)
  • Recommended Reading: GPGPU Architecture and CUDA (15 minutes)
  • Recommended Reading: Programming Model Overview (60 minutes)
14 assignments (total 48 minutes)
  • GPUs and GPGPU (6 minutes)
  • GPU Architecture (3 minutes)
  • Heterogeneous Computing (3 minutes)
  • Paradigm of Heterogeneous Computing (3 minutes)
  • Introduction to CUDA (3 minutes)
  • Structure of a CUDA Program (3 minutes)
  • Threads, Blocks, and Grid (6 minutes)
  • Managing Memory (3 minutes)
  • Writing and Verifying Your Kernel (3 minutes)
  • Compiling and Running CUDA Program (3 minutes)
  • Nvidia Compute Capabilities and Device Architecture (3 minutes)
  • Timing Your Kernel (3 minutes)
  • Organising Parallel Threads (3 minutes)
  • Managing Devices (3 minutes)
1 discussion prompt (total 30 minutes)
  • Harnessing GPU Power: Exploring CUDA and the Architecture of Parallelism (30 minutes)

This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.
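Branch divergence occurs when threads in the same warp disagree on a condition. The warp then has to execute both paths one after the other. The C sketch below is an emulation, not CUDA code. It counts divergent warps in a block for two conditions: one that alternates between neighbouring threads and one that is uniform within each warp. It assumes a warp size of 32, as on NVIDIA GPUs, and the helper names are hypothetical.

```c
#include <stdbool.h>

#define WARP_SIZE 32  /* threads per warp on NVIDIA GPUs */

/* Branch condition each thread evaluates, given its threadIdx.x. */
typedef bool (*branch_cond)(unsigned tid);

/* Alternates between neighbouring threads: splits every warp. */
static bool even_thread(unsigned tid) { return tid % 2 == 0; }

/* Constant within a warp: whole warps take the same path. */
static bool even_warp(unsigned tid) { return (tid / WARP_SIZE) % 2 == 0; }

/* Counts warps in a 1-D block whose threads disagree on cond, i.e. warps
   that must execute both branch paths one after the other. */
unsigned divergent_warps(unsigned block_dim, branch_cond cond) {
    unsigned count = 0;
    for (unsigned first = 0; first < block_dim; first += WARP_SIZE) {
        unsigned last = first + WARP_SIZE < block_dim ? first + WARP_SIZE : block_dim;
        bool path = cond(first);
        for (unsigned t = first + 1; t < last; t++)
            if (cond(t) != path) {
                count++;
                break;
            }
    }
    return count;
}
```

In a 128-thread block, tid % 2 == 0 makes all 4 warps diverge. The condition (tid / 32) % 2 == 0 keeps every warp on a single path while still sending half of the threads down each branch.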

What's included

15 videos, 2 readings, 15 assignments, 1 discussion prompt

15 videos (total 135 minutes)
  • Introduction to CUDA Execution Model (7 minutes)
  • Warps and Thread Blocks (4 minutes)
  • Warp Divergence (9 minutes)
  • Resource Partitioning (6 minutes)
  • Latency Hiding (10 minutes)
  • Occupancy (5 minutes)
  • Synchronization (4 minutes)
  • Scalability (5 minutes)
  • Exposing Parallelism (10 minutes)
  • Checking Active Warps with Nvprof (6 minutes)
  • Checking Memory Operations with Nvprof (7 minutes)
  • Avoiding Branch Divergence (3 minutes)
  • The Parallel Reduction Problem and Thread Divergence (7 minutes)
  • Improving Divergence in Parallel Reduction (6 minutes)
  • Recording of Multicore and GPGPU Programming: Week 4 - Live Session on 25-06-13 18:32:39 [49:37] (45 minutes)
2 readings (total 120 minutes)
  • Recommended Reading: Structure of a CUDA Program (60 minutes)
  • Recommended Reading: Exposing Parallelism and Avoiding Branch Divergence (60 minutes)
15 assignments (total 105 minutes)
  • Graded Quiz - Modules 3 and 4 (60 minutes)
  • Introduction to CUDA Execution Model (3 minutes)
  • Warps and Thread Blocks (3 minutes)
  • Warp Divergence (3 minutes)
  • Resource Partitioning (6 minutes)
  • Latency Hiding (3 minutes)
  • Occupancy (3 minutes)
  • Synchronization (3 minutes)
  • Scalability (3 minutes)
  • Exposing Parallelism (3 minutes)
  • Checking Active Warps with Nvprof (3 minutes)
  • Checking Memory Operations with Nvprof (3 minutes)
  • Avoiding Branch Divergence (3 minutes)
  • The Parallel Reduction Problem and Thread Divergence (3 minutes)
  • Improving Divergence in Parallel Reduction (3 minutes)
1 discussion prompt (total 30 minutes)
  • Under the Hood: Warps, Divergence, and CUDA Execution Dynamics (30 minutes)

The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of memory hierarchy in CUDA, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns to optimise performance on GPUs. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to enhance parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.
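Coalescing depends on how many memory segments the 32 threads of a warp touch in one load. The C sketch below counts the distinct segments touched under a simplified model. It assumes aligned 128-byte segments and one element per thread at index base + tid * stride. Actual transaction sizes depend on the GPU generation and the cache path, and the function name is hypothetical.

```c
#include <stdbool.h>

#define WARP_THREADS 32

/* Simplified model: thread tid reads element base + tid * stride of an array
   of elem_bytes-sized elements that starts at address 0; memory is served in
   aligned seg_bytes segments (elem_bytes is assumed to divide seg_bytes).
   Returns the number of distinct segments the warp touches. */
unsigned segments_touched(unsigned long base, unsigned long stride,
                          unsigned elem_bytes, unsigned seg_bytes) {
    unsigned long seen[WARP_THREADS];
    unsigned count = 0;
    for (unsigned tid = 0; tid < WARP_THREADS; tid++) {
        unsigned long addr = (base + tid * stride) * elem_bytes;
        unsigned long seg = addr / seg_bytes;
        bool already = false;
        for (unsigned k = 0; k < count; k++)
            if (seen[k] == seg) {
                already = true;
                break;
            }
        if (!already)
            seen[count++] = seg;
    }
    return count;
}
```

For 4-byte floats, an aligned unit-stride access touches 1 segment, an offset of one element touches 2, and a stride of 32 touches 32. The last pattern therefore moves 32 times as many segments for the same amount of useful data.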

What's included

14 videos, 2 readings, 14 assignments, 1 discussion prompt, 1 ungraded lab

14 videos (total 126 minutes)
  • Introduction to CUDA Memory Model (8 minutes)
  • Memory Allocation and Deallocation (6 minutes)
  • Zero Copy Memory (4 minutes)
  • Unified Virtual Addressing and Unified Memory (3 minutes)
  • Aligned and Coalesced Access (6 minutes)
  • CUDA Shared Memory (6 minutes)
  • Shared Memory Banks and Access Mode (7 minutes)
  • Configuring the Amount of Shared Memory (5 minutes)
  • Synchronisation (9 minutes)
  • CUDA Streams (7 minutes)
  • Stream Scheduling and Priorities (6 minutes)
  • CUDA Events (6 minutes)
  • Concurrent Kernel Execution (6 minutes)
  • Recording of Multicore and GPGPU Programming: Week 5 - Live Session on 25-06-20 18:31:59 [47:36] (48 minutes)
2 readings (total 120 minutes)
  • Recommended Reading: CUDA Memory Model (60 minutes)
  • Recommended Reading: Streams and Concurrency (60 minutes)
14 assignments (total 342 minutes)
  • SGA-1: CUDA Programming and Performance Optimisation (300 minutes)
  • Introduction to CUDA Memory Model (3 minutes)
  • Memory Allocation and Deallocation (3 minutes)
  • Zero Copy Memory (3 minutes)
  • Unified Virtual Addressing and Unified Memory (3 minutes)
  • Aligned and Coalesced Access (3 minutes)
  • CUDA Shared Memory (6 minutes)
  • Shared Memory Banks and Access Mode (3 minutes)
  • Configuring the Amount of Shared Memory (3 minutes)
  • Synchronisation (3 minutes)
  • CUDA Streams (3 minutes)
  • Stream Scheduling and Priorities (3 minutes)
  • CUDA Events (3 minutes)
  • Concurrent Kernel Execution (3 minutes)
1 discussion prompt (total 30 minutes)
  • Smart Memory and Seamless Concurrency: CUDA Memory and Streams (30 minutes)
1 ungraded lab (total 60 minutes)
  • Hands-on Lab: Parallel Matrix Addition Using CUDA (60 minutes)

This module explains the difference between processes and threads in depth and introduces multithreaded programming with the Pthreads (POSIX threads) library. Students will learn the main functions of the Pthreads library and use them to solve real-world problems with a multithreaded approach. The module also discusses precautions to take when designing multithreaded algorithms, such as protecting critical sections and avoiding false sharing.
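A common first Pthreads pattern splits the work among threads and lets each one accumulate a private partial result. Only the final update of the shared total is protected with a mutex. The sketch below assumes a POSIX system (link with -pthread). parallel_sum and its argument struct are hypothetical names, and checks on the pthread_* return codes are omitted for brevity.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread argument: a half-open slice [begin, end) plus
   pointers to the shared total and the mutex that protects it. */
typedef struct {
    const double *data;
    size_t begin, end;
    double *total;
    pthread_mutex_t *lock;
} slice_arg;

static void *sum_slice(void *p) {
    slice_arg *a = p;
    double local = 0.0;               /* private partial sum: no lock needed */
    for (size_t i = a->begin; i < a->end; i++)
        local += a->data[i];
    pthread_mutex_lock(a->lock);      /* critical section: one thread at a time */
    *a->total += local;
    pthread_mutex_unlock(a->lock);
    return NULL;
}

/* Sums data[0..n) using nthreads threads (nthreads >= 1). */
double parallel_sum(const double *data, size_t n, int nthreads) {
    pthread_t *tids = malloc(nthreads * sizeof *tids);
    slice_arg *args = malloc(nthreads * sizeof *args);
    pthread_mutex_t lock;
    double total = 0.0;
    pthread_mutex_init(&lock, NULL);
    for (int t = 0; t < nthreads; t++) {
        /* block partition: thread t gets [n*t/nthreads, n*(t+1)/nthreads) */
        args[t] = (slice_arg){ data, n * t / nthreads, n * (t + 1) / nthreads,
                               &total, &lock };
        pthread_create(&tids[t], NULL, sum_slice, &args[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);  /* wait for every partial sum */
    pthread_mutex_destroy(&lock);
    free(tids);
    free(args);
    return total;
}
```

Locking once per thread instead of once per element keeps the critical section short. Keeping the partial sums in local variables rather than adjacent array slots also avoids false sharing.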

What's included

10 videos, 11 readings, 10 assignments, 1 discussion prompt

10 videos (total 116 minutes)
  • Processes, Threads and Pthreads (4 minutes)
  • Hello World!! (9 minutes)
  • Matrix-Vector Multiplication (13 minutes)
  • Critical Sections (5 minutes)
  • Busy Waiting (6 minutes)
  • Mutexes (5 minutes)
  • Semaphores (7 minutes)
  • Barriers and Condition Variables (13 minutes)
  • Caches, Cache-Coherence and False Sharing (9 minutes)
  • Recording of Multicore and GPGPU Programming: Week 6 - Live Session on 25-06-27 18:38:36 [43:53] (44 minutes)
11 readings (total 295 minutes)
  • Recommended Reading: Processes, Threads and Pthreads (10 minutes)
  • Recommended Reading: Hello World!! (60 minutes)
  • Recommended Reading: Matrix-Vector Multiplication (15 minutes)
  • Recommended Reading: Critical Sections (30 minutes)
  • Recommended Reading: Busy Waiting (20 minutes)
  • Recommended Reading: Mutexes (15 minutes)
  • Recommended Reading: Semaphores (30 minutes)
  • Recommended Reading: Barriers and Condition Variables (30 minutes)
  • Recommended Reading: Read-Write Locks (60 minutes)
  • Recommended Reading: Caches, Cache-Coherence and False Sharing (15 minutes)
  • Lab Instruction Document (10 minutes)
10 assignments (total 135 minutes)
  • Graded Quiz - Modules 5 and 6 (60 minutes)
  • Processes, Threads and Pthreads (9 minutes)
  • Hello World!! (9 minutes)
  • Matrix-Vector Multiplication (9 minutes)
  • Critical Sections (9 minutes)
  • Busy Waiting (9 minutes)
  • Mutexes (9 minutes)
  • Semaphores (6 minutes)
  • Barriers and Condition Variables (6 minutes)
  • Caches, Cache-Coherence and False Sharing (9 minutes)
1 discussion prompt (total 10 minutes)
  • Thread Synchronization and Shared Memory: Building Reliable Parallel Programs with Pthreads (10 minutes)

This module introduces distributed-memory programming with the Message Passing Interface (MPI). Students will learn the functions the MPI library provides and what each of them does. By the end of the module, they will be able to write parallel programs with MPI and to convert serial code into parallel code using MPI functions.
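The parallel sort in this module builds on odd-even transposition sort, shown below as a serial C sketch. Even phases compare the pairs (0,1), (2,3), and so on, while odd phases compare (1,2), (3,4), and so on. n phases are enough to sort n keys. The comparisons within a phase are independent, so an MPI version can give each process a block of keys and replace each compare-exchange with an exchange of blocks between partner processes. That mapping is described here only in prose, not implemented, and the function name is hypothetical.

```c
#include <stddef.h>

/* Serial odd-even transposition sort of n ints in ascending order. */
void odd_even_sort(int *a, size_t n) {
    for (size_t phase = 0; phase < n; phase++) {
        /* even phase: pairs (0,1),(2,3),...; odd phase: pairs (1,2),(3,4),... */
        for (size_t i = phase % 2; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {      /* compare-exchange */
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
}
```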

What's included

7 videos, 9 readings, 7 assignments, 1 discussion prompt

7 videos (total 70 minutes)
  • Introduction to MPI (4 minutes)
  • MPI Setup and Communicator Functions (6 minutes)
  • SPMD and Communication (10 minutes)
  • Potential Pitfalls (4 minutes)
  • Simple Serial Sorting Algorithm (20 minutes)
  • Parallel Odd-Even Transposition Sort (19 minutes)
  • Safety in MPI Programs (7 minutes)
9 readings (total 125 minutes)
  • Recommended Reading: Introduction to MPI (15 minutes)
  • Recommended Reading: MPI Setup and Communicator Functions (15 minutes)
  • Recommended Reading: SPMD and Communication (15 minutes)
  • Recommended Reading: Potential Pitfalls (15 minutes)
  • Recommended Reading: Simple Serial Sorting Algorithm (15 minutes)
  • Recommended Reading: Parallel Odd-Even Transposition Sort (15 minutes)
  • Recommended Reading: Safety in MPI Programs (15 minutes)
  • Lab: Practice Code (10 minutes)
  • Lab: Practice Solution (10 minutes)
7 assignments (total 63 minutes)
  • Introduction to MPI (9 minutes)
  • MPI Setup and Communicator Functions (9 minutes)
  • SPMD and Communication (9 minutes)
  • Potential Pitfalls (9 minutes)
  • Simple Serial Sorting Algorithm (9 minutes)
  • Parallel Odd-Even Transposition Sort (9 minutes)
  • Safety in MPI Programs (9 minutes)
1 discussion prompt (total 30 minutes)
  • MPI in Action: Understanding Setup, Communication, and Parallel Sorting (30 minutes)

This module introduces the shared-memory programming model through the OpenMP library. Students will learn the OpenMP directives and library functions and how to use them to exploit shared-memory parallelism in their code. Videos and readings take them from the basics of the library to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Worked examples such as the Trapezoidal Rule and sorting functions show how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems such as producers and consumers.
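The Trapezoidal Rule example combines a parallel for loop with a reduction clause. The C sketch below uses the integrand f(x) = x * x purely as an illustrative assumption. With h = (b - a) / n, it computes h[f(a)/2 + f(x_1) + ... + f(x_{n-1}) + f(b)/2]. When compiled with OpenMP enabled (for example, gcc -fopenmp), reduction(+:sum) gives each thread a private copy of sum and adds the copies together when the loop ends. Without OpenMP the pragma is ignored and the loop runs serially.

```c
/* Example integrand (assumption): f(x) = x * x. */
static double integrand(double x) { return x * x; }

/* Composite trapezoidal rule with n trapezoids on [a, b]. */
double trapezoid(double a, double b, long n) {
    double h = (b - a) / n;
    double sum = (integrand(a) + integrand(b)) / 2.0;   /* endpoints weighted 1/2 */
    /* Each thread accumulates a private copy of sum; OpenMP combines them with +. */
#pragma omp parallel for reduction(+:sum)
    for (long i = 1; i < n; i++)
        sum += integrand(a + i * h);                    /* interior points x_i = a + i*h */
    return sum * h;
}
```

Without the reduction clause, every thread would update the shared sum concurrently, a race condition that can silently produce wrong results.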

What's included

12 videos, 12 readings, 13 assignments, 1 discussion prompt

12 videos (total 94 minutes)
  • Introduction to OpenMP (5 minutes)
  • Programming in OpenMP (10 minutes)
  • Trapezoidal Rule (10 minutes)
  • Scope of Variables (4 minutes)
  • Reduction Clause (7 minutes)
  • Parallel-For Directive and Caveats in Them (8 minutes)
  • Sorting Functions (20 minutes)
  • Scheduling (6 minutes)
  • Producers and Consumers (6 minutes)
  • Termination, Startup and Atomic Directive (7 minutes)
  • Critical Sections and Locks (6 minutes)
  • Tasking (5 minutes)
12 readings (total 152 minutes)
  • Recommended Reading: Introduction to OpenMP (15 minutes)
  • Recommended Reading: Programming in OpenMP (15 minutes)
  • Recommended Reading: Trapezoidal Rule (15 minutes)
  • Recommended Reading: Scope of Variables (15 minutes)
  • Recommended Reading: Reduction Clause (15 minutes)
  • Recommended Reading: Parallel-For Directive and Caveats in Them (15 minutes)
  • Recommended Reading: Sorting Functions (15 minutes)
  • Recommended Reading: Scheduling (15 minutes)
  • Recommended Reading: Producers and Consumers (15 minutes)
  • Recommended Reading: Termination, Startup and Atomic Directive (1 minute)
  • Recommended Reading: Critical Sections and Locks (1 minute)
  • Recommended Reading: Tasking (15 minutes)
13 assignments (total 168 minutes)
  • Graded Quiz - Modules 7 and 8 (60 minutes)
  • Introduction to OpenMP (9 minutes)
  • Programming in OpenMP (9 minutes)
  • Trapezoidal Rule (9 minutes)
  • Scope of Variables (9 minutes)
  • Reduction Clause (9 minutes)
  • Parallel-For Directive and Caveats in Them (9 minutes)
  • Sorting Functions (9 minutes)
  • Scheduling (9 minutes)
  • Producers and Consumers (9 minutes)
  • Termination, Startup and Atomic Directive (9 minutes)
  • Critical Sections and Locks (9 minutes)
  • Tasking (9 minutes)
1 discussion prompt (total 30 minutes)
  • Mastering OpenMP: From Parallel Patterns to Synchronisation (30 minutes)

This module will introduce the n-body problem in physics, examining its significance in simulating gravitational interactions among multiple particles. It will explore classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion on their computational complexity. Emphasis will be placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the programming languages and parallel computing directives covered in the course.

What's included

13 videos, 13 readings, 13 assignments, 1 discussion prompt

13 videos (total 107 minutes)
  • Introduction to N-body Problem (8 minutes)
  • Serial Solutions to the N-body Problem (16 minutes)
  • Parallelising Strategy (13 minutes)
  • Parallelising Basic Solver Using OpenMP (9 minutes)
  • Parallelising Reduced Solver Using OpenMP (11 minutes)
  • Evaluating OpenMP Performance (5 minutes)
  • Parallelising Basic Solver Using Pthreads (4 minutes)
  • Parallelising Basic Solver Using MPI (9 minutes)
  • Parallelising Reduced Solver Using MPI (9 minutes)
  • Evaluating MPI Performance (6 minutes)
  • Parallelising Basic Solver Using CUDA (7 minutes)
  • Evaluating CUDA Solver and Improving Performance (4 minutes)
  • Using Shared Memory for Solvers (7 minutes)
13 readings (total 195 minutes)
  • Recommended Reading: Introduction to N-body Problem (15 minutes)
  • Recommended Reading: Serial Solutions to the N-body Problem (15 minutes)
  • Recommended Reading: Parallelising Strategy (15 minutes)
  • Recommended Reading: Parallelising Basic Solver Using OpenMP (15 minutes)
  • Recommended Reading: Parallelising Reduced Solver Using OpenMP (15 minutes)
  • Recommended Reading: Evaluating OpenMP Performance (15 minutes)
  • Recommended Reading: Parallelising Basic Solver Using Pthreads (15 minutes)
  • Recommended Reading: Parallelising Basic Solver Using MPI (15 minutes)
  • Recommended Reading: Parallelising Reduced Solver Using MPI (15 minutes)
  • Recommended Reading: Evaluating MPI Performance (15 minutes)
  • Recommended Reading: Parallelising Basic Solver Using CUDA (15 minutes)
  • Recommended Reading: Evaluating CUDA Solver and Improving Performance (15 minutes)
  • Recommended Reading: Using Shared Memory for Solvers (15 minutes)
13 assignments (total 138 minutes)
  • Introduction to N-body Problem (9 minutes)
  • Serial Solutions to the N-body Problem (9 minutes)
  • Parallelising Strategy (9 minutes)
  • Parallelising Basic Solver Using OpenMP (9 minutes)
  • Parallelising Reduced Solver Using OpenMP (9 minutes)
  • Evaluating OpenMP Performance (9 minutes)
  • Parallelising Basic Solver Using Pthreads (9 minutes)
  • Parallelising Basic Solver Using MPI (30 minutes)
  • Parallelising Reduced Solver Using MPI (9 minutes)
  • Evaluating MPI Performance (9 minutes)
  • Parallelising Basic Solver Using CUDA (9 minutes)
  • Evaluating CUDA Solver and Improving Performance (9 minutes)
  • Using Shared Memory for Solvers (9 minutes)
1 discussion prompt (total 30 minutes)
  • The N-Body Solver: Exploring Parallelism Across Models (30 minutes)

This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.
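Sample sort divides the keys into buckets using splitters drawn from a sorted sample, then sorts the buckets independently. The serial C sketch below shows the three steps. The oversampling factor of 4 keys per bucket and the function names are assumptions. In the parallel implementations covered in this module, buckets would be assigned to threads, processes, or CUDA blocks instead of being sorted in a loop.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Bucket j holds keys x with split[j-1] <= x < split[j]. */
static size_t bucket_of(int x, const int *split, size_t b) {
    size_t j = 0;
    while (j < b - 1 && x >= split[j])
        j++;
    return j;
}

/* Sorts a[0..n) in ascending order using b buckets. */
void sample_sort(int *a, size_t n, size_t b) {
    if (n < 2 || b < 2) {                       /* nothing to split */
        qsort(a, n, sizeof *a, cmp_int);
        return;
    }
    /* Step 1: sort an evenly spaced sample and pick b-1 splitters. */
    size_t s = b * 4 < n ? b * 4 : n;
    int *sample = malloc(s * sizeof *sample);
    for (size_t i = 0; i < s; i++)
        sample[i] = a[i * n / s];
    qsort(sample, s, sizeof *sample, cmp_int);
    int *split = malloc((b - 1) * sizeof *split);
    for (size_t j = 1; j < b; j++)
        split[j - 1] = sample[j * s / b];

    /* Step 2: count bucket sizes, compute offsets, scatter keys into tmp. */
    size_t *count = calloc(b, sizeof *count);
    size_t *offset = calloc(b + 1, sizeof *offset);
    size_t *fill = malloc(b * sizeof *fill);
    int *tmp = malloc(n * sizeof *tmp);
    for (size_t i = 0; i < n; i++)
        count[bucket_of(a[i], split, b)]++;
    for (size_t j = 0; j < b; j++)
        offset[j + 1] = offset[j] + count[j];
    memcpy(fill, offset, b * sizeof *fill);
    for (size_t i = 0; i < n; i++)
        tmp[fill[bucket_of(a[i], split, b)]++] = a[i];

    /* Step 3: sort each bucket independently (the parallelisable step). */
    for (size_t j = 0; j < b; j++)
        if (count[j] > 1)
            qsort(tmp + offset[j], count[j], sizeof *tmp, cmp_int);
    memcpy(a, tmp, n * sizeof *a);

    free(sample); free(split); free(count); free(offset); free(fill); free(tmp);
}
```

Good splitters produce buckets of similar size, and that is what balances the load across workers in the parallel versions.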

What's included

8 videos, 9 readings, 10 assignments, 1 discussion prompt

8 videos (total 61 minutes)
  • Sample Sort and Bucket Sort (10 minutes)
  • Map (17 minutes)
  • Implementing Sample Sort Using OpenMP: First Implementation (5 minutes)
  • Implementing Sample Sort Using OpenMP: Second Implementation (7 minutes)
  • Implementing Sample Sort Using Pthreads (4 minutes)
  • Implementing Sample Sort Using MPI (6 minutes)
  • Implementing Sample Sort Using MPI: Example (5 minutes)
  • Implementing Sample Sort Using CUDA (7 minutes)
9 readings (total 115 minutes)
  • Recommended Reading: Sample Sort and Bucket Sort (15 minutes)
  • Recommended Reading: Map (10 minutes)
  • Recommended Reading: Implementing Sample Sort Using OpenMP: First Implementation (15 minutes)
  • Recommended Reading: Implementing Sample Sort Using OpenMP: Second Implementation (15 minutes)
  • Recommended Reading: Implementing Sample Sort Using Pthreads (10 minutes)
  • Recommended Reading: Implementing Sample Sort Using MPI (15 minutes)
  • Recommended Reading: Implementing Sample Sort Using MPI: Example (15 minutes)
  • Recommended Reading: Implementing Sample Sort Using CUDA (10 minutes)
  • Recommended Reading: Which API? (10 minutes)
10 assignments (total 432 minutes)
  • Graded Quiz - Modules 9 and 10 (60 minutes)
  • SGA-2: Odd-Even Transposition Sort Parallelisation (300 minutes)
  • Sample Sort and Bucket Sort (9 minutes)
  • Map (Quiz) (9 minutes)
  • Implementing Sample Sort Using OpenMP: First Implementation (9 minutes)
  • Implementing Sample Sort Using OpenMP: Second Implementation (9 minutes)
  • Implementing Sample Sort Using Pthreads (9 minutes)
  • Implementing Sample Sort Using MPI (9 minutes)
  • Implementing Sample Sort Using MPI: Example (9 minutes)
  • Implementing Sample Sort Using CUDA (9 minutes)
1 discussion prompt (total 30 minutes)
  • Parallel Sample Sort Across Platforms (30 minutes)

Final Comprehensive Examination

What's included

1 assignment

1 assignment (total 30 minutes)
  • Final Comprehensive Examination (30 minutes)

Instructors

Birla Institute of Technology & Science, Pilani
2 courses, 1,943 learners
Birla Institute of Technology & Science, Pilani
1 course, 61 learners

Frequently asked questions

To access the course materials, assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

When you purchase a Certificate you get access to all course materials, including graded assignments. Upon completing the course, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile.

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.