In CUDA, a kernel launch is how the host (CPU) starts parallel execution of a kernel function on the device (GPU). The launch uses the execution configuration syntax <<< ... >>>, which specifies how many blocks, and how many threads per block, will execute the kernel. Its general form is:
KernelName<<<blocksPerGrid, threadsPerBlock>>>(arguments);
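The syntax above can be seen in a minimal, complete program. This is a sketch assuming a CUDA-capable GPU and nvcc. The names addOne and runAddOne are hypothetical, and managed memory (cudaMallocManaged) is used only to keep the host code short. Because a launch returns immediately and has no return value, the sketch checks for errors with cudaGetLastError and waits for the kernel with cudaDeviceSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: thread i increments data[i]. Assumes a single block.
__global__ void addOne(int *data) {
    data[threadIdx.x] += 1;
}

// Hypothetical helper: copies n values (1 <= n <= 1024, since everything runs
// in one block) into managed memory, launches addOne, checks for errors and
// copies the results back into host[].
cudaError_t runAddOne(int *host, int n) {
    int *data = nullptr;
    cudaError_t err = cudaMallocManaged(&data, n * sizeof(int));
    if (err != cudaSuccess) return err;
    for (int i = 0; i < n; ++i) data[i] = host[i];

    // Execution configuration: 1 block of n threads. The call returns immediately.
    addOne<<<1, n>>>(data);

    // A launch has no return value, so errors are queried explicitly.
    err = cudaGetLastError();                              // e.g. an invalid configuration
    if (err == cudaSuccess) err = cudaDeviceSynchronize(); // waits; reports errors raised while running
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) host[i] = data[i];
    }
    cudaFree(data);
    return err;
}

int main() {
    int values[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    cudaError_t err = runAddOne(values, 8);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int v : values) std::printf("%d ", v);  // 1 2 3 4 5 6 7 8
    std::printf("\n");
    return 0;
}
```

Checking cudaGetLastError right after the launch catches configuration problems (such as too many threads per block), while the result of cudaDeviceSynchronize reports errors that occur while the kernel is running.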
When you have N total operations to perform, the grid (the entire collection of threads) must be large enough to provide at least N threads. Because threads are launched in fixed-size blocks of T threads each, we use ceiling division to calculate the number of blocks. Below is the formula we use:
blocksPerGrid = ⌈N / T⌉ = (N + T - 1) / T    (with / as integer division)
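This formula ensures that if N is not perfectly divisible by T, an extra block is automatically added to handle the "remainder" elements. For example, N = 1000 and T = 256 give (1000 + 255) / 256 = 4 blocks, or 1024 threads. The last 24 threads have no element to process, which is why kernels usually guard their work with a bounds check such as if (i < N).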
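The formula can be sketched together with the bounds check that the surplus threads in the last block need. This assumes a CUDA-capable GPU; blocksFor and scale are hypothetical names chosen for illustration, and managed memory is used only for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Ceiling division: the smallest number of blocks of t threads that covers n elements.
// Marked __host__ __device__ so it can be called from either side.
__host__ __device__ unsigned int blocksFor(unsigned int n, unsigned int t) {
    return (n + t - 1) / t;
}

// One thread per element. The last block may contain threads whose index is
// n or greater; the bounds check makes those threads do nothing.
__global__ void scale(float *data, unsigned int n, float factor) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main() {
    const unsigned int n = 1000;               // not a multiple of 256
    const unsigned int threadsPerBlock = 256;
    const unsigned int blocks = blocksFor(n, threadsPerBlock);  // 4 blocks = 1024 threads

    float *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return 1;
    for (unsigned int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<blocks, threadsPerBlock>>>(data, n, 2.0f);
    cudaDeviceSynchronize();  // wait before the host reads managed memory

    // Prints: blocks = 4, threads = 1024, data[n - 1] = 2.0
    std::printf("blocks = %u, threads = %u, data[n - 1] = %.1f\n",
                blocks, blocks * threadsPerBlock, data[n - 1]);
    cudaFree(data);
    return 0;
}
```

Without the if (i < n) guard, threads 1000 to 1023 would write past the end of the 1000-element allocation.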
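CUDA provides the dim3 type to organize threads and blocks in 2D or 3D. This is useful for tasks such as image processing, matrix operations and volumetric data, where the data naturally has multiple dimensions. Any dimension you leave out defaults to 1, so dim3 block(16, 16) describes a 16 x 16 x 1 block.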
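A 2D launch with dim3 might look like the following sketch, assuming a row-major matrix and a CUDA-capable GPU; fillCoords and runFillCoords are hypothetical names. The x dimension is mapped to columns and y to rows, and ceiling division is applied separately in each dimension.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread fills one element of a rows x cols matrix stored row-major
// with the value row * 100 + col, so the result is easy to check.
__global__ void fillCoords(int *m, int rows, int cols) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // x maps to columns
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // y maps to rows
    if (row < rows && col < cols) {
        m[row * cols + col] = row * 100 + col;
    }
}

// Hypothetical helper: builds a 2D launch configuration with dim3,
// runs fillCoords and copies the matrix back into host[].
cudaError_t runFillCoords(int *host, int rows, int cols) {
    int *m = nullptr;
    size_t bytes = static_cast<size_t>(rows) * cols * sizeof(int);
    cudaError_t err = cudaMalloc(&m, bytes);
    if (err != cudaSuccess) return err;

    dim3 threadsPerBlock(16, 16);  // 256 threads per block; z defaults to 1
    dim3 blocksPerGrid((cols + threadsPerBlock.x - 1) / threadsPerBlock.x,
                       (rows + threadsPerBlock.y - 1) / threadsPerBlock.y);
    fillCoords<<<blocksPerGrid, threadsPerBlock>>>(m, rows, cols);

    err = cudaGetLastError();
    // cudaMemcpy waits for the kernel and also reports errors raised while it ran.
    if (err == cudaSuccess) err = cudaMemcpy(host, m, bytes, cudaMemcpyDeviceToHost);
    cudaFree(m);
    return err;
}

int main() {
    const int rows = 20, cols = 30;  // a 30 x 20 matrix needs a 2 x 2 grid of 16 x 16 blocks
    static int host[rows * cols];
    if (runFillCoords(host, rows, cols) != cudaSuccess) return 1;
    std::printf("m[19][29] = %d\n", host[19 * cols + 29]);  // 1929
    return 0;
}
```

The 2 x 2 grid provides 32 x 32 = 1024 threads for 600 elements, so the two-sided guard (row < rows && col < cols) is what keeps the extra threads idle.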
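Example: This example launches a kernel with 4 blocks of 8 threads each, so 32 threads run in total. Each thread prints its block index (blockIdx.x) and its thread index within the block (threadIdx.x). Blocks are scheduled in no guaranteed order, so the block IDs in the output may appear in a different order on each run.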
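A program along these lines could produce output of the same shape as the listing below. It is a sketch assuming a CUDA-capable GPU; printIds and launchPrintIds are hypothetical names. Device-side printf output is buffered on the GPU and flushed when the host synchronizes, here through cudaDeviceSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread prints its block index and its thread index within that block.
__global__ void printIds() {
    printf("Block ID: %u, Thread ID: %u\n", blockIdx.x, threadIdx.x);
}

// Hypothetical helper: launches printIds with the given configuration
// and waits for it to finish, returning the first error found.
cudaError_t launchPrintIds(int blocksPerGrid, int threadsPerBlock) {
    std::printf("Launching %d blocks with %d threads each...\n",
                blocksPerGrid, threadsPerBlock);
    printIds<<<blocksPerGrid, threadsPerBlock>>>();

    cudaError_t err = cudaGetLastError();  // e.g. too many threads per block
    if (err != cudaSuccess) return err;
    // Synchronizing also flushes the device-side printf buffer to stdout.
    return cudaDeviceSynchronize();
}

int main() {
    cudaError_t err = launchPrintIds(4, 8);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Requesting more than 1024 threads per block, for example launchPrintIds(1, 2048), fails with a configuration error instead of printing anything from the device.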
Output
Launching 4 blocks with 8 threads each...
Block ID: 1, Thread ID: 0
Block ID: 1, Thread ID: 1
Block ID: 1, Thread ID: 2
Block ID: 1, Thread ID: 3
Block ID: 1, Thread ID: 4
Block ID: 1, Thread ID: 5
Block ID: 1, Thread ID: 6
Block ID: 1, Thread ID: 7
Block ID: 0, Thread ID: 0
Block ID: 0, Thread ID: 1
Block ID: 0, Thread ID: 2
Block ID: 0, Thread ID: 3
Block ID: 0, Thread ID: 4
Block ID: 0, Thread ID: 5
Block ID: 0, Thread ID: 6
Block ID: 0, Thread ID: 7
Block ID: 3, Thread ID: 0
Block ID: 3, Thread ID: 1
Block ID: 3, Thread ID: 2
Block ID: 3, Thread ID: 3
Block ID: 3, Thread ID: 4
Block ID: 3, Thread ID: 5
Block ID: 3, Thread ID: 6
Block ID: 3, Thread ID: 7
Block ID: 2, Thread ID: 0
Block ID: 2, Thread ID: 1
Block ID: 2, Thread ID: 2
Block ID: 2, Thread ID: 3
Block ID: 2, Thread ID: 4
Block ID: 2, Thread ID: 5
Block ID: 2, Thread ID: 6
Block ID: 2, Thread ID: 7
Explanation: