
URL: https://developer.nvidia.com/blog/nvidia-cupynumeric-25-03-now-fully-open-source-with-pip-and-hdf5-support/

NVIDIA cuPyNumeric 25.03 Now Fully Open Source with PIP and HDF5 Support | NVIDIA Technical Blog



NVIDIA cuPyNumeric is a library that aims to provide a distributed and accelerated drop-in replacement for NumPy built on top of the Legate framework. It brings zero-code-change scaling to multi-GPU and multinode (MGMN) accelerated computing. 
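To show what "drop-in" means in practice, the following minimal sketch keeps all array code identical and changes only the import. It assumes cuPyNumeric is importable as `cupynumeric` and falls back to NumPy when it is not installed, so the same file runs either way. The function name `trapezoid_area` is hypothetical and only illustrates NumPy-style calls (`linspace`, `sum`, indexing) that both libraries provide.

```python
# Minimal sketch: the only cuPyNumeric-specific line is the import.
# Assumption: cuPyNumeric is installed as `cupynumeric`; otherwise NumPy is used.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def trapezoid_area(n: int) -> float:
    """Hypothetical example: integrate x**2 on [0, 1] with the trapezoid rule."""
    x = np.linspace(0.0, 1.0, n)
    y = x * x
    dx = x[1] - x[0]
    # Full sum minus half of the two endpoints gives the trapezoid weights
    return float(dx * (np.sum(y) - 0.5 * (y[0] + y[-1])))


if __name__ == "__main__":
    print(trapezoid_area(1_000_001))  # close to 1/3
```

Launched with `legate`, the cuPyNumeric path distributes the arrays across the available GPUs; with plain `python` and NumPy, the same code runs on the CPU.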

cuPyNumeric 25.03 is a milestone update that adds new capabilities and makes the library more accessible to users and developers alike. This post covers three highlights: the fully open-source stack, pip installation, and native HDF5 I/O.

Full stack now open source

With cuPyNumeric 25.03, NVIDIA has open-sourced the Legate framework and runtime layer that powers cuPyNumeric, so the entire cuPyNumeric stack is now available under the Apache 2.0 license. This move aligns with NVIDIA’s commitment to transparency, reproducibility, and collaboration: contributors can now explore, audit, contribute to, and extend any component of the system without barriers.

pip install support

cuPyNumeric has supported installation through conda from the start. Now users can also install it through pip with the following simple command:

pip install nvidia-cupynumeric

This simplifies setup significantly, making it easy to integrate cuPyNumeric into your workflows, virtual environments, and CI pipelines. All major dependencies except MPI are bundled or easily resolvable through PyPI.

When combined with Open MPI and UCX, the cuPyNumeric package on PyPI supports multinode, multirank execution, so developers can run cuPyNumeric not only on a single node with multiple GPUs but also on multi-GPU multinode clusters.

Example installation

The following steps show how to install and run cuPyNumeric on a SLURM cluster using the PyPI wheel packages.

Step 1: Environment setup

After logging in to the cluster, load the required environment modules, including CUDA and MPI. cuPyNumeric depends on these packages to run in a multinode or multirank environment. If they are not available on your cluster, install them manually or ask your system administrator to install them.

module purge # clear existing modules
module load cuda # CUDA toolkit
module load openmpi # Open MPI
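Before continuing, you can confirm that the modules actually placed the MPI launcher on your `PATH`. This is a minimal sketch using only the Python standard library; `mpirun` is the launcher passed to `legate --launcher` in Step 3, and the helper name `missing_tools` is hypothetical.

```python
import shutil
import sys


def missing_tools(tools=("mpirun",)):
    """Hypothetical helper: return the subset of `tools` not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # Report and exit non-zero so a job script can stop early
        print("Not found on PATH:", ", ".join(missing), file=sys.stderr)
        sys.exit(1)
    print("All required tools found.")
```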

Next, create and activate a virtual environment (recommended). You can skip this step if you want to install the packages into your current Python environment.

python -m venv legate
source legate/bin/activate
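To verify that the virtual environment is active before installing packages into it, you can compare `sys.prefix` with `sys.base_prefix`: an environment created with `python -m venv` sets the former to the environment directory while the latter still points at the base interpreter. The function name `in_virtualenv` is a hypothetical convenience wrapper around that standard check.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside an environment created with `python -m venv`."""
    # venv redirects sys.prefix; sys.base_prefix keeps the base installation
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv(), "prefix:", sys.prefix)
```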

Step 2: Package installation

Install cuPyNumeric and Legate using pip:

pip install legate nvidia-cupynumeric
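After installation, a quick way to confirm which versions pip resolved is to read the package metadata. This sketch uses the distribution names from the pip command above and the standard-library `importlib.metadata` module; it only inspects metadata and does not import or start Legate. The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(dists=("legate", "nvidia-cupynumeric")):
    """Hypothetical helper: map each distribution to its version, or None."""
    versions = {}
    for dist in dists:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for dist, version in installed_versions().items():
        print(f"{dist}: {version or 'not installed'}")
```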

Step 3: Run applications

Allocate interactive compute nodes using srun:

# Request 2 nodes with 8 GPUs each from partition "partition-name"
# for 30 minutes, then start an interactive shell.
srun -p partition-name \
 -N 2 \
 --gres=gpu:8 \
 --time=00:30:00 \
 --pty bash

Then run a cuPyNumeric program:

# --gpus: GPUs per process; --ranks-per-node: processes per node;
# --nodes: total nodes (must match srun -N); --launcher: launch with MPI
legate --gpus 8 \
 --ranks-per-node 1 \
 --nodes 2 \
 --launcher mpirun \
 ./prog.py
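The launch flags determine how much of the allocation the job uses: with `--nodes 2`, `--ranks-per-node 1`, and `--gpus 8` (GPUs per process), Legate starts 2 × 1 = 2 processes that together drive 2 × 8 = 16 GPUs, and each node needs 1 × 8 = 8 GPUs, which matches `--gres=gpu:8`. The hypothetical `LaunchPlan` class below is only a sketch of that arithmetic; it is not part of Legate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchPlan:
    """Hypothetical model of the legate launch flags used above."""
    nodes: int           # legate --nodes (should match srun -N)
    ranks_per_node: int  # legate --ranks-per-node
    gpus_per_rank: int   # legate --gpus

    @property
    def total_ranks(self) -> int:
        return self.nodes * self.ranks_per_node

    @property
    def total_gpus(self) -> int:
        return self.total_ranks * self.gpus_per_rank

    def fits(self, gres_gpus_per_node: int) -> bool:
        """True if the ranks on one node need no more GPUs than --gres granted."""
        return self.ranks_per_node * self.gpus_per_rank <= gres_gpus_per_node


plan = LaunchPlan(nodes=2, ranks_per_node=1, gpus_per_rank=8)
print(plan.total_ranks, plan.total_gpus, plan.fits(gres_gpus_per_node=8))
# prints: 2 16 True
```

By the same arithmetic, two ranks per node with `--gpus 4` would also fit an 8-GPU-per-node allocation, while two ranks per node with `--gpus 8` would not.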

Running with SLURM batch job submission is also supported:

#!/bin/bash
#SBATCH --job-name=cupynumeric
#SBATCH --nodes=2
#SBATCH --gres=gpu:8
#SBATCH --time=00:30:00

module load cuda openmpi
source legate/bin/activate

legate --gpus 8 \
 --ranks-per-node 1 \
 --nodes ${SLURM_NNODES} \
 --launcher mpirun \
 ./prog.py
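In the batch script, SLURM sets `SLURM_NNODES` to the number of nodes allocated to the job, so `--nodes ${SLURM_NNODES}` always agrees with `#SBATCH --nodes`. If you prefer to drive the launch from Python inside the job, the sketch below builds the same `legate` argument list from that environment variable. It uses exactly the flags shown above; the function name `legate_argv` and its defaults are hypothetical.

```python
import os
import subprocess


def legate_argv(script="./prog.py", gpus=8, ranks_per_node=1, env=None):
    """Hypothetical helper: rebuild the batch script's legate command."""
    env = os.environ if env is None else env
    nodes = int(env["SLURM_NNODES"])  # set by SLURM inside an allocation
    return [
        "legate",
        "--gpus", str(gpus),
        "--ranks-per-node", str(ranks_per_node),
        "--nodes", str(nodes),
        "--launcher", "mpirun",
        script,
    ]


if __name__ == "__main__":
    # Run only inside a SLURM job with the virtual environment activated
    subprocess.run(legate_argv(), check=True)
```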

For more information, refer to the cuPyNumeric 25.03 installation guide.

Native HDF5 IO support

cuPyNumeric 25.03 provides native support for HDF5 over NVIDIA GPUDirect Storage, enabling efficient handling of large datasets and seamless interoperability with scientific computing environments. With HDF5, you can now read and write complex data structures on disk in a compact, portable, and performant format.

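from legate.core.io.hdf5 import from_file
import cupynumeric as np

# Read datasets "x" and "y" from data.h5 into Legate arrays
x = from_file("data.h5", dataset_name="x")
y = from_file("data.h5", dataset_name="y")

# Wrap the Legate arrays as cuPyNumeric arrays
xx = np.asarray(x)
yy = np.asarray(y)
a = 8675.309

# AXPY update, y = a*x + y, written into yy in place by cuPyNumeric
yy[:] = a * xx + yy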
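The example expects a file `data.h5` containing datasets named `x` and `y`. If you need a small input to try it, the following sketch creates one with h5py and NumPy (both assumptions here, not cuPyNumeric requirements) and computes a host-side reference for the same AXPY update so you can compare results. The size `N` and the random data are hypothetical.

```python
import h5py
import numpy as np

N = 1_000     # hypothetical dataset size
A = 8675.309  # same scalar as the cuPyNumeric example

rng = np.random.default_rng(seed=0)
x = rng.standard_normal(N)
y = rng.standard_normal(N)

# Write the two datasets that the example reads by dataset_name
with h5py.File("data.h5", "w") as f:
    f.create_dataset("x", data=x)
    f.create_dataset("y", data=y)

# Host-side reference for y = a*x + y, to compare against the cuPyNumeric yy
expected = A * x + y
```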

This feature is especially beneficial for high-performance computing and data-intensive applications where IO efficiency is critical.

Get started

NVIDIA cuPyNumeric 25.03 strengthens the cuPyNumeric foundation for both research and production environments. To learn about other new features and capabilities in the 25.03 release, see the release notes. The team is grateful for the growing community and welcomes feedback, contributions, and ideas for future releases. Join the conversation by submitting issues directly to the nv-legate/cupynumeric GitHub repo.

About the Authors

About Bo Dong
Bo Dong is a principal technical product manager on the CUDA team. He is responsible for NVIDIA’s go-to-market strategies for distributed computing, including Legate and other products and technologies.
About Wei Wu
Wei Wu is a senior software engineer at NVIDIA working on the Realm distributed runtime system. Prior to joining NVIDIA, he was a research scientist on the programming models team at Los Alamos National Laboratory. His research and work interests lie in high-performance computing, especially programming models and runtime systems for large-scale heterogeneous systems. He received his PhD from the University of Tennessee at Knoxville.
About Jonathan Bentz
Jonathan Bentz leads the CUDA technical marketing engineering team at NVIDIA, where his team focuses on creating and delivering engaging content and connecting with CUDA developers. Jonathan holds a PhD in Chemistry and a master’s degree in Computer Science from Iowa State University.
