
URL: https://thenewstack.io/nvidia-wants-more-programming-languages-to-support-cuda/

NVIDIA Wants More Programming Languages to Support CUDA

The CUDA parallel computing platform can be programmed with C++, Fortran and Python, but the company is looking for others to run its GPUs.
Mar 28th, 2024 10:30am by Agam Shah
Feature image by Rosy / Bad Homburg / Germany from Pixabay

NVIDIA is looking to expand support for more programming languages as it tries to woo more developers to write applications for its GPUs.

The company’s CUDA programming framework currently supports languages that include C++, Fortran and Python. But new programming languages are evolving, and the company is keen on opening up access to its GPUs for developers using those languages, said Jeff Larkin, HPC architect at NVIDIA, during a technical session at the company’s GPU Technology Conference earlier this month.

Larkin didn’t specify which programming languages NVIDIA was looking at.

“My team is definitely monitoring those and trying to look for opportunities to engage on those. But [C++, Fortran and Python] are the ones that are specifically supported within our products today. There are some technologies I’m aware of, and I can’t mention here, that will further enable more languages as well,” Larkin said.

Larkin gave some examples of how some programming languages already exploit its GPUs, name-dropping Julia and Rust.

Why Switch to GPUs? 


NVIDIA CEO and co-founder Jensen Huang speaking at GTC 2024.

The early programming models revolved around CPUs. The x86 architecture was the big kahuna, while GPUs were relegated to gaming and graphics.

Fast forward, and AI has become a reality with GPUs. NVIDIA argues that CPUs are not efficient at handling AI workloads, while GPUs, which consume more power, will provide more cost savings.

“Typically, although a GPU does use more power, it uses that more productively, and that’s where you’ll begin to see the savings,” Larkin said. “You’ll operate more quickly, more power efficiently.”

NVIDIA is tightly coupling its own Arm-based CPU, called Grace, with its GPUs in the Grace Hopper Superchip. But developers need CUDA to take full advantage of the GPUs.

How CUDA Works

At the heart of NVIDIA’s GPUs are Tensor Cores, the hot technology driving most AI computing today. Tensor Cores are capable of low-precision math and matrix multiplication for AI computing.

The matrix style of computing is built on GEMM (general matrix multiplication), which takes advantage of the Tensor Cores and is central to NVIDIA’s AI computing model. CUDA exposes GEMM through libraries that let programmers interact with the GPU cores.
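
To make the GEMM operation concrete, here is a minimal pure-Python sketch of what a general matrix multiply computes, C = alpha * A * B + beta * C, following the standard BLAS definition. The function name `gemm_reference` and the nested-list layout are assumptions made for illustration. This is not NVIDIA's implementation, and a real program would call a GPU library rather than loop in Python.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as nested lists."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b must match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


# Hypothetical 2x2 inputs, chosen only to keep the arithmetic easy to follow.
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
result = gemm_reference(1.0, A, B, 1.0, C)
```

Every output element is an independent dot product, which is why the operation maps well onto hardware that performs many multiply-accumulate steps in parallel.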

The libraries include:

  • cuBLAS: This is NVIDIA’s preferred library that provides direct access to the Tensor Cores and delivers the maximum performance. “That is your basic workhorse that has existed since the very beginning of CUDA. It is linear algebra APIs,” said Stephen Jones, CUDA architect, during a presentation at GTC. cuBLAS provides the easiest way to harness the performance of GPUs. It automates the configuration of Tensor Cores, and developers don’t have to turn knobs — cuBLAS just works out of the box.
  • CUTLASS: The lower-level CUTLASS library provides C++ and Python interfaces for coders to work with the GPU’s Tensor Cores. Developers control the use of Tensor Cores themselves, which means more work for them, whereas cuBLAS automates that process. NVIDIA is building more tools for Python developers to access CUTLASS; that effort is recent and still a work in progress. “You can use the PyTorch extension, and so you can emit PyTorch code from CUTLASS, and you can automatically bring CUTLASS extension Tensor Core custom kernels in Python into PyTorch,” Jones said.
  • cuBLASLt: This library sits somewhere between the cuBLAS and CUTLASS libraries, and provides varying levels of control over Tensor Cores. “CUTLASS actually calls that one in the middle, cuBLASLt, which you can also access yourself. It’s a public library. It gives these advanced APIs where you can really control a lot more aspects of what the Tensor Cores are doing,” Jones said. cuBLASLt offers advanced GEMM APIs, which open the door to mixed-precision computing, where lower-precision inputs are combined with higher-precision arithmetic.
  • cuBLASDx: This can perform select linear algebra functions from cuBLAS on the device side, which improves performance and throughput. “The idea is to get your cuBLAS cores, activate it in your kernel just with a single GEMM core like you would do with cuBLAS from a CPU,” Jones said.
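
The mixed-precision idea mentioned for cuBLASLt can be illustrated without a GPU. The sketch below is a stand-in built on assumptions: it uses the `e` format code of Python's `struct` module to round values to IEEE half precision (FP16), then accumulates the products in Python's double-precision floats. The helper names `to_fp16` and `dot_mixed` are hypothetical, and the actual input and accumulator types used by Tensor Cores depend on the hardware and library settings.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_mixed(xs, ys):
    """Dot product with FP16-rounded inputs and a higher-precision accumulator."""
    acc = 0.0  # Python floats are double precision
    for x, y in zip(xs, ys):
        acc += to_fp16(x) * to_fp16(y)
    return acc


# Hypothetical data: 0.1 is not exactly representable in FP16.
xs = [0.1] * 1000
ys = [1.0] * 1000
exact = sum(x * y for x, y in zip(xs, ys))
mixed = dot_mixed(xs, ys)
rounding_error = abs(mixed - exact)
```

Rounding the inputs costs a small amount of accuracy per element, while the wider accumulator keeps further error from building up as the sum grows.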

Python Is a Priority

NVIDIA is looking to expand access to its SDKs and frameworks to Python, which provides accessibility to more developers. That will, in turn, bring more developers to its GPUs.

“Looking at the Python stack, you have to invest everywhere, all the way across it,” Jones said.

NVIDIA wants to make Python “a complete NVIDIA experience, and make the Python developer and the whole CUDA ecosystem available and accessible to the Python programmer,” Jones said.

The goal is to make more SDKs, frameworks, and domain-specific languages at the top of the stack available to more developers. At the same time, NVIDIA wants to make the lower layers (accelerated libraries, system libraries and utilities, and device kernels) invisible to users. It’s still a work in progress, Jones said.

NVIDIA has worked on integrating its libraries and tools with popular Python frameworks such as PyTorch.

“JIT compilation is incredibly important in Python because Python is a very runtime-interpreted language, and you’re constantly generating data dynamically. A compiler in the loop is completely normal. In fact, the Python interpreter basically is one of those,” Jones said.
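
Jones's remark about a compiler in the loop can be sketched with the standard library alone. The example below generates source code at runtime for a specialized function, compiles it with the built-in `compile()`, and executes it. The function name `make_scaler` is hypothetical, and this only illustrates runtime code generation in CPython. GPU JIT toolchains operate on device code and are considerably more involved.

```python
def make_scaler(factor):
    """Generate, compile, and return a function specialized for one factor."""
    # The constant is baked into the generated source at runtime.
    source = f"def scale(values):\n    return [v * {factor!r} for v in values]\n"
    code = compile(source, filename="<generated>", mode="exec")
    namespace = {}
    exec(code, namespace)
    return namespace["scale"]


double = make_scaler(2.0)
scaled = double([1.0, 2.5, 4.0])
```

The specialized function exists only once the program is running and the factor is known, which is the situation Jones describes for dynamically generated data.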

Program Well, Reap Rewards

Programming — and doing it correctly — is important to making AI more power-efficient.

Companies are measuring cost-per-transaction and trying to bring it down. Like cryptocurrency mining, AI takes a lot of energy to run, and the cost of inference came under the microscope at GTC.

Jones argued that GPUs are more efficient in the final equation: they can deliver more FLOPS (floating-point operations per second) when factoring in rack space, time, and power consumption.

“Nobody cares how many servers you’re buying, nobody cares how many data centers you’re renting, you are renting power per month, because power is the metric that really matters for computing,” Jones said.

NVIDIA introduced new data types — FP4 and FP6 — which are lower-precision but can squeeze out more performance per watt.
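
The appeal of narrower data types comes down to simple arithmetic: an n-bit format has at most 2^n distinct bit patterns, and storage shrinks in proportion to the bit width. The sketch below works that out for FP4 and FP6, with 8-bit and 16-bit formats included only for comparison. The one-billion-value count is a hypothetical workload size chosen for illustration, not a figure from NVIDIA.

```python
BITS_PER_FORMAT = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4}
NUM_VALUES = 1_000_000_000  # hypothetical count, for illustration only


def storage_bytes(num_values, bits):
    """Bytes needed to store num_values densely packed values of the given width."""
    return num_values * bits / 8


summary = {
    name: {
        "bit_patterns": 2 ** bits,
        "gigabytes": storage_bytes(NUM_VALUES, bits) / 1e9,
    }
    for name, bits in BITS_PER_FORMAT.items()
}

for name, info in summary.items():
    print(f"{name}: {info['bit_patterns']} bit patterns, {info['gigabytes']:.2f} GB")
```

Moving fewer bits per value reduces memory traffic, which is one reason lower precision can improve performance per watt, at the cost of far fewer representable values.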

The company introduced a new GPU codenamed Blackwell at GTC. A new server called DGX-B200 has eight Blackwell chips, each consuming about 1,000 watts. Blackwell succeeds the H100 GPU, which powers AI computing efforts at Microsoft, Meta, Tesla and other companies.

Compared to the DGX-H100, the DGX-B200 system power consumption is similar, but performance improves by a factor of two to three times, said Charlie Boyle, vice president and general manager of DGX systems at NVIDIA, in an interview.
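
Boyle's comparison implies a proportional gain in performance per watt, which the following sketch works through. It normalizes the DGX-H100 to 1.0 for both performance and power and treats "similar" power as a ratio of exactly 1.0. Both are simplifying assumptions, the function name `perf_per_watt_gain` is hypothetical, and no absolute figures are implied.

```python
def perf_per_watt_gain(perf_ratio, power_ratio):
    """Relative performance-per-watt gain of a new system over a baseline."""
    if power_ratio <= 0:
        raise ValueError("power_ratio must be positive")
    return perf_ratio / power_ratio


POWER_RATIO = 1.0  # assumption: "similar" power treated as equal
low_gain = perf_per_watt_gain(2.0, POWER_RATIO)   # two times the performance
high_gain = perf_per_watt_gain(3.0, POWER_RATIO)  # three times the performance
```

Under these assumptions the efficiency gain equals the performance gain. If the newer system drew, say, 20% more power (a hypothetical figure), the same function would put the gain between roughly 1.7 and 2.5 times.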

No Updates to CUDA

NVIDIA’s hardware and software model is much like Apple’s: the hardware and software go hand-in-hand. The software is designed for the hardware, and vice versa.

NVIDIA is trying to lock developers into CUDA, which is a proprietary development model. To be sure, NVIDIA GPUs support other programming models such as OpenAI’s Triton, and open-source development models.

The company’s goal is to integrate the hardware and software into a so-called “AI factory,” where the input is raw data, and the output is the result. The hardware and software remain invisible to customers.

Usually, NVIDIA releases a new version of CUDA with a new GPU. However, Jones provided no significant updates to CUDA during the GTC session. NVIDIA released CUDA version 12.4 recently and may share more details later this month as the release of its Blackwell GPU draws closer.

Agam Shah has covered enterprise IT for more than a decade. Outside of machine learning, hardware and chips, he's also interested in martial arts and Russia.
Microsoft is a sponsor of The New Stack.
TNS owner Insight Partners is an investor in: OpenAI.