NVIDIA is looking to expand support for more programming languages as it tries to woo more developers to write applications for its GPUs.
The company’s CUDA programming framework currently supports languages that include C++, Fortran and Python. But new programming languages are evolving, and the company is keen on opening up access to its GPUs for developers using those languages, said Jeff Larkin, HPC architect at NVIDIA, during a technical session at the company’s GPU Technology Conference earlier this month.
Larkin didn’t provide specifics on which programming languages the company is looking at.
“My team is definitely monitoring those and trying to look for opportunities to engage on those. But [C++, Fortran and Python] are the ones that are specifically supported within our products today. There are some technologies I’m aware of, and I can’t mention here, that will further enable more languages as well,” Larkin said.
Larkin gave some examples of how programming languages already exploit NVIDIA’s GPUs, name-dropping Julia and Rust.
The early programming models revolved around CPUs. The x86 architecture was the big kahuna, while GPUs were relegated to gaming and graphics.
Fast forward, and AI has become a reality with GPUs. NVIDIA argues that CPUs are not efficient at handling AI transactions, while GPUs — which consume more power — will provide more cost savings.
“Typically, although a GPU does use more power, it uses that more productively, and that’s where you’ll begin to see the savings,” Larkin said. “You’ll operate more quickly, more power efficiently.”
NVIDIA is tightly coupling its own Arm-based CPU, called Grace, with its GPUs in the Grace Hopper superchip. But developers need CUDA to take full advantage of the GPUs.
At the heart of NVIDIA’s GPUs are Tensor Cores, the hot technology driving most AI computing today. The Tensor Cores are built for the low-precision math and matrix multiplication that AI computing relies on.
The matrix style of computing is built on the GEMM (general matrix multiplication) algorithm, which takes advantage of the Tensor Cores and is central to NVIDIA’s AI computing model. Programmers use GEMM through libraries in CUDA, which let them interact with the GPU cores.
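For readers who have not met the operation before, GEMM is conventionally written as C = alpha * A * B + beta * C. The following is a minimal pure-Python sketch of that formula, for illustration only. The function name `gemm` and its parameters are hypothetical and are not part of CUDA or any NVIDIA library, which run the same math on the GPU in heavily tuned form.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b must match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        result.append(row)
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
zeros = [[0, 0], [0, 0]]
print(gemm(1, a, b, 0, zeros))  # [[19, 22], [43, 50]]
```

Every output element is an independent dot product, which is why the workload maps well onto hardware with many parallel multiply-accumulate units.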
The libraries include:
NVIDIA is looking to expand access to its SDKs and frameworks to Python, which provides accessibility to more developers. That will, in turn, bring more developers to its GPUs.
“Looking at the Python stack, you have to invest everywhere, all the way across it,” Jones said.
NVIDIA wants to make Python “a complete Nvidia experience, and make the Python developer and the whole CUDA ecosystem available and accessible to the Python programmer,” Jones said.
The goal is to make more SDKs, frameworks, and domain-specific languages at the top of the stack available to more developers. At the same time, NVIDIA wants to make the lower layers (accelerated libraries, system libraries and utilities, and device kernels) invisible to users. It’s still a work in progress, Jones said.
NVIDIA has worked on integrating its libraries and tools with popular Python frameworks such as PyTorch.
“JIT compilation is incredibly important in Python because Python is a very runtime-interpreted language, and you’re constantly generating data dynamically. A compiler in the loop is completely normal. In fact, the Python interpreter basically is one of those,” Jones said.
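The "compiler in the loop" idea can be seen in standard Python itself. The sketch below uses only the built-in `compile` and `exec` functions and the `dis` module to generate a function at runtime, specialized for a value that is only known while the program is running. The helper `make_scaler` is hypothetical, and this is CPython bytecode compilation rather than NVIDIA's JIT toolchain, which targets GPU code.

```python
import dis


def make_scaler(factor):
    """Generate and compile, at runtime, a function specialized for one factor."""
    source = f"def scale(x):\n    return x * {factor!r}\n"
    code = compile(source, "<generated>", "exec")  # source text -> bytecode
    namespace = {}
    exec(code, namespace)  # run the bytecode, which defines scale()
    return namespace["scale"]


triple = make_scaler(3)
print(triple(14))  # 42

# The generated function carries bytecode like any other Python function.
dis.dis(triple)
```

A GPU-oriented JIT follows the same pattern in spirit: code is generated once the runtime values and types are known, compiled, and then reused.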
Programming — and doing it correctly — is important to making AI more power-efficient.
Companies are measuring cost-per-transaction and trying to bring it down. AI has a crypto problem — it takes a lot of energy to run — and the cost of inference came under the microscope at GTC.
Jones argued that GPUs are more efficient in the final equation: they can deliver more FLOPS (floating-point operations per second) when factoring in rack space, time, and power consumption.
“Nobody cares how many servers you’re buying, nobody cares how many data centers you’re renting, you are renting power per month, because power is the metric that really matters for computing,” Jones said.
NVIDIA introduced new data types — FP4 and FP6 — which are lower-precision but can squeeze out more performance per watt.
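To give a feel for how coarse a 4-bit float is, here is a rough sketch that rounds ordinary numbers to the nearest FP4 value. It assumes the E2M1 layout (one sign bit, two exponent bits, one mantissa bit), whose magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4 and 6. The article does not specify the bit layout, so treat that as an assumption. The name `quantize_fp4` is hypothetical, ties are broken toward the smaller magnitude for simplicity, and the scaling factors used in practice are left out.

```python
# Assumed E2M1 magnitudes; the article does not specify the FP4 bit layout.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(x):
    """Round x to the nearest assumed FP4 value, saturating at +/-6."""
    sign = -1.0 if x < 0 else 1.0
    nearest = min(FP4_MAGNITUDES, key=lambda m: abs(abs(x) - m))
    return sign * nearest


weights = [0.07, -0.9, 1.3, 2.4, -5.1, 9.0]
print([quantize_fp4(w) for w in weights])
# [0.0, -1.0, 1.5, 2.0, -6.0, 6.0]
```

With so few values to choose from, each number takes a quarter of the bits of a 16-bit float. That is the trade-off behind the performance-per-watt claim: less data to move and multiply, at the cost of precision.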
The company introduced a new GPU, codenamed Blackwell, at GTC. It succeeds the H100 GPU, which powers AI computing efforts at Microsoft, Meta, Tesla and other companies. A new server called DGX-B200 has eight Blackwell chips, each consuming up to about 1,000 watts.
Compared to the DGX-H100, the DGX-B200 system power consumption is similar, but performance improves by a factor of two to three times, said Charlie Boyle, vice president and general manager of DGX systems at NVIDIA, in an interview.
NVIDIA’s hardware and software model is much like Apple’s: the hardware and software go hand-in-hand. The software is designed for the hardware, and vice versa.
NVIDIA is trying to lock developers into CUDA, which is a proprietary development model. To be sure, NVIDIA GPUs also support other programming models, such as OpenAI’s Triton and other open source development models.
The company’s goal is to integrate the hardware and software into a so-called “AI factory,” where the input is raw data, and the output is the result. The hardware and software remain invisible to customers.
Usually, NVIDIA releases a new version of CUDA with a new GPU. However, Jones provided no significant updates to CUDA during the GTC session. NVIDIA released CUDA version 12.4 recently and may share more details later this month as the release of its Blackwell GPU draws closer.