URL: https://thenewstack.io/developers-can-now-uber-gpus-with-nvidias-lepton-platform/

Developers Can Now ‘Uber’ GPUs With NVIDIA’s Lepton Platform - The New Stack


NVIDIA's new one-stop shop, called DGX Cloud Lepton, will allow developers to rent NVIDIA GPUs scattered worldwide across host providers.
May 20th, 2025 11:30am by Agam Shah
Do you desperately need an NVIDIA GPU? Don’t stress, you’ll soon be able to order one online.

NVIDIA has announced a one-stop shop called DGX Cloud Lepton, which will allow developers to rent NVIDIA GPUs scattered worldwide across host providers.

Developers can log into Lepton, specify their AI task, and rent a relevant GPU. Using the Uber analogy, GPUs will come to developers who don’t have time to go and find one.

“Similar to modern ride-sharing apps that connect riders to drivers, DGX Cloud Lepton provides a modern marketplace that connects developers to GPU compute and not just locally,” said Alexis Black Bjorlin, vice president of NVIDIA’s DGX Cloud, in a press briefing.

GPU availability isn’t a grave problem, but searching for the right service across dozens of cloud services or independent facilities can be a headache.

A host of NVIDIA GPUs will be available to order — including Blackwell, its latest GPU. Older GPUs, including Hopper (on which ChatGPT runs) and possibly Ampere, will also be available.

The GPUs will depend on the job. If users need large-scale training for models, it may be Blackwell. If users need inferencing, a Hopper or Ampere may be assigned. If users need low latency, then a GPU in a nearby region will be assigned.
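The matching described above can be pictured as a small lookup. The sketch below is illustrative only: NVIDIA has not published Lepton's placement logic, and the `Offer` type, the `pick_offer` function, the provider names and the region names are all hypothetical. It encodes nothing beyond the three rules in the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: this mapping mirrors the article's description only.
# It is not NVIDIA's actual scheduling policy.
GPU_BY_WORKLOAD = {
    "training": ["Blackwell"],
    "inference": ["Hopper", "Ampere"],
}


@dataclass
class Offer:
    """A hypothetical GPU listing from one provider."""
    provider: str
    gpu: str
    region: str


def pick_offer(
    workload: str,
    offers: List[Offer],
    user_region: Optional[str] = None,
    low_latency: bool = False,
) -> Optional[Offer]:
    """Return the first offer that suits the workload, or None."""
    suitable = [o for o in offers if o.gpu in GPU_BY_WORKLOAD[workload]]
    # For latency-sensitive jobs, prefer GPUs in the user's own region
    # when any are listed there.
    if low_latency and user_region is not None:
        nearby = [o for o in suitable if o.region == user_region]
        if nearby:
            suitable = nearby
    return suitable[0] if suitable else None


offers = [
    Offer("provider-a", "Blackwell", "us-east"),
    Offer("provider-b", "Hopper", "eu-west"),
    Offer("provider-c", "Ampere", "us-east"),
]
print(pick_offer("inference", offers, user_region="us-east", low_latency=True))
```

A real marketplace would also weigh price, availability and interconnect, none of which NVIDIA has detailed.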

The Problem: More GPU Suppliers

NVIDIA GPUs were in short supply up until last year. Microsoft, OpenAI, Amazon and Meta gobbled up most of the supplies. Even Apple was unable to acquire parts, which put it years behind in AI, according to a Bloomberg story this week.

Cloud providers initially charged a premium for access to NVIDIA GPUs. These were only available to large customers.

But as supply normalized, a bunch of independent GPU providers popped up to provide AI computing at cheaper prices. Most of these had previously mined cryptocurrency, but pivoted their hardware to AI as that market took off. Companies like Crusoe Energy, which sold its bitcoin mining business in March, acquired new GPUs and started renting them out as AI infrastructure. Another example is Voltage Park, which was founded by a crypto executive but now offers one-time H100 GPUs at $1.99 per hour.

NVIDIA’s Black Bjorlin acknowledged GPU capacity had grown, adding “finding and efficiently utilizing AI infrastructure in the right regions at high performance can be complex.”

NVIDIA’s Uber-Like Idea

Lepton will source its GPUs from lesser-known GPU providers, including CoreWeave, Crusoe, Lambda, Foxconn, and others. It’s not yet clear if users will be able to select the provider.

Lepton does not include the top four cloud providers, but users will be able “to bring their compute and the platforms that they currently use, so it’s certainly an option on this platform,” Black Bjorlin said.

Reading between the lines, NVIDIA may be creating its own cloud service and will compete with the likes of Amazon, Google and Microsoft, which also rent out GPUs.

Black Bjorlin also indicated that Lepton will be hybrid cloud-friendly, meaning that developers may be able to connect their data or AI workloads to rental GPUs at smaller providers.

Cheaper Alternatives

Pricing for Lepton wasn’t available, but the service requires some NVIDIA software and tooling, which isn’t cheap.

To avoid the NVIDIA premium, it may be cheaper to rent GPUs directly from smaller service providers that are already a part of Lepton.

A fully loaded NVIDIA H200 GPU from Crusoe Energy, for a single instance over a six-month term, will set users back about $120,000. A slower A100 will be about half the price.

The price of the H200 from CoreWeave is about $50 per hour.
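To put the two quotes on the same footing, the term price can be converted to an hourly rate and the hourly price to a six-month total. This is a rough sketch that assumes round-the-clock use and a six-month term of half a 365-day year. The two figures may also cover different configurations (for example, a single GPU versus a multi-GPU instance, or reserved versus on-demand pricing), so treat the comparison as indicative only.

```python
# Assumption: a six-month term is half of a 365-day year, used 24 hours a day.
HOURS_PER_SIX_MONTHS = 365 * 24 / 2  # 4380.0 hours


def effective_hourly(total_cost: float, hours: float = HOURS_PER_SIX_MONTHS) -> float:
    """Convert a fixed-term price into an equivalent hourly rate."""
    return total_cost / hours


def term_cost(hourly_rate: float, hours: float = HOURS_PER_SIX_MONTHS) -> float:
    """Convert an hourly rate into a total for the whole term."""
    return hourly_rate * hours


crusoe_hourly = effective_hourly(120_000)  # the six-month quote from the article
coreweave_term = term_cost(50)             # the hourly quote from the article

print(f"Six-month quote as an hourly rate: ${crusoe_hourly:.2f}")
print(f"Hourly quote over six months: ${coreweave_term:,.0f}")
```

Under these assumptions, the six-month quote works out to roughly $27 per hour, and the hourly quote to $219,000 over the same period.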

Lepton’s strength is in its software backend, which takes care of the NVIDIA software and hardware development stack. Developers can take their programs directly to GPUs and cut out the development and microservices complexity in the middle.

Alternatively, it may be better to get hardware that runs locally — since the processing requirements for AI are also shrinking.

NVIDIA will soon start shipping DGX Spark — which delivers one petaflop of performance and can run inference on desktops. The hardware was described as “your own AI cloud sitting right next to you and it’s always on, always waiting for you,” by NVIDIA CEO Jensen Huang during a keynote.

How Lepton Will Work

Developers first need to sign up as an NVIDIA developer and create a cloud server. After that, developers can apply for access to the service.

My application for the service is currently under consideration. NVIDIA seems choosy in selecting users, limiting approvals to those who can properly exploit its GPUs and have money to spend. My applications for beta AI services have never been approved. Nevertheless, developers should give it a shot.

If you get access, you can adopt LLMs or AI applications from NVIDIA’s Build website and deploy them to Lepton. It is possible to be granular and fine-tune the task for deployment to computing nodes.

Users can select the type of job and the GPU, establish the container with the development environment, and dispatch the AI job to GPUs. A training job will go to high-end GPUs, while inference will be sent to lower-end GPUs.

Developers can configure the GPU and manage multiple machines within the Lepton interface. That includes selecting a container image, selecting the instance type (CPU or GPU), and establishing the variables. NVIDIA provides templates for MPI and Torchrun to get started.

Users can also run Jupyter notebooks within containers. The pod can be accessed via SSH, browser, or through other tools. Lepton organizes and manages the nodes depending on the job and type of GPU.

Developers can bring their own hardware to integrate into Lepton for more self-managed nodes. But the hefty list of requirements includes 640GB of storage per GPU (recommended: a 20TB NVMe SSD), high-end x86 server CPUs, Ubuntu 22.04 LTS, and CUDA 12.4.1 or later.
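Anyone weighing the bring-your-own-hardware route could check a machine against those numbers before applying. The sketch below is a hypothetical pre-flight check, not an NVIDIA tool. The `check_node` function and its parameters are made up for illustration, and it assumes the Ubuntu requirement means the 22.04 release specifically. It does not attempt to judge whether a CPU counts as "high-end."

```python
from typing import List, Tuple

# Thresholds taken from the requirements listed in the article.
MIN_STORAGE_GB_PER_GPU = 640
REQUIRED_OS_PREFIX = "Ubuntu 22.04"  # assumption: 22.04 exactly, not newer releases
MIN_CUDA = (12, 4, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '12.4.1' into (12, 4, 1)."""
    return tuple(int(part) for part in text.split("."))


def check_node(gpu_count: int, storage_gb: int, os_name: str, cuda_version: str) -> List[str]:
    """Return a list of unmet requirements; an empty list means none were found."""
    problems = []
    needed_gb = gpu_count * MIN_STORAGE_GB_PER_GPU
    if storage_gb < needed_gb:
        problems.append(f"storage: have {storage_gb}GB, need {needed_gb}GB")
    if not os_name.startswith(REQUIRED_OS_PREFIX):
        problems.append(f"OS: have {os_name}, need {REQUIRED_OS_PREFIX} LTS")
    if parse_version(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA: have {cuda_version}, need 12.4.1 or later")
    return problems


# Example: an eight-GPU server with a 20TB drive (the recommended size).
print(check_node(8, 20_000, "Ubuntu 22.04.4 LTS", "12.6"))
```

The storage floor scales with GPU count, so an eight-GPU server needs at least 5,120GB, which helps explain the 20TB recommendation.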

Agam Shah has covered enterprise IT for more than a decade. Outside of machine learning, hardware and chips, he's also interested in martial arts and Russia.
TNS owner Insight Partners is an investor in: OpenAI.