URL: https://thenewstack.io/how-cios-can-battle-gpu-poverty-in-the-age-of-ai/

How CIOs Can Battle GPU Poverty in the Age of AI - The New Stack


How CIOs Can Battle GPU Poverty in the Age of AI

By adopting a model-first mentality, optimizing utilization and wielding load balancing strategically, CIOs can mitigate the shortage of chips.
May 7th, 2024 11:16am by Liam Crilly
Featured image: AI-generated art from Pixabay.
NGINX sponsored this post.

The gold rush of the AI era is on, but for many companies, the pickaxes are on backorder. A phenomenon known as “GPU poverty” is plaguing CIOs as demand for artificial intelligence skyrockets, outpacing the ability to build the data centers and, more importantly, the chips needed to power it all.

In a nutshell, GPU poverty means that organizations that would like to use GPUs for AI computing simply cannot buy capacity on these powerful parallel processing systems that are the most efficient way to run many types of machine learning.

This scarcity is rooted in a perfect storm. A global shortage of powerful graphics processing units has led startups to raise money specifically to buy GPUs, an insane tactic when you consider that massive capital expenditure ahead of revenue is exactly the problem cloud computing solves. Then there are the ever-increasing demands of AI workloads.

As more and more enterprises look to either leverage AI services from the likes of OpenAI and Google or to tap into AI models and toolchains in the cloud, they add to the pressure on GPU pricing — putting GPUs further out of reach for startups and other organizations lacking capital.

GPU poverty is rippling up and down the entire supply chain and along the whole toolbelt for AI builders. Data center construction outfits face multiyear backlogs for in-demand core components such as backup generators and electrical transformers. Even finding suitable locations with cheap real estate, cheap and abundant power and fast connectivity to the global internet has become far more daunting.

Then there’s the matter of the missing chips. Semiconductor fabrication plants are struggling to keep up, and chipmakers’ efforts to build new fabs quickly will bear fruit only over many years.

Meanwhile, hyperscale cloud providers and large enterprises are gobbling up the limited supply of GPUs, driving prices through the roof. For many companies, particularly those without bottomless budgets, difficulty accessing GPUs in the cloud for AI applications is becoming a significant business risk.

Smart CIOs, however, can take the edge off GPU insanity with common-sense steps that reduce the resources needed to run AI in their enterprises.

Use Frugal Models and Inferencing

Just as a resourceful traveler learns to pack light, data scientists can achieve amazing results with smaller, more efficient AI models. For example, Microsoft’s Phi-2 model, which was trained on textbooks and very high-quality data, is both compact and resource-efficient, requiring far less compute for tuning and inference.

Techniques like quantization and pruning allow researchers to shrink behemoth models with little loss of accuracy. Frameworks like TensorFlow Lite are specifically designed for deploying these leaner models on edge devices, and startups like Hugging Face are democratizing access to pretrained, efficient models. The team responsible for the PyTorch framework is also creating new ways to train models effectively with less data and overhead.
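To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an illustration only, not how TensorFlow Lite or PyTorch implement it; the weight values and the function names `quantize_int8` and `dequantize` are made up for the example.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, made up for illustration.
weights = [0.8231, -1.2704, 0.0517, 0.4046, -0.6312, 1.1050]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

fp32_bytes = 4 * len(weights)  # 32-bit floats
int8_bytes = 1 * len(weights)  # 8-bit integers, plus one shared scale value
print(f"storage: {fp32_bytes} bytes -> {int8_bytes} bytes")
print(f"largest rounding error: {max_error:.5f} (scale = {scale:.5f})")
```

Each weight shrinks from four bytes to one, and the rounding error per weight stays within half of the scale value. Whether that error is acceptable depends on the model, which is why quantized models should be re-evaluated before deployment.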

Optimize Everything

With GPU time priced in the stratosphere, optimizing AI workloads pays off quickly. AI engineering and MLOps teams should profile performance aggressively and often to identify bottlenecks. This can mean benchmarking different configurations (batch sizes, number of GPUs) to find the most efficient setup for your specific task, because the best configuration is rarely obvious in advance.
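A benchmarking sweep of this kind can be sketched in a few lines. The example below assumes a hypothetical `fake_training_step` function standing in for a real training or inference step; in practice you would swap in your own workload and extend the sweep to other settings, such as the number of GPUs.

```python
import time
from typing import Callable, Dict, List


def measure_throughput(run_batch: Callable[[int], None], batch_size: int,
                       repeats: int = 5) -> float:
    """Return samples processed per second for one batch size."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    return (batch_size * repeats) / elapsed


def fake_training_step(batch_size: int) -> None:
    """Hypothetical stand-in for a real training or inference step."""
    total = 0.0
    for i in range(batch_size * 2000):
        total += i * 0.5


def sweep(batch_sizes: List[int]) -> Dict[int, float]:
    """Benchmark every candidate batch size and collect the throughput."""
    return {b: measure_throughput(fake_training_step, b) for b in batch_sizes}


results = sweep([8, 16, 32, 64])
best = max(results, key=results.get)
for size, rate in results.items():
    print(f"batch={size:3d}  {rate:10.1f} samples/s")
print(f"most efficient batch size in this run: {best}")
```

Because timings vary between runs and machines, a sweep like this is worth repeating whenever the model, the data or the hardware changes.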

Savvy teams will mix numeric precisions (FP16, FP32, etc.) during training to reduce memory usage and run larger batch sizes. Managing memory allocation and data movement also helps: techniques such as data prefetching and transfers timed to closely follow compute availability keep the GPU from sitting idle while it waits for data.
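The prefetching idea can be shown without any GPU library. In this sketch, a background thread fills a small bounded buffer while the consumer works on the current batch. The `load_batches` loader is hypothetical; a real one would read from disk or the network, and most training frameworks ship their own prefetching data loaders.

```python
import queue
import threading
from typing import Iterator, List

_SENTINEL = object()


def prefetch(batches: Iterator[List[int]], depth: int = 2) -> Iterator[List[int]]:
    """Yield batches while a background thread loads the next ones."""
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        for batch in batches:
            buffer.put(batch)   # blocks while the buffer is full
        buffer.put(_SENTINEL)   # signal that the data is exhausted

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _SENTINEL:
            return
        yield item


def load_batches(count: int, size: int) -> Iterator[List[int]]:
    """Hypothetical loader that produces consecutive integers in batches."""
    for i in range(count):
        yield list(range(i * size, (i + 1) * size))


# The "compute" step here is just a sum over each batch.
processed = [sum(batch) for batch in prefetch(load_batches(4, 3))]
print(processed)
```

The `depth` parameter bounds how far ahead the loader runs, so prefetching trades a small, fixed amount of extra memory for less idle time.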

Finding the ideal batch size for AI jobs is crucial. A larger batch size can better utilize the GPU, but one that is too large can lead to out-of-memory errors. Experiment to find the sweet spot. If you have larger GPUs or have reserved a lot of GPU capacity, try GPU virtualization software. It can let you repurpose scarce, valuable compute that was set aside for training or larger tuning jobs to handle the more run-of-the-mill inferencing that AI applications need in day-to-day operation.
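One way to automate the search for the largest batch size that fits is a binary search over candidate sizes. The sketch below uses a made-up memory model (the constants `GPU_MEMORY_MB`, `MODEL_MB` and `PER_SAMPLE_MB` are assumptions for illustration); a real check would run one step at the candidate size and catch the framework's out-of-memory error instead.

```python
from typing import Callable


def largest_fitting_batch(fits: Callable[[int], bool], upper: int) -> int:
    """Binary-search the largest batch size in [1, upper] for which fits() is True.

    Assumes memory use grows with batch size, so once a size fails,
    every larger size fails too. Returns 0 if nothing fits.
    """
    low, high, best = 1, upper, 0
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best, low = mid, mid + 1
        else:
            high = mid - 1
    return best


# Hypothetical memory model: fixed model footprint plus a per-sample cost.
GPU_MEMORY_MB = 24_000
MODEL_MB = 6_000
PER_SAMPLE_MB = 140


def fits_in_memory(batch_size: int) -> bool:
    """Return True if the estimated footprint stays within GPU memory."""
    return MODEL_MB + PER_SAMPLE_MB * batch_size <= GPU_MEMORY_MB


print(largest_fitting_batch(fits_in_memory, upper=1024))  # prints 128
```

The largest batch that fits is an upper bound rather than the answer: the throughput sweet spot may sit below it, so it is best combined with benchmarking.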

Lastly, deploy on a foundation of containers that enables automatic scaling, if possible, to dynamically adjust the number of GPUs allocated to a workload based on real-time needs. This helps avoid overprovisioning while ensuring enough resources for peak periods.
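As a rough sketch of how such scaling decisions can be made, the function below applies a proportional rule: scale the replica count by the ratio of observed to target utilization, then clamp it to a fixed range. This is the same general form documented for the Kubernetes Horizontal Pod Autoscaler, but the function name, the limits and the readings here are assumptions for illustration, not a description of any particular product.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical readings: 4 GPU-backed replicas with a 70% utilization target.
print(desired_replicas(4, current_utilization=0.95, target_utilization=0.70))  # prints 6
print(desired_replicas(4, current_utilization=0.30, target_utilization=0.70))  # prints 2
```

The upper limit matters as much as the rule itself: with GPUs in short supply, a cap keeps a traffic spike from requesting capacity that does not exist or that the budget cannot cover.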

Tune Load Balancing for AI

Properly tuned load balancing tackles the challenge of GPU poverty by ensuring that AI jobs receive the resources they need without timing out, and it can enhance security as well. It differs from traditional load balancing by recognizing the diverse computational requirements of AI tasks.

By profiling workloads, assessing their CPU and GPU needs, and prioritizing time-sensitive operations, AI-specific load balancers dynamically distribute work across the most suitable hardware. This approach safeguards your expensive GPUs for operations that genuinely demand their capabilities, while offloading CPU-bound work to more cost-effective resources.
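The routing logic described above can be sketched as a small dispatcher. This is an illustrative model only, not the behavior of any specific load balancer; the `Job` fields, the pool names and the example workloads are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(order=True)
class Job:
    priority: int                        # lower number = more time-sensitive
    name: str = field(compare=False)
    needs_gpu: bool = field(compare=False)


def dispatch(jobs: List[Job], gpu_slots: int) -> Dict[str, List[str]]:
    """Send jobs to a GPU or CPU pool, most urgent first.

    GPU-bound jobs that do not fit in the available slots wait in a queue
    rather than spilling onto CPUs.
    """
    pools: Dict[str, List[str]] = {"gpu": [], "cpu": [], "waiting": []}
    heap = list(jobs)
    heapq.heapify(heap)
    while heap:
        job = heapq.heappop(heap)
        if not job.needs_gpu:
            pools["cpu"].append(job.name)
        elif len(pools["gpu"]) < gpu_slots:
            pools["gpu"].append(job.name)
        else:
            pools["waiting"].append(job.name)
    return pools


# Hypothetical profiled workloads.
jobs = [
    Job(2, "nightly-fine-tune", needs_gpu=True),
    Job(0, "chat-inference", needs_gpu=True),
    Job(1, "tokenize-uploads", needs_gpu=False),
    Job(3, "batch-embeddings", needs_gpu=True),
]
print(dispatch(jobs, gpu_slots=2))
```

With two GPU slots, the time-sensitive inference job and the fine-tuning job get GPUs, the CPU-bound tokenization job goes to the cheaper pool, and the lowest-priority GPU job waits.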

Critically, AI-specific load balancing introduces a new dimension of control: token management. In AI systems that are metered in tokens, such as large language models, balancing load isn’t just about hardware efficiency. Load balancers can monitor the token usage associated with AI jobs and dynamically reroute requests to optimize token consumption and prevent cost overruns.

Moreover, by intelligently routing jobs based on their potential security implications and token sensitivities, AI load balancers help isolate high-risk workloads, providing an additional layer of protection for your AI systems. Implementing such a load-balancing strategy necessitates careful consideration of framework integration, robust monitoring and potential cost savings with cloud-based AI load-balancing solutions.

AI-tuned load balancers might deliver more granular control — token-based rate limiting, for example, and algorithms that ship or shift jobs to LLM clusters that are the most economical in terms of token usage or costs.
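A minimal sketch of these two controls, token-based rate limiting and cost-aware routing, might look like the following. Everything here is hypothetical: the cluster names, the prices and the `TokenBudget` class are invented for illustration, and the periodic reset of the accounting window is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cluster:
    name: str
    cost_per_1k_tokens: float   # hypothetical price
    healthy: bool = True


class TokenBudget:
    """Per-client limit on LLM tokens within one accounting window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used: Dict[str, int] = {}

    def allow(self, client: str, tokens: int) -> bool:
        """Record the tokens and return True if the client stays within its limit."""
        spent = self.used.get(client, 0)
        if spent + tokens > self.limit:
            return False
        self.used[client] = spent + tokens
        return True


def route(client: str, tokens: int, budget: TokenBudget,
          clusters: List[Cluster]) -> Optional[str]:
    """Return the cheapest healthy cluster, or None if the request is rejected."""
    candidates = [c for c in clusters if c.healthy]
    if not candidates or not budget.allow(client, tokens):
        return None
    return min(candidates, key=lambda c: c.cost_per_1k_tokens).name


clusters = [Cluster("llm-east", 0.60), Cluster("llm-west", 0.45),
            Cluster("llm-spot", 0.20, healthy=False)]
budget = TokenBudget(limit=10_000)
print(route("team-a", 6_000, budget, clusters))  # prints llm-west
print(route("team-a", 6_000, budget, clusters))  # prints None (over budget)
```

The first request goes to the cheapest healthy cluster; the second would push the client past its token limit and is rejected. The unhealthy cluster is skipped even though it is the cheapest.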

The Future Is (Hopefully) Abundant

The good news is that the industry isn’t sitting idly by. Chipmakers are ramping up production, and new chip architectures specifically designed for AI are on the horizon. More AI data centers will come online. Many smart developers and engineering teams are continually improving the way AI models work and reducing the burden of training models while holding the line on performance or even improving it.

However, these solutions won’t arrive overnight. In the meantime, by adopting a model-first mentality, optimizing utilization and wielding load balancing strategically, CIOs can mitigate the worst excesses of the current infrastructure bubble and avoid GPU poverty, ensuring that their organizations have enough AI for the jobs that need to be done.

NGINX, now a part of F5, is the company behind the popular open source project, NGINX. NGINX offers a suite of technologies to develop and deliver modern applications including NGINX Plus for load balancing, App Protect for security, and NGINX Ingress Controller to get control of Kubernetes.
Liam Crilly, senior director of product management at F5, wrote his first web app in 1993, and has enjoyed working with internet software ever since. Liam has led various products across F5, including NGINX open source projects.
TNS owner Insight Partners is an investor in: run.ai, OpenAI.