Building Lifelike Digital Avatars with NVIDIA ACE Microservices
Still image from the Kairos demo of an NPC at a bar.
AI-Generated Summary
- NVIDIA's Avatar Cloud Engine (ACE) is revolutionizing the gaming industry by enabling the creation of digital avatars with lifelike personalities and interactions.
- The ACE models, including NVIDIA Riva Automatic Speech Recognition (Riva ASR), NVIDIA Riva Text-to-Speech (Riva TTS), NVIDIA Audio2Face (A2F), and NVIDIA NeMo Large Language Model (NeMo LLM), work together to generate dynamic character responses to gamer input.
- The integration of ACE microservices, such as A2F and Riva ASR, with platforms like Convai is enabling game developers to create non-playable characters (NPCs) with advanced features like spatial awareness, actions, and NPC-to-NPC interaction.
Generative AI technologies are revolutionizing how games are produced and played. Game developers are exploring how these technologies can accelerate their content pipelines and provide new gameplay experiences previously thought impossible. One area of focus, digital avatars, will have a transformative impact on how gamers will interact with non-playable characters (NPCs).
Historically, NPCs have had predetermined responses and facial animations, and players could communicate with them only through a limited set of options. These interactions tend to be transactional, short-lived, and often skipped.
With NVIDIA Avatar Cloud Engine (ACE), however, developers of middleware, tools, and games can combine four state-of-the-art AI models into an end-to-end digital avatar solution. The ACE models use a flexible combination of local and cloud resources to transform gamer input into a dynamic character response. The models include the following:
- NVIDIA Riva Automatic Speech Recognition (Riva ASR) for transcribing human speech.
- NVIDIA Riva Text-to-Speech (Riva TTS) to generate audible speech.
- NVIDIA Audio2Face (A2F) to generate facial expressions and lip movements.
- NVIDIA NeMo Large Language Model (NeMo LLM) to understand player text and transcribed voice and generate a response.
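The way the four models chain together can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ACE, Riva, NeMo, or Audio2Face API: the names `AvatarPipeline`, `AvatarResponse`, `respond`, and the `fake_*` stages are all hypothetical, and each stage is a plain callable standing in for a real service client.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Every name in this sketch is hypothetical. None of it is the real ACE,
# Riva, NeMo, or Audio2Face API; each stage is a plain callable so that a
# real client could be swapped in later.


@dataclass
class AvatarResponse:
    player_text: str        # what the player said (typed or transcribed)
    reply_text: str         # text produced by the language model stage
    reply_audio: bytes      # speech produced by the text-to-speech stage
    face_frames: List[str]  # animation data produced by the audio-to-face stage


@dataclass
class AvatarPipeline:
    transcribe: Callable[[bytes], str]     # ASR stage: audio -> text
    generate_reply: Callable[[str], str]   # LLM stage: text -> text
    synthesize: Callable[[str], bytes]     # TTS stage: text -> audio
    animate: Callable[[bytes], List[str]]  # A2F stage: audio -> face frames

    def respond(self, audio: Optional[bytes] = None,
                text: Optional[str] = None) -> AvatarResponse:
        # Typed input skips speech recognition; voice input is transcribed.
        if text is None:
            if audio is None:
                raise ValueError("provide either audio or text")
            text = self.transcribe(audio)
        reply = self.generate_reply(text)
        speech = self.synthesize(reply)
        frames = self.animate(speech)
        return AvatarResponse(text, reply, speech, frames)


# Stand-in stages so the sketch runs without any service.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_a2f(audio: bytes) -> List[str]:
    # One placeholder "frame" per 8 bytes of audio.
    return [f"frame-{i}" for i in range(0, len(audio), 8)]


pipeline = AvatarPipeline(fake_asr, fake_llm, fake_tts, fake_a2f)
result = pipeline.respond(audio=b"hello")
print(result.reply_text)  # prints "You said: hello"
```

Keeping each stage behind a simple callable is one way to reflect the mix of local and cloud resources described above: a given stage could run on the player's machine or call a remote service without the rest of the pipeline changing.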
NVIDIA announced that the A2F and Riva ASR microservices are now available for middleware, tool, and game developers looking to enhance game studio NPCs.
Explore microservices through NVIDIA AI Foundry
If you have an NVIDIA AI Enterprise license, you can access the microservices now and deploy them on DGX Cloud, any cloud service provider (CSP), or a private cloud.
The A2F microservice now supports emotion and includes quality improvements, including to lip sync. NVIDIA Riva ASR now supports more languages, including Italian, EU Spanish, German, and Mandarin, and its overall accuracy is much improved.
Later this month, you can go to NVIDIA AI Foundation Models to explore, experience, and evaluate these available AI models directly from a browser or through API endpoints running a fully accelerated stack. You can deploy anywhere with NVIDIA AI Enterprise.
Kairos demo evolves with new technologies from Convai
In collaboration with Convai, NVIDIA showed the latest version of the Kairos demo to showcase how next-generation AI NPCs will revolutionize gaming. Convai is an NPC developer platform that makes it easy for you to enable characters in 3D worlds to have human-like conversation, perception, and action abilities.
“Generative-AI-powered characters in virtual worlds unlock various use cases and experiences that were previously impossible. Convai is leveraging Riva ASR and A2F to enable lifelike non-playable characters (NPC) with low latency response times and high fidelity natural animation,” said Purnendu Mukherjee, founder and CEO at Convai.
Open-ended conversations with NPCs open up a world of possibilities for interactivity in games. However, conversations should have consequences that can lead to actions. To carry out those actions, NPCs must be aware of the world around them and be able to interact with it dynamically.
With our partner Convai and their latest releases, we take our collaborative demo to the next level, enabling these AI NPCs with the following new features:
- Spatial awareness: Enables game characters to interact with and describe the world throughout conversations.
- Actions: Enables game characters to interact with items in the game world based on the conversation, for example, delivering a bottle of sake when requested.
- NPC-to-NPC Interaction: Enables game characters to have generated conversations without the player’s interaction.
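To make the first two features concrete, here is a toy Python sketch of an NPC that knows which objects are nearby and maps a player request to an action. It is an illustration only, not Convai's or NVIDIA's API: `SceneObject`, `Npc`, `describe_surroundings`, and `choose_action` are hypothetical names, and simple keyword matching stands in for whatever model-driven decision a real platform would make.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical illustration only: this is not Convai's or NVIDIA's API.
# Keyword matching is a stand-in for a model-driven choice of action.


@dataclass
class SceneObject:
    name: str
    distance_m: float  # distance from the NPC, in meters


@dataclass
class Npc:
    name: str
    nearby: List[SceneObject] = field(default_factory=list)

    def _by_distance(self) -> List[SceneObject]:
        # Closest objects first.
        return sorted(self.nearby, key=lambda obj: obj.distance_m)

    def describe_surroundings(self) -> str:
        # Spatial awareness: report what the NPC can currently see.
        if not self.nearby:
            return "I don't see anything nearby."
        names = ", ".join(obj.name for obj in self._by_distance())
        return f"I can see: {names}."

    def choose_action(self, utterance: str) -> Optional[str]:
        # Actions: if the player names a nearby object, deliver it.
        lowered = utterance.lower()
        for obj in self._by_distance():
            if obj.name.lower() in lowered:
                return f"deliver:{obj.name}"
        return None  # nothing actionable; the NPC just keeps talking


bartender = Npc("Bartender", [
    SceneObject("sake bottle", 1.5),
    SceneObject("ramen bowl", 0.8),
])
print(bartender.describe_surroundings())
print(bartender.choose_action("Could I get a sake bottle, please?"))
```

The point of the sketch is the separation of concerns: the conversation produces text, the scene state supplies what is physically available, and an action is emitted only when the two line up.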
Convai has integrated the new NVIDIA ACE microservices, Audio2Face and Riva ASR. Game characters now get improved lip sync, better expression, and accurate speech detection when listening to the player.
Here’s a look at the new Kairos demo.
Convai is working closely with NVIDIA to deliver the next generation of AI-powered digital characters. For more information about getting started with their platform, see Playground Walkthrough.
Top digital avatar developers embrace NVIDIA ACE
NVIDIA is working with top developers in the gaming ecosystem to create digital avatars that use ACE technologies, including Charisma.AI, Inworld, miHoYo, NetEase Games, OurPalm, Tencent, Ubisoft, and UneeQ.
“This is a milestone moment for AI in games,” said Tencent Games. “NVIDIA ACE and Tencent Games will help lay the foundation that will bring digital avatars with individual, lifelike personalities and interactions to video games.”
“For years NVIDIA has been the pied piper of gaming technologies, delivering new and innovative ways to create games. NVIDIA is making games more intelligent and playable through the adoption of gaming AI technologies, which ultimately creates a more immersive experience,” said Zhipeng Hu, senior vice president of NetEase and head of the LeiHuo business group.
Learn more about the NVIDIA Audio2Face and NVIDIA Riva ASR microservices, explore the technologies on NVIDIA AI Foundation Models, and deploy anywhere with NVIDIA AI Enterprise to begin integrating intelligent NPCs into your solutions today.
About the Authors
Seth Schneider is the senior product manager for esports and competitive gaming products like G-SYNC Esports displays and NVIDIA Reflex. As a competitive gamer, Seth strives to bring the best competitive gaming products to market. Current grind: Valorant.
Ike Nnoli is a senior product marketing manager at NVIDIA. Ike is responsible for driving the adoption of real-time ray-tracing graphics and AI software development kits across the developer network. Previously, Ike held product marketing positions at PlayStation and design engineering roles at the Boeing Company. He holds an MBA from UCLA and a bachelor's degree in mechanical engineering from Northwestern University.
Yasmina Benkhoui leads technical engagements for AI natives at NVIDIA, driving the transition of agentic AI from prototype to production using Nemotron LLM/Speech models, NemoClaw, OpenShell, Confidential Compute, and the NeMo Agent Toolkit. She also heads a strategic ISV readiness program, where she manages software optimization on the latest NVIDIA GPUs and ensures partner feedback directly informs product roadmaps and hardware performance. With a foundation in deep learning and computer vision, Yasmina bridges the gap between complex AI research and enterprise-grade deployment.
