Amid the current AI frenzy, more companies are building complex AI workflows to enhance their offerings with AI capabilities. In this article, I’ll share a behind-the-scenes look at how we implement event-driven architecture (EDA) in complex AI workflows at Gcore.
I’ll walk you through the initial challenges, the architectural decisions made and the outcomes of employing an EDA in a dynamic, real-world scenario, showing how EDA enhances system responsiveness, scalability and flexibility for managing AI-driven tasks like subtitle generation for video content.
EDA is a design pattern centered around the production, detection, consumption and reaction to events rather than static, predefined operations. An event is any significant change in a state or an update that occurs within the system. EDA allows different parts of a system to communicate and operate independently, driven by the occurrence of these events, which can be anything from a user action to a completed process.
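To make the pattern concrete, here is a minimal in-process sketch of producing and reacting to events. It is only an illustration, not Gcore's implementation: the `EventBus` class and the event names `video.uploaded` and `audio.extracted` are hypothetical, and a production system would put a message broker between producers and consumers.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        # The producer does not know which components consume the event.
        for handler in list(self._handlers[event_type]):
            handler(payload)


bus = EventBus()
log: list[str] = []


def on_video_uploaded(event: dict[str, Any]) -> None:
    log.append(f"extract audio from {event['video_id']}")
    # Completing one step is itself an event that drives the next step.
    bus.publish("audio.extracted", {"video_id": event["video_id"]})


def on_audio_extracted(event: dict[str, Any]) -> None:
    log.append(f"transcribe {event['video_id']}")


bus.subscribe("video.uploaded", on_video_uploaded)
bus.subscribe("audio.extracted", on_audio_extracted)
bus.publish("video.uploaded", {"video_id": "demo-1"})
print(log)
```

Neither handler calls the other directly. Each one only reacts to an event, which is what lets the parts of a system evolve and scale independently.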
The adoption of EDA in AI workflow management marks a significant evolution from traditional architectures, such as monolithic, service-oriented or polling-based architectures. Its principles of asynchronous communication, decoupling and dynamic scalability align perfectly with the demands of modern AI applications, with three key benefits:
These benefits, observed across different sectors, enhance not only the scalability and responsiveness of AI systems but also their robustness and adaptability, making EDA indispensable for managing complex, multimodel AI workflows across industries and use cases.
At Gcore, we’ve implemented EDA within Gcore Video Streaming AI features. One way we apply EDA is to generate subtitles for video using AI.
This project began with the goal of improving the efficiency, latency, scalability and reliability of subtitle generation in multiple languages from raw video content. The process involves several complex steps:
Six steps in AI subtitle generation architecture
Each of these steps requires specialized AI models or algorithms and may require data processing in real or near-real time, especially in live-streaming scenarios. The result? Serious complexity.
The complexity arises not only from the technical challenges associated with each task, but also from the need to efficiently manage the flow of data between steps, handle errors or exceptions, and scale resources dynamically based on demand.
In our pursuit of orchestrating such sophisticated and demanding AI workflows, we designed an AI system that functions with precision and agility through a well-defined EDA. The architecture of this platform, outlined in the figure below, addresses all stages of AI-driven tasks, facilitates communication between components and ensures that each task can be dynamically scaled and autonomously handled.
Workflow of Gcore Streaming Platform AI subtitle generation
Four core components underlie the Gcore streaming AI platform backend. All these components are versatile and essential to a wide range of AI applications.
At the front of the architecture lies the API service, which uses the robust Django framework. This is the primary interface for user interactions. It processes incoming requests for services such as transcription and content moderation, for example nudity detection. This layer validates and parses incoming requests, triggering a cascade of subsequent tasks in the workflow, as represented on the far left of the diagram above where a user initiates a transcription request to the API service.
Diving deeper into the backend, we leverage Celery, an asynchronous task queue that acts as a robust background processing engine. Celery manages AI processes, such as transcribing audio to text or analyzing content for nudity, and other standalone processes, such as synchronizing transcribed content into subtitles. Celery, in combination with Redis, which acts as a message broker, orchestrates these tasks and ensures that each task’s initiation and completion are driven by the occurrence of predefined events.
Celery’s ability to handle AI workflows is enhanced by a suite of advanced features for orchestrating complex workflows: groups, chains and chords. These tools allow for decomposing high-level, complex AI tasks into granular subtasks, handling their dependencies, and aggregating their results.
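The semantics of these three primitives can be sketched with the Python standard library alone. The code below is not Celery and not our production code. The helpers `run_chain`, `run_group` and `run_chord` are hypothetical stand-ins that mimic what Celery's `chain`, `group` and `chord` express, and the three task functions are placeholders for real AI steps.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Any, Callable, Iterable


def run_chain(steps: Iterable[Callable[[Any], Any]], initial: Any) -> Any:
    """Chain: each step receives the previous step's result."""
    return reduce(lambda value, step: step(value), steps, initial)


def run_group(task: Callable[[Any], Any], inputs: Iterable[Any]) -> list[Any]:
    """Group: run one task over many inputs in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(task, inputs))


def run_chord(task, inputs, callback):
    """Chord: a group plus a callback that runs once every member has finished."""
    return callback(run_group(task, inputs))


# Hypothetical placeholders for real AI steps.
def split_audio(video_id: str) -> list[str]:
    return [f"{video_id}:chunk{i}" for i in range(3)]


def transcribe(chunk: str) -> str:
    return f"text({chunk})"


def join_subtitles(parts: list[str]) -> str:
    return " | ".join(parts)


result = run_chain(
    [split_audio, lambda chunks: run_chord(transcribe, chunks, join_subtitles)],
    "video42",
)
print(result)
```

The chain handles the dependency between splitting and transcribing, the group fans the chunks out in parallel, and the chord's callback aggregates the partial results into one output.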
Redis plays a crucial role in our system as the broker and mediator, managing the distribution and coordination of tasks across the backend. It utilizes its fast, in-memory data structure store to handle the task queue efficiently. Within the architecture, task signatures and chains act as the mediators controlling the flow and logic of task execution. This mediation is based on event signals indicating task completion.
Redis’ ability to process these signals quickly is vital for maintaining a dynamic and responsive workflow, as shown in the diagram above: Tasks are received by the Redis broker and directed to the appropriate processing containers, and their results are collected post-inference for seamless task transitions and data integrity.
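The routing idea can be illustrated without Redis. In the sketch below, standard-library queues stand in for the broker and threads stand in for the processing containers. The queue names, the `submit` helper and the handler functions are assumptions made for the example.

```python
import queue
import threading

# One queue per task type; the names are hypothetical.
queues = {"transcription": queue.Queue(), "vad": queue.Queue()}
results: dict[str, str] = {}
results_lock = threading.Lock()
STOP = object()  # sentinel that tells a worker to exit


def worker(task_type: str, handler) -> None:
    """Consume tasks from a single queue until the stop sentinel arrives."""
    q = queues[task_type]
    while True:
        item = q.get()
        if item is STOP:
            break
        task_id, payload = item
        with results_lock:
            # Collect the result so a downstream step can pick it up.
            results[task_id] = handler(payload)


def submit(task_type: str, task_id: str, payload: str) -> None:
    """Route a task to the queue that matches its type."""
    queues[task_type].put((task_id, payload))


handlers = {
    "transcription": lambda p: f"transcript of {p}",
    "vad": lambda p: f"speech regions in {p}",
}
threads = [threading.Thread(target=worker, args=(t, h)) for t, h in handlers.items()]
for t in threads:
    t.start()

submit("transcription", "t1", "clip.wav")
submit("vad", "t2", "clip.wav")

for q in queues.values():
    q.put(STOP)  # queued after the tasks, so every task is processed first
for t in threads:
    t.join()

print(results)
```

Because each worker listens on its own queue, a slow transcription job never blocks voice activity detection, and either pool can be resized without touching the other.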
Each AI Celery worker is dedicated to a specific AI task, deploying and managing AI models such as Whisper for transcription and Pyannote for voice activity detection (VAD). These workers operate in isolated environments so that each task is processed in a controlled and secure manner, minimizing the risk of interference between tasks. This setup enhances the scalability of our system by allowing each worker to scale independently based on task demands while simultaneously ensuring high reliability and efficiency in AI model execution.
The Gcore backend I just described produces three major benefits that are particularly important for AI workflows: scaling, reliability and latency reduction.
The platform scales to handle varying demand by dynamically allocating cloud resources and leveraging GPU acceleration for intensive ML tasks. This ensures seamless scaling, avoiding the performance bottlenecks and high costs typical of traditional systems. By adapting computing power in real time, the system efficiently manages workloads during both peak and off-peak times without compromising performance.
Gcore Video Streaming AI features are all designed for high reliability with robust fault tolerance and sophisticated error handling. Strategies like data replication and automatic recovery mechanisms ensure system continuity even during failures. In video transcription, if a segment of audio is corrupted, our system can either skip or retry processing that segment, rather than wasting resources on discarding or retrying the whole audio track.
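A minimal sketch of that segment-level policy follows. It assumes a hypothetical `SegmentError` raised by the transcriber and a fixed retry budget. The real recovery logic is more involved, so treat this as an illustration of the idea only.

```python
from typing import Callable, Optional


class SegmentError(Exception):
    """Raised by the (hypothetical) transcriber when a segment cannot be decoded."""


def transcribe_with_retry(
    segment: str, transcribe: Callable[[str], str], max_attempts: int = 3
) -> Optional[str]:
    """Retry a single segment; return None (skip it) once the budget is spent."""
    for _ in range(max_attempts):
        try:
            return transcribe(segment)
        except SegmentError:
            continue  # only this segment is retried, not the whole track
    return None


def transcribe_track(
    segments: list[str], transcribe: Callable[[str], str], max_attempts: int = 3
) -> list[Optional[str]]:
    """Process every segment independently so one failure stays local."""
    return [transcribe_with_retry(s, transcribe, max_attempts) for s in segments]


# Demo transcriber: "flaky" fails once, "corrupt" always fails.
attempts: dict[str, int] = {}


def demo_transcribe(segment: str) -> str:
    attempts[segment] = attempts.get(segment, 0) + 1
    if segment == "corrupt":
        raise SegmentError(segment)
    if segment == "flaky" and attempts[segment] < 2:
        raise SegmentError(segment)
    return segment.upper()


out = transcribe_track(["intro", "flaky", "corrupt", "outro"], demo_transcribe)
print(out)
print(attempts)
```

The healthy segments are transcribed exactly once, the flaky one succeeds on its second attempt, and the corrupted one is skipped after three attempts while the rest of the track still completes.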
System latency for AI elements is reduced by minimizing idle times and enhancing the transition speed between tasks. We employ three key strategies:
In video transcription, rather than processing the entire video at once, we break it into segments for concurrent processing. This approach shortens transcription times and ensures resources are used efficiently, boosting overall system responsiveness.
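As a rough sketch of this segmentation, the example below splits a video's duration into fixed windows and processes them in a thread pool. The 30-second window, the worker count and the `transcribe_segment` placeholder are assumptions for illustration. In practice, segment boundaries would more likely follow pauses in speech than a fixed interval.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s: float, segment_s: float) -> list[tuple[float, float]]:
    """Return (start, end) windows that cover the whole duration."""
    if segment_s <= 0:
        raise ValueError("segment_s must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcribe_segment(window: tuple[float, float]) -> dict:
    """Hypothetical stand-in for a model call; returns one cue per segment."""
    start, end = window
    return {"start": start, "end": end, "text": f"speech {start:.0f}-{end:.0f}s"}


def transcribe_video(duration_s: float, segment_s: float = 30.0, workers: int = 4) -> list[dict]:
    windows = split_into_segments(duration_s, segment_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so cues come back in timeline order.
        return list(pool.map(transcribe_segment, windows))


cues = transcribe_video(70.0)
for cue in cues:
    print(cue)
```

Each cue keeps its absolute start and end time, so the results can be merged into a single subtitle track regardless of which segment finishes first.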
Adopting this system revolutionized the management of complex AI workflows within the Gcore Video Streaming backend. Specifically, the EDA enabled us to reduce analysis time, parallelize AI tasks, scale AI workers independently and ensure system flexibility.
Sharing is caring: Here are three things to keep in mind when setting up your own EDA for AI workflows to get the best results right away.
We’re always looking to the future and innovating our EDA AI systems at Gcore. Two future directions look particularly promising.
Incorporating mechanisms for continuous learning and model adaptation requires periodically updating models with new data and, less obviously, dynamically adjusting workflows and processes based on real-time performance metrics and feedback loops. As AI models continue to grow in complexity and capability, developing robust systems for continuous evaluation and deployment becomes critical. This includes automated performance monitoring, version control and seamless deployment of updated models without disrupting service.
Our architecture needs to adapt to AI’s changes. While the rise of LLMs and generative AI (GenAI) might suggest that traditional AI inference workflows could become obsolete, the reality is that our proposed architecture supports critical areas of AI deployment, such as continuous model learning and evaluation. Our event-driven system’s flexibility makes it well suited to integrate LLMs for enhanced decision-making processes and to adapt workflows in response to the capabilities of GenAI, where multiple specialized AI models may increasingly be replaced by a single, more powerful one.
At Gcore, we have found that adopting an EDA for workflow processing offers significant benefits for scalability, reliability and efficiency in managing complex AI systems in cloud and streaming environments. This approach addresses critical challenges, including dynamic scaling of large ML models, system robustness and latency reduction. EDA is already proving itself essential for the evolution of scalable and efficient AI systems.