URL: https://thenewstack.io/machine-learning-for-real-time-data-analysis-training-models-in-production/

Machine Learning for Real-Time Data Analysis: Training Models in Production - The New Stack


Machine Learning for Real-Time Data Analysis: Training Models in Production

Low latency data is useful for selecting and updating model features and weights for more accurate results.
Oct 24th, 2023 6:00am by Jelani Harper

Some of the most sophisticated real-time data analytics involves training advanced machine learning models while they’re deployed in production. With this approach, the models’ weights and features are continually updated with the most recent data available.
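As a minimal sketch of what continually updating weights can look like, the example below trains a small logistic regression one event at a time with stochastic gradient descent. The class name `OnlineLogisticRegression`, the learning rate, and the synthetic event stream are all assumptions made for illustration; the article does not describe a specific algorithm or library.

```python
import math


class OnlineLogisticRegression:
    """Hypothetical model whose weights are updated one event at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        # Probability of the positive class for one feature vector.
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # One stochastic gradient descent step on the log loss.
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Synthetic stream (assumption): feature 0 signals a click, feature 1 does not.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200

model = OnlineLogisticRegression(n_features=2)
for features, label in stream:
    score = model.predict_proba(features)  # serve a prediction first
    model.learn_one(features, label)       # then learn from the outcome
```

Scoring each event before learning from it mirrors a model that serves predictions and trains in the same loop, so every prediction uses only the data seen up to that point.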

Consequently, model outputs become more refined, precise, and accurate for highly specific segments of any particular use case.

Streaming data platforms and streaming data engines are ideal for this form of real-time data analysis, since they supply the ongoing data necessary to tailor model responses with low latency. This data informs the feature selection process that enables models to adjust to a vast array of circumstances that impact their results.

According to Gul Ege, senior director of advanced analytics at SAS, “It makes a lot of sense for the product and user data, and their features and their selections, to be updated, and the model to be updated, as they change.”

Supporting use cases span everything from computer vision monitoring to online recommendation engines for ad tech, insurance technology, e-commerce and more. With such a wide variety of applications, the capacity to simultaneously train and deploy machine learning models is becoming increasingly vital to the advancement of real-time data analysis.

CelerData helps enterprises accelerate business growth with a unified analytics platform that delivers 3X the performance of any other solution on the market while reducing operating costs by up to 80%. Powered by StarRocks, CelerData is used worldwide by leading brands including Airbnb and Lenovo.

Training in Production

Recommendation engines are a good example of the value of training machine learning models while they’re in production. Regardless of the application, this methodology is considered a progression of the older one in which models are trained offline, deployed online, and then compared against their offline performance to see if their scores have changed. Feature selection for these applications splits in two, as illustrated by an ad tech use case in which real-time recommendations surface ads based on someone’s most recent clicks on an e-commerce site.

“You have the features of the product and features of the person, and what the recommendation system should recommend is dependent on both,” Ege specified. Although the features of the product may not be as dynamic as those of the users browsing the site, the ability to align them, in real-time, with the latest data is essential for producing timely, relevant recommendations.

“The features are the behavior of the end user, what their interactions are with the site,” Ege commented. “And, the product has features. If I’m looking for a red skirt, please don’t show me blue trousers or a purse.”
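The interplay between user features and product features can be sketched roughly as follows. This is an illustrative toy, not the system Ege describes: the catalog, the attribute tags, and the `recommend` function are hypothetical, and real recommenders use far richer features and learned models.

```python
from collections import Counter

# Hypothetical catalog: each product is described by a set of attribute tags.
PRODUCTS = {
    "red skirt": {"color:red", "type:skirt"},
    "red pleated skirt": {"color:red", "type:skirt"},
    "red dress": {"color:red", "type:dress"},
    "blue trousers": {"color:blue", "type:trousers"},
    "black purse": {"color:black", "type:purse"},
}


def recommend(recent_clicks, catalog, top_n=2):
    """Rank unseen products by overlap with the user's recent clicks."""
    # User features: how often each attribute appears in recent behavior.
    profile = Counter()
    for name in recent_clicks:
        profile.update(catalog[name])

    # Product features are matched against the user profile.
    scores = {
        name: sum(profile[attr] for attr in attrs)
        for name, attrs in catalog.items()
        if name not in recent_clicks
    }
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    # Products sharing no attribute with the profile are never shown.
    return [name for name in ranked if scores[name] > 0][:top_n]


suggestions = recommend(["red skirt"], PRODUCTS)
```

Because the profile is rebuilt from the most recent clicks on every call, a new click changes the ranking immediately, which is the behavior the ad tech example depends on.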

Historic Data Considerations

Despite the rapidity at which data is generated for delivering recommendations with this approach, model features are also informed by certain historic data considerations. The training period is rarely instantaneous and often a continual one in which the model tends to perform better over time. According to Ege, for many deployments in which models are trained, deployed, and updated online, “Some of them take some time to warm up. You can start with the first optimization of, let’s say, a customer making a transaction. And then, the same customer comes back again, making another one. So, the model warms up over time.”
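One way to picture this warm-up is a per-customer profile that is built up online and only trusted once enough history exists. The sketch below is an assumption-laden illustration: `CustomerProfileStore`, the `min_history` threshold, and the use of a running mean of transaction amounts are all hypothetical choices, not details from the article.

```python
from collections import defaultdict


class CustomerProfileStore:
    """Hypothetical store that builds per-customer history online."""

    def __init__(self, min_history=2):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)
        self.global_count = 0
        self.global_mean = 0.0
        self.min_history = min_history

    def update(self, customer_id, amount):
        # Incremental mean: no need to store every past transaction.
        self.count[customer_id] += 1
        n = self.count[customer_id]
        self.mean[customer_id] += (amount - self.mean[customer_id]) / n
        self.global_count += 1
        self.global_mean += (amount - self.global_mean) / self.global_count

    def expected_amount(self, customer_id):
        # Until a customer has enough history, fall back to the global mean.
        if self.count.get(customer_id, 0) >= self.min_history:
            return self.mean[customer_id]
        return self.global_mean


store = CustomerProfileStore(min_history=2)
store.update("alice", 20.0)
store.update("bob", 100.0)
cold_estimate = store.expected_amount("alice")  # still uses the global mean

store.update("alice", 30.0)                     # the same customer returns
warm_estimate = store.expected_amount("alice")  # now uses her own history
```

With one transaction on record, the estimate for the customer is the population-wide average; after the second visit it switches to the customer's own history.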

Each of the behaviors in the respective transactions impacts what the model learns about that customer, others like him or her, or however else organizations have segmented the data for the models’ predictions. “As long as those [behaviors] exist and the history of them exist, you can build up the history online actually and make the recommendation,” Ege mentioned. Results are frequently improved by deploying multiple models — and algorithms — to address a particular business problem.

In insurtech use cases (in which quotes and various insurance products are offered to customers in real time after they enter information online), organizations “might have multiple algorithms running underneath that fit the situation better,” Ege observed. “They all have slightly different data availability. It depends on how much history you have and the features you have. It’s different flavors of the same problem.”

Training Offline, Deploying and Scoring Online

Despite the appeal of accelerating the data science process by simultaneously training and deploying models online, there are still situations in which real-time data analysis benefits from keeping these two steps distinct. It’s not uncommon for models to be built and trained offline, then deployed online to score real-time event data, with their online performance compared against their offline performance.
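Comparing online and offline performance can be as simple as tracking a rolling accuracy over recent events and checking it against the figure measured offline. The following is a rough sketch under stated assumptions: the `OnlineAccuracyMonitor` class, the window size, the tolerance, and the sample outcomes are hypothetical and chosen only to show the idea.

```python
from collections import deque


class OnlineAccuracyMonitor:
    """Hypothetical monitor comparing live accuracy with an offline baseline."""

    def __init__(self, offline_accuracy, window=100, tolerance=0.05):
        self.offline_accuracy = offline_accuracy
        self.tolerance = tolerance
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def online_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_degraded(self):
        accuracy = self.online_accuracy()
        if accuracy is None:
            return False
        return accuracy < self.offline_accuracy - self.tolerance


# Assumed offline accuracy of 0.90; ten live events, seven predicted correctly.
monitor = OnlineAccuracyMonitor(offline_accuracy=0.90, window=10)
live_events = [(1, 1)] * 7 + [(1, 0)] * 3  # (prediction, actual) pairs
for prediction, actual in live_events:
    monitor.record(prediction, actual)

degraded = monitor.has_degraded()
```

A bounded window keeps the comparison focused on recent behavior, so a drop in live performance is not diluted by older, better results.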

One of the determinant characteristics for adopting this time-honored method pertains to the quantity and variation of data required for the model’s training. These concerns are especially relevant in cases in which “the technique or the problem needs more data than whatever is going to stream to that large model,” Ege pointed out.

By training models offline, organizations have greater latitude to inform the models’ learning with a wider selection of data and greater amounts of historic data, such as financial records dating back several years for determining churn. The basic premise is that such models “need to be trained with enough data to capture the normal, so that you can then capture the abnormal when you deploy them,” Ege noted.

This requirement applies to certain anomaly detection applications. Once the training period for those models is completed offline, users can still score them online to monitor their performance with streaming data. Examples include “computer vision for quality control,” Ege said. “If you’re manufacturing something and there’s a crack or something, the sooner you detect it and take it off the line, the less money you lose.”
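The pattern of capturing the normal offline and then flagging the abnormal online can be sketched with a deliberately simple stand-in for a computer vision model: a single numeric measurement per part, with a z-score threshold. The function names, the threshold of 3.0, and the sample measurements are assumptions for illustration only.

```python
import statistics


def fit_offline(historical_values):
    """Learn what 'normal' looks like from historical measurements."""
    return statistics.mean(historical_values), statistics.stdev(historical_values)


def is_anomalous(value, baseline, threshold=3.0):
    """Score one streamed measurement against the offline baseline."""
    mean, std = baseline
    z_score = abs(value - mean) / std
    return z_score > threshold


# Offline step: hypothetical historical measurements from defect-free parts.
historical = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
baseline = fit_offline(historical)

# Online step: each streamed measurement is scored as it arrives.
stream = [10.1, 9.9, 12.5]
flags = [is_anomalous(value, baseline) for value in stream]
```

The baseline is fitted once and never updated by the stream, which reflects the separation between offline training and online scoring described above.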

Core Value Proposition

It’s becoming fairly commonplace to employ machine learning models for real-time data analysis. The traditional data science approach for these applications entails creating models offline before putting them into production online. As Ege explained, there are still scenarios in which this method is advisable.

However, the ability to train models while they’re in production, while updating their features and weights based on real-time inputs, is critical for ensuring models are reacting to the most recent data available. Being able to do so is foundational to real-time data analysis’s core value proposition of acting in the moment, while also ensuring machine learning is as useful as possible for fulfilling this objective.

Jelani Harper has worked as a research analyst, research lead, information technology editorial consultant, and journalist for over 10 years. During that time he has helped myriad vendors and publications in the data management space strategize, develop, compose, and place...
TNS owner Insight Partners is an investor in: Pragma.