URL: https://thenewstack.io/6-tips-for-better-data-science-in-the-cloud/


6 Tips for Better Data Science in the Cloud

The cloud opens an exciting frontier to better understand customers, monetize data in new ways and make predictions about the future.
Jun 2nd, 2021 6:23am by Christian Kleinerman
Feature image via Pixabay.
Snowflake sponsored this post.
Christian Kleinerman
Christian is a database expert with over 20 years of experience working with various database technologies, currently serving as senior vice president of product at Snowflake. Christian earned his bachelor’s degree in industrial engineering from Los Andes University, and he is a named inventor on numerous Snowflake patents.

The cloud has transformed what is possible with data science. Data teams now have access to a vast pool of elastic computing power, numerous sources of internal and external data, and managed cloud services that reduce the complexity of building, training and deploying machine learning and deep learning models at scale.

But that doesn’t mean there aren’t challenges as teams adapt from an on-premises infrastructure to a cloud-based model. Data scientists, data engineers and developers are all having to learn and adapt to a new environment, and there is an ever-expanding and rapidly evolving ecosystem of tools and frameworks from which to choose. Many are learning on the job, figuring it out as they go.

The very capabilities that make the cloud so exciting also create potential pitfalls to watch out for. The ease of copying data across diverse systems can create governance challenges if not handled properly. The speed of change means that data teams can bet on the wrong tool or framework and become stranded there. Habits and biases from the on-premises world can limit understanding of what’s possible in the cloud.

After many years of building data management technology, and frequent conversations with organizations of all sizes across all industries, I’ve seen some common pitfalls and misunderstandings that can hold data teams back from doing great work. The cloud opens an exciting frontier to better understand customers, monetize data in new ways and make predictions about the future. I hope the following tips will help data teams capitalize on those benefits while working in a way that is secure, efficient and effective.

1. Make Governance Your Top Priority

It’s critical to enable iteration and investigation without compromising governance and security. For example, many data scientists intuitively want to copy a dataset before they start working on it. But it’s too easy to make copies, move on and forget they exist, creating a nightmare in terms of compliance, security and privacy. A modern data platform should allow you to work on snapshots, or virtual copies, without needing to duplicate entire datasets, while maintaining fine-grained controls to ensure that only the right users and applications have access to the data. Create processes that minimize copies and clean up anything that is copied; don’t be the person who gets your company into the headlines for the wrong reasons.

2. Leave Your Preconceptions at the Door

If you’re coming from an on-premises world, you’ll often bring perceptions and biases about infrastructure that no longer apply to modern platforms in the cloud. I’ve often heard data scientists say, “I’d love to retrain my model several times a day, but it’s too slow and will delay other processes.” But that’s not an issue in a world of elastic infrastructure. Approach the cloud from first principles. Start with what you want to achieve, not what you think is possible, and move forward from there. That’s the only way to push the boundaries and take full advantage of this new environment.

3. Avoid Creating Data Silos 2.0

Closely tied to data governance is the concept of silos. In the cloud, it’s important not to replicate the fragmentation that’s common in the on-premises world. The proliferation of tools, platforms and vendors is great for innovation, but it can also lead to redundant, inconsistent data being stored in multiple locations. Fragmentation also arises when structured data is stored in one environment, such as a data warehouse, while semi-structured data ends up in a data lake. Besides compromising governance and security, this fragmentation can get in the way of achieving better predictions or classifications.

Work with a cloud data platform that provides a global, consolidated view of your data. That means a platform that can accommodate structured, semi-structured and unstructured data side by side and provide a single instance across multiple cloud providers and tools — not six versions of your data replicated across different platforms and environments.

Snowflake enables every organization to mobilize their data with Snowflake’s Data Cloud. Customers use the Data Cloud to unite siloed data, discover and securely share data, power data applications, and execute diverse AI/ML and analytic workloads across multiple clouds and geographies.

4. Keep Your Options Open

One of the exciting things about this space is that frameworks and tools are evolving at an incredible pace, but it’s critical not to get locked into an approach that limits your options when technologies fall in and out of favor. To give one example: Spark ML used to be the answer to most large-scale training problems, but now TensorFlow and PyTorch are capturing the most attention. You never know what will happen next year, or next week for that matter. Choose a data platform that won’t tie you into one framework or one way of doing things, with an extensible architecture that can accommodate new tools and technologies as they come along.
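
One way to keep options open in your own code is to have pipelines depend on a small interface rather than on a specific framework. The following is a minimal sketch in Python, not a recommendation from the article: the `Model` protocol, the two toy models and `train_and_score` are all hypothetical names invented for illustration, and a real adapter would wrap whichever framework you actually use.

```python
from typing import Protocol, Sequence


class Model(Protocol):
    """Hypothetical minimal interface the pipeline depends on."""

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> None: ...

    def predict(self, xs: Sequence[float]) -> list[float]: ...


class MeanModel:
    """Toy stand-in for one framework: always predicts the training mean."""

    def __init__(self) -> None:
        self.value = 0.0

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> None:
        self.value = sum(ys) / len(ys)

    def predict(self, xs: Sequence[float]) -> list[float]:
        return [self.value for _ in xs]


class LastValueModel:
    """Toy stand-in for another framework: repeats the last training target."""

    def __init__(self) -> None:
        self.value = 0.0

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> None:
        self.value = ys[-1]

    def predict(self, xs: Sequence[float]) -> list[float]:
        return [self.value for _ in xs]


def train_and_score(model: Model, train, test) -> float:
    """Fit any object that satisfies Model and return mean absolute error."""
    model.fit(*train)
    predictions = model.predict(test[0])
    return sum(abs(a - p) for a, p in zip(test[1], predictions)) / len(test[1])


# Hypothetical data, used only to show that models are interchangeable.
train = ([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
test = ([4.0], [40.0])
scores = {type(m).__name__: train_and_score(m, train, test)
          for m in (MeanModel(), LastValueModel())}
print(scores)
```

Because `train_and_score` only knows about `fit` and `predict`, swapping in a model built on a different framework means writing a new adapter class, not rewriting the pipeline.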

5. Incorporate Third-Party Data Sources

The cloud makes it much easier to incorporate external data from partners and data-service providers into your models. This was particularly important over the past year, as businesses sought to understand how COVID-19, fluctuations in the economy and subsequent changes in consumer behavior would affect them. For example, organizations used data about local infection rates, foot traffic in stores and signals from social media to predict buying patterns and forecast inventory needs. Explore the numerous data sources available and determine which can help to accurately address the questions your business needs to answer.
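
As a rough illustration of what incorporating an external source can look like, the sketch below joins internal sales records with a third-party signal on a shared key. Everything in it is hypothetical: the data values are made up, and the `enrich` function and field names are assumptions chosen for the example. In practice this join would more likely happen inside your data platform than in application code.

```python
# Hypothetical internal sales records, keyed by (region, week).
internal_sales = {
    ("north", 1): 120,
    ("north", 2): 95,
    ("south", 1): 80,
}

# Hypothetical third-party foot-traffic index, keyed the same way.
external_traffic = {
    ("north", 1): 1.10,
    ("north", 2): 0.85,
    ("south", 2): 0.90,
}


def enrich(sales, traffic):
    """Left-join the external signal onto internal rows.

    Rows without a matching external value keep foot_traffic=None so that
    gaps in the third-party data stay visible instead of being dropped.
    """
    rows = []
    for key in sorted(sales):
        region, week = key
        rows.append({
            "region": region,
            "week": week,
            "units_sold": sales[key],
            "foot_traffic": traffic.get(key),
        })
    return rows


features = enrich(internal_sales, external_traffic)
missing = [row for row in features if row["foot_traffic"] is None]
print(f"{len(features)} rows, {len(missing)} without external coverage")
```

Checking how many rows lack external coverage is a quick way to judge whether a given source actually lines up with the questions you need to answer.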

6. Minimize Complexity

It’s often said that when you have a hammer, everything looks like a nail, and this applies to AI technologies like machine learning and deep learning. They are immensely powerful and have a critical role to play for certain business needs, but they’re not right for every problem. Always start with the simplest option and increase complexity as needed. Try a simple linear regression, or look at averages and medians. How accurate are the predictions? Does the ROI of increasing the accuracy justify a more complex approach? Sometimes it does, but don’t jump to that option as your first instinct.
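
To make the "start simple" advice concrete, here is a small sketch that compares three basic predictors on held-out data: the training mean, the training median and a one-variable linear regression fitted by ordinary least squares. The data is hypothetical, and the helper names `fit_line` and `mean_abs_error` are invented for this example. It uses only the Python standard library.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical data: store visits (x) and units sold (y).
train_x = [10, 20, 30, 40, 50, 60]
train_y = [25, 45, 62, 85, 104, 122]
test_x = [70, 80]
test_y = [143, 165]

slope, intercept = fit_line(train_x, train_y)

# Score each simple predictor on the held-out points.
errors = {
    "mean": mean_abs_error(test_y, [mean(train_y)] * len(test_y)),
    "median": mean_abs_error(test_y, [median(train_y)] * len(test_y)),
    "linear": mean_abs_error(test_y, [slope * x + intercept for x in test_x]),
}

for name, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name:>6}: mean absolute error = {err:.2f}")
```

If a baseline like this already meets the accuracy the business needs, the extra cost of a more complex model may not be justified. If it falls short, the baseline error gives you a number that any more sophisticated approach has to beat.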

Doing advanced data analytics has never been more accessible. Data scientists, data engineers and developers are now among the most important members of any organization. The cloud is a simpler, more powerful and more dynamic place to do data analytics, and the challenges it presents are not hard to address when you’re aware of them and make the right decisions about technology and tools. But you need to be intentional and think before you dive in.

Starting on June 8, my company will kick off our virtual Summit, where you can join other data professionals to learn more about doing advanced analytics in the cloud. I hope you’ll join us there. In the meantime, enjoy your work and build great things.

Christian Kleinerman is executive vice president of product at Snowflake with over 20 years of experience working with various database technologies. He has more than 15 years of management and leadership experience. At Microsoft, he served as general manager of...
TNS owner Insight Partners is an investor in: Pragma.