- 10 GitHub Repositories for Modern Database Systems and Tools
Explore 10 top open-source GitHub repositories for modern databases, analytics, SQL, caching, monitoring, replication, PostgreSQL, SQLite, and AI agent memory.
Data Engineering
- Top 10 Python Libraries for Data Engineering in 2026
Want to level up your data engineering toolkit? Here are some Python libraries that'll make your pipelines faster, cleaner, and easier to maintain.
Data Engineering
- How to Deploy Your First App on FastAPI Cloud
Learn how to build, test, deploy, and monitor your first FastAPI Cloud app, a simple live gold and silver dashboard.
Data Engineering
- 5 Docker Best Practices for Faster Builds and Smaller Images
By applying a few smart Docker practices, you can build images faster and keep them clean, compact, and production-ready.
Data Engineering
- Docker for Python & Data Projects: A Beginner’s Guide
Managing dependencies for Python data projects can get messy fast. Docker helps you create consistent environments you can build, share, and deploy with ease.
Data Engineering
- 5 Docker Containers for Small Business
Here are five ready-to-go Docker containers that can be deployed today to make any small business run smoother.
Data Engineering
- Supabase vs Firebase: Which Backend Is Right for Your Next App?
Compare SQL and NoSQL backend services. Find out which BaaS is right for your next app in this neutral guide.
Data Engineering
- 5 Useful Docker Containers for Agentic Developers
Build AI agents instantly with 5 ready‑to‑run Docker containers. Pull, run, and start creating with zero setup.
Data Engineering
- Building Declarative Data Pipelines with Snowflake Dynamic Tables: A Workshop Deep Dive
Traditional data pipeline development often requires extensive procedural code to define how data should be transformed and moved between stages. The declarative approach flips this paradigm by allowing data engineers to specify what the end result should be rather than prescribing every step of how to achieve it.
Data Engineering
- 5 Self-Hosted Alternatives for Data Scientists in 2026
Save money & take control in 2026. Discover 5 powerful open-source, self-hosted tools to replace costly subscriptions for data scientists.
Data Engineering
- Data Engineering for the LLM Age
Great LLMs need great data. Discover the pipelines, tools, and RAG architecture shaping the future of AI-ready data engineering.
Data Engineering
- 10 Essential Docker Concepts Explained in Under 10 Minutes
Images, containers, volumes, and networks... Docker terms often sound complex to beginners. This quick guide explains Docker essentials to get started.
Data Engineering
- How to Self-Host n8n on Docker in 5 Simple Steps
This tutorial will guide you through the complete process of self-hosting n8n on Docker in just 5 simple steps, with detailed explanations and code samples, regardless of your technical background.
Data Engineering
- Top 7 Python ETL Tools for Data Engineering
Building data pipelines? These Python ETL tools will make your life easier.
Data Engineering
- 6 Docker Tricks to Simplify Your Data Science Reproducibility
Read these 6 tricks for treating your Docker container like a reproducible artifact, not a disposable wrapper.
Data Engineering
- 5 Fun Docker Projects for Absolute Beginners
Learn Docker by doing with five beginner-friendly projects covering hosting, multi-container apps, CI, and monitoring.
Data Engineering
- 5 Emerging Trends in Data Engineering for 2026
Looking ahead to 2026, the most impactful trends are not flashy frameworks but structural changes in how data pipelines are designed, owned, and operated.
Data Engineering
- Pixi: A Smarter Way to Manage Python Environments
Pixi makes Python environment management simple, consistent, and portable.
Data Engineering
- 5 Practical Docker Configurations
These five configurations can turn your Docker setup from a slow chore into a finely tuned machine.
Data Engineering
- The Complete Guide to Building Data Pipelines That Don’t Break
A practical guide to building reliable data pipelines that stay up and running. Learn what breaks them and how to avoid it.
Data Engineering
- How to Build and Publish a Docker Image to Docker Hub
Build once, run anywhere — deploy your app with Docker and Docker Hub.
Data Engineering
- Shortcuts for the Long Run: Automated Workflows for Aspiring Data Engineers
Tired of repeating the same data tasks? Automate them. This article shows beginners how to build efficient, low-maintenance data engineering workflows that pay off in the long run.
Data Engineering
- Setting Up a Machine Learning Pipeline on Google Cloud Platform
Learn the steps for setting up a machine learning pipeline on a leading cloud provider.
Data Engineering
- Implementing Machine Learning Pipelines with Apache Spark
Machine learning pipelines help turn data into predictions. Apache Spark makes it easy to build these pipelines for big data.
Data Engineering
- 7 Essential Ready-To-Use Data Engineering Docker Containers
Ready to level up your data engineering game without wasting hours on setup? From ingestion to orchestration, these Docker containers handle it all.
Data Engineering
- 10 GitHub Repositories to Master Cloud Computing
Learn cloud computing concepts, tools, and best practices through free, community-driven content on GitHub.
Data Engineering
- Creating a Data Science Pipeline for Real-Time Analytics Using Apache Kafka and Spark
This article explains how to create a system that processes data in real time using Apache Kafka and Spark.
Data Engineering
- How to Secure Docker Containers with Best Practices
Learn how to protect your Docker containers from vulnerabilities and security threats by following these best practices.
Data Engineering
- A Practical Guide to Modern Airflow
Many data professionals and top companies, such as Airbnb and Netflix, use Apache Airflow daily. This article shows how to install and use it.
Data Engineering
- 5 Free Data Engineering Courses
Want to learn data engineering but don’t know where to start? Here are five free online courses, plus some additional resources for practicing your skills.
Data Engineering
- 10 Essential Docker Commands for Data Engineering
Tired of 'it works on my machine' problems? Learn the top 10 Docker commands every data engineer needs to build, deploy, and scale projects like a pro!
Data Engineering
- How to Monitor Docker Containers
This guide highlights the importance of container monitoring, key metrics to track, and tools ranging from Docker's built-in commands to comprehensive systems like Prometheus and Grafana.
Data Engineering
- Implementing Data Quality Assurance in Data Science Pipelines with Great Expectations
This article shows how to use Great Expectations to check data quality in data science projects.
Data Engineering
- Getting Started with the Data Engineer Handbook
Kickstart your data engineering career with an expert guide available on GitHub.
Data Engineering
- How to Use Docker for Local Development Environments
Learn how to create containers and manage complex setups with Docker Compose to simplify your development workflow.
Data Engineering
- How to Perform Advanced SQL Queries in BigQuery
Improve your SQL querying skills in BigQuery with these advanced querying templates.
Data Engineering
- 7 Projects to Master Data Engineering
Learn to build, run, and manage data engineering pipelines both locally and in the cloud using popular tools.
Data Engineering
- Developing Robust ETL Pipelines for Data Science Projects
In this article, we’ll look at how to build ETL pipelines for data science projects.
Data Engineering
- Beginner’s Guide to FastAPI
FastAPI is a contemporary web framework designed for creating RESTful APIs with Python 3.8 or later.
Data Engineering
- 7 Data Engineering Tools for Beginners
Learn the data engineering tools for data orchestration, database management, batch processing, ETL (Extract, Transform, Load), data transformation, data visualization, and data streaming.
Data Engineering
- How to Write Basic SQL Queries in BigQuery
Take the first steps in writing effective SQL queries to retrieve data in BigQuery.
Data Engineering
- How to Import Data into BigQuery
Master the process of loading datasets into BigQuery from four different data sources.
Data Engineering
- How to Set Up Your First BigQuery Project
Discover BigQuery: Google's structured data warehouse in the cloud, and take your first learning steps with this enthralling technology.
Data Engineering
- A Beginner’s Guide to ClickHouse Database
Learn how to install ClickHouse DBMS, create a database, and run SQL queries using native and Python clients.
Data Engineering
- Project Ideas to Master Data Engineering
Data engineering is best learned by doing projects. But which ones? Here are six projects focusing on different data engineering skills to ensure you have it all covered.
Data Engineering
- Building Data Pipeline with Prefect
Learn how to build and deploy an end-to-end data pipeline using Prefect with a few lines of code.
Data Engineering
- How To Use Docker Volumes for Persistent Data Storage
Learn how to use Docker volumes to ensure data persistence when working with Docker.
Data Engineering
- Landing a Data Engineer Role: Free Courses and Certifications
Is it possible to learn data engineering for free? I claim it is and present the evidence for that in the form of 10 free data engineering courses.
Data Engineering
- How To Debug Running Docker Containers
Debugging Docker containers is an essential skill when working with containerized applications. Let’s explore the different ways to debug Docker containers.
Data Engineering
- How To Use Docker Tags to Manage Image Versions Effectively
Learn to use Docker tags for managing and versioning Docker images, making it easier to handle different application versions.
Data Engineering
- How To Optimize Dockerfile Instructions for Faster Build Times
Optimize Dockerfiles for faster builds by using build cache, minimizing build context, and following best practices.
Data Engineering
- How To Leverage Docker Cache for Optimizing Build Speeds
Want to make your Docker builds much faster? Learn how to do so by leveraging Docker's layer caching mechanism.
Data Engineering
- How To Create Minimal Docker Images for Python Applications
This tutorial will teach you how to create minimal Docker images for Python applications.
Data Engineering
- 10 GitHub Repositories to Master Data Engineering
Learn data engineering through free courses, tutorials, books, tools, guides, roadmaps, practice exercises, projects, and other resources.
Data Engineering
- 7 Steps to Mastering Data Engineering
The only data engineering roadmap you need for an introduction to concepts, tools, and techniques to collect, store, transform, analyze, and model data.
Data Engineering
- What is a Database? Everything You Need to Know
Unlocking Database Basics.
Data Engineering
- 5 Airflow Alternatives for Data Orchestration
Top list of open-source tools for building and managing workflows.
Data Engineering
- What Is Data Lineage, And Why Does It Matter?
If you’ve ever had conversations with data professionals, you’ve probably heard “data lineage” pop up quite a few times. So what is data lineage all about, and why is it important?
Data Engineering
- Free Data Engineering Course for Beginners
Interested in data engineering but don't know where to start? Get up to speed in data engineering fundamentals with this free course.
Data Engineering
- A Data Lake, You Call It? It’s a Data Swamp
How and why the data lake architecture often fails to meet its promises. And how better governance helps mitigate such challenges.
Data Engineering
- The Only Free Course You Need To Become a Professional Data Engineer
Data Engineering ZoomCamp offers free access to reading materials, video tutorials, assignments, homework, projects, and workshops.
Data Engineering
- Turn Your Laptop Into a Personal Analytics Engine with DuckDB and MotherDuck
Bring powerful tools to your laptop.
Data Engineering
- Evolution in ETL: How Skipping Transformation Enhances Data Management
This article provides an overview of two new data preparation techniques that enable data democratization while minimizing transformation burdens.
Data Engineering
- Back to Basics Bonus Week: Deploying to the Cloud
Welcome back to KDnuggets’ "Back to Basics" series. In this bonus week, we dive into deploying to the cloud.
Data Engineering
- 5 Free Courses to Master Data Engineering
Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company.
Data Engineering
- How Big Data Is Saving Lives in Real Time: IoV Data Analytics Helps Prevent Accidents
This post discusses what needs to be taken care of in IoV data analysis and uses a real-world example to show the difference between a near-real-time analytics platform and a truly real-time one.
Data Engineering
- Getting Started with Graph Database Queries, with Cheat Sheet!
Graph databases are quickly becoming a core part of the analytics toolset for enterprise IT organizations. If you know SQL, you can easily learn Cypher and open up a huge opportunity for data analysis.
Data Engineering
- Data Warehouses vs. Data Lakes vs. Data Marts: Need Help Deciding?
A comparative overview of data warehouses, data lakes, and data marts to help you make informed decisions on data storage solutions for your data architecture.
Data Engineering
- 7 Best Cloud Database Platforms
Cloud databases have made it easier and cheaper to develop enterprise-level applications, offering flexibility, convenience, and standard database functionality. See what KDnuggets recommends.
Data Engineering, KDnuggets Recommends
- Exploring Data Mesh: A Paradigm Shift in Data Architecture
Let’s explore Data Mesh, a modern approach to data architecture that decentralizes data ownership and management.
Data Engineering
- Best Practices for Building ETLs for ML
This article talks about several best practices for writing ETLs for building training datasets. It delves into several software engineering techniques and patterns applied to ML.
Data Engineering
- Getting Started with Google Cloud Platform in 5 Steps
Explore the essentials of Google Cloud Platform for data science and ML, from account setup to model deployment, with hands-on project examples.
Data Engineering
- A Comprehensive Guide to Pinecone Vector Databases
This blog discusses vector databases, specifically Pinecone. A vector database stores data as mathematical vectors that represent features or attributes. These vectors have multiple dimensions, capturing complex data relationships. This allows for efficient similarity and distance calculations, making vector databases useful for tasks like machine learning, data analysis, and recommendation systems.
Data Engineering
- Working with Big Data: Tools and Techniques
Where do you start in a field as vast as big data? Which tools and techniques to use? We explore this and talk about the most common tools in big data.
Data Engineering
- Building a Formula 1 Streaming Data Pipeline With Kafka and RisingWave
Build a streaming data pipeline using Formula 1 data, Python, Kafka, RisingWave as the streaming database, and visualize all the real-time data in Grafana.
Data Engineering
- How to Digest 15 Billion Logs Per Day and Keep Big Queries Within 1 Second
This article describes a large-scale data warehousing use case as a reference for data engineers looking for log analytics solutions. It introduces the log processing architecture and real-world practices in data ingestion, storage, and queries.
Data Engineering
- 2024 Data Management Crystal Ball: Top 4 Emerging Trends
These are my predictions based on my personal experiences, recent research and reports from leading platforms.
Data Engineering
- Creating A Simple Docker Data Science Image
This concise primer walks through setting up a Python data science environment using Docker, covering creating a Dockerfile, building an image, running a container, sharing and deploying images, and pushing to Docker Hub.
Data Engineering
- Things You Should Know When Scaling Your Web Data-Driven Product
Scaling your data-driven product helps grow your business, but it requires certain expertise. In this article, you will learn how scaling works and what to keep in mind while doing it.
Data Engineering
- How to Build a Real-Time Recommendation Engine Using Graph Databases
"You may also like" is a simple phrase that implies a new era in the way businesses interact and connect with their customers, and graph databases can easily help to build recommendation engines.
Data Engineering
- Top 6 Tools to Improve Your Productivity on Snowflake
The post reviews 6 top tools for improving productivity with Snowflake for data preparation, visualization, integration, BI and governance.
Data Engineering
- CDC Data Replication: Techniques, Tradeoffs, Insights
The author discusses common use cases for CDC data replication, implementation techniques and their tradeoffs, and firsthand insights.
Data Engineering
- A Beginner’s Guide to Data Engineering
So you want to break into data engineering? Start today by learning more about data engineering and the fundamental concepts.
Data Engineering
- How to Build a Streaming Semi-structured Analytics Platform on Snowflake
Building a data lake for semi-structured data such as JSON has always been challenging. When JSON documents stream continuously from healthcare vendors, we need a robust, modern architecture that can handle such high volume. At the same time, an analytics layer must be created to generate value from the data.
Data Engineering
- Evolution of the Data Landscape
The article follows the evolution of the data space through the lens of evolutionary patterns. It covers the significant milestones in that journey, their achievements and challenges, and the next milestone that solved those challenges. It is written from both a business and a technical perspective, reflecting the authors' backgrounds.
Data Engineering
- Data Engineering Landscape in the AI-Driven World
Generative AI has just started to capture the imagination of data engineers, so the impact thus far has been just a fraction of what it will be a year or two from now.
Data Engineering
- Should You Consider a DataOps Career?
Transitioning your career to DataOps could be just the change you need - not only will it provide the possibility to expand your technical skills, but also a rewarding salary with many job openings.
Data Engineering
- Schedule & Run ETLs with JupySQL and GitHub Actions
This blog gives a brief introduction to ETLs and JupySQL, then demonstrates how to schedule an example ETL notebook via GitHub Actions, which lets you automate running ETLs with JupySQL from Jupyter.
Data Engineering
- 11 Best Practices of Cloud and Data Migration to AWS Cloud
A list of best practices compiled from what we learned during our migration to the AWS cloud.
Data Engineering
- How to Build a Scalable Data Architecture with Apache Kafka
Learn about Apache Kafka architecture and its implementation using a real-world use case of a taxi booking app.
Data Engineering
- ETL vs ELT: Which One is Right for Your Data Pipeline?
Learn about the differences between ETL and ELT data integration techniques and determine which is right for your data pipeline.
Data Engineering
- Data Quality Dimensions: Assuring Your Data Quality with Great Expectations
This article highlights the significance of ensuring high-quality data and presents six key dimensions for measuring it. These dimensions include Completeness, Consistency, Integrity, Timeliness, Uniqueness, and Validity.
Data Engineering
- A List of 7 Best Data Modeling Tools for 2023
Learn about data modeling tools to create, design and manage data models, allowing data scientists to access and use them more quickly.
Data Engineering
- Data Warehousing and ETL Best Practices
How you can improve your data warehousing ETL process with these simple practices.
Data Engineering
- 5 SQL Visualization Tools for Data Engineers (KDnuggets Top Blog)
This article will discuss SQL visualization, its role in augmenting the modern-day data engineer, and five categories of SQL visualization tools.
Data Engineering
- Docker for Data Science Cheat Sheet
Docker is dependency management on steroids, helping to ensure both reproducibility and collaboration, making it an important tool for data science. Our latest cheat sheet serves as a handy Docker reference. Check it out now!
Data Engineering
- Learn Data Engineering From These GitHub Repositories (KDnuggets Top Blog)
Kickstart your Data Engineering career with these curated GitHub repositories.
Data Engineering
- Tapping into the Potential of Data Products in 2023
Learn how data can be treated as a product and how it can be used to derive value.
Data Engineering
- Scaling Data Management Through Apache Gobblin
Software companies can manage big data at a hyper-scale on different infrastructure stacks using Apache Gobblin.
Data Engineering
- SQL and Data Integration: ETL and ELT
In this article, we will discuss use cases and methods for using ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes along with SQL to integrate data from various sources.
Data Engineering
- Data Lakes and SQL: A Match Made in Data Heaven
In this article, we will discuss the benefits of using SQL with a data lake and how it can help organizations unlock the full potential of their data.
Data Engineering
- Overcome Your Data Quality Issues with Great Expectations
Bad data costs organizations money, reputation, and time. Hence it is very important to monitor and validate data quality continuously.
Data Engineering
- Where Collaboration Fails Around Data (And 4 Tips for Fixing It)
Data-driven organizations require complex collaboration between data teams and business stakeholders. Here are 4 proactive tips for reducing information asymmetries and achieving better collaboration.
Data Engineering
- 7 Essential Cheat Sheets for Data Engineering (KDnuggets Top Blog)
Learn about the data life cycle, PySpark, dbt, Kafka, BigQuery, Airflow, and Docker.
Data Engineering
- The Complete Data Engineering Study Roadmap (KDnuggets Top Blog)
Everything you need to know to start your career in Data Engineering.
Data Engineering
- Is OLAP Dead?
OLAP enables citizen analysts to quickly, efficiently, and cost-effectively uncover new business insights at a reduced time-to-value.
Data Engineering
- Essential Books You Need to Become a Data Engineer (KDnuggets Top Blog)
In this article, I will go through the roadmap of books you need to become a Data Engineer.
Data Engineering
- 11 Questions About Data Engineers: What’s the profession about, and where’s it heading?
I hope my answers will be useful to novice data engineers and anyone interested in data engineering.
Data Engineering
- The Evolution of Apache Druid
And so true to the origins of its name, Apache Druid is shapeshifting - with the addition of a new multi-stage query engine.
Data Engineering
- 10 Modern Data Engineering Tools
Learn about the modern tools for data orchestration, data storage, analytical engineering, batch processing, and data streaming.
Data Engineering
- Free Data Engineering Courses
Get into the in-demand world of data engineering for free and earn a six-figure salary.
Data Engineering
- Deploying a Streamlit WebApp to Heroku using DAGsHub
Transform your machine learning models into a web app and share them with your friends and colleagues.
Data Engineering
- Is the Modern Data Stack Leaving You Behind?
The modern data stack narrative is largely dominated by analytics engineering. Where does that leave data engineers? Discover the difference between the MDS for data engineers & analytics engineers.
Analytics, Data Engineer, Data Engineering, Tools
- Data Engineering Technologies 2021
Emerging technologies supporting the field of data engineering are growing at a rapid clip. This curated list includes the most important offerings available in 2021.
Abacus.ai, Dask, Data Engineering, Databricks, Dataiku, DataRobot, dbt, Fivetran, Pachyderm
- Data Scientists Without Data Engineering Skills Will Face the Harsh Truth (Rewards Blog, Gold Blog)
Although the role of the data scientist is still evolving, data remains at its core. Setting the right expectations for what you will do as a data scientist is important, and, to be sure, knowing the tools of data engineering will get you ready for the real world.
Data Engineering, Data Science Skills, Data Scientist
- The Most Important Tool for Data Engineers
And it has nothing to do with Python or SQL
Career Advice, Data Engineer, Data Engineering
- Model Drift in Machine Learning – How To Handle It In Big Data
Rendezvous Architecture helps you run and choose outputs from a Champion model and many Challenger models running in parallel without much overhead. The original approach works well for smaller data sets, so how can this idea adapt to big data pipelines?
Big Data, Data Engineering, Data Preparation, Machine Learning, Model Drift
- Development & Testing of ETL Pipelines for AWS Locally
Typically, ETL pipelines are developed and tested on real environments or clusters, which are time-consuming to set up and require maintenance. This article focuses on developing and testing ETL pipelines locally with the help of Docker and LocalStack. The solution gives you the flexibility to test in a local environment without setting up any services in the cloud.
AWS, Data Engineering, ETL, Pipeline
- dbt for Data Transformation – Hands-on Tutorial
The data build tool (dbt) is gaining in popularity and use, and this hands-on tutorial covers creating complex models, using variables and functions, running tests, generating docs, and many more features.
Data Engineering, Data Preparation, dbt, ETL, SQL
- MLOps is an Engineering Discipline: A Beginner’s Overview
MLOps = ML + DEV + OPS. MLOps is the idea of combining the long-established practice of DevOps with the emerging field of Machine Learning.
Data Engineering, Deployment, Machine Learning, MLOps, Modeling
- Analytics Engineering Everywhere (Gold Blog)
Many new roles have appeared in the data world ever since the rise of the Data Scientist took the spotlight several years ago. Now, there is a new core player ready to take center stage, and in five years we may see nearly every organization with an Analytics Engineering team.
Analytics, Analytics Engineering, Data Engineering, dbt
- DataOps: 5 things that you need to know
DataOps (Data Operations) has assumed a critical role in the age of big data to drive definitive impact on business outcomes. This process-oriented and agile methodology synergizes the components of DevOps and the capabilities of data engineers and data scientists to support data-focused workloads in enterprises. Here is a detailed look at DataOps.
Data Engineer, Data Engineering, DataOps
- Why You Should Consider Being a Data Engineer Instead of a Data Scientist (Silver Blog)
A new king of the jungle has emerged.
Career Advice, Data Engineer, Data Engineering, Data Science, Data Scientist
- Data careers are NOT one-size fits all! Tips for uncovering your ideal role in the data space
Thriving as a data professional is about more than just making good money! It’s about FULFILLMENT & IMPACT. In this article, I will help you discover the BEST data role for you given your unique skill sets, personality & goals.
Career Advice, Careers, Data Engineering, Data Science
- How to build a DAG Factory on Airflow
A guide to building efficient DAGs with half of the code.
Data Engineering, Data Workflow, Graphs, Python, Workflow
- Introducing dbt, the ETL and ELT Disrupter
Moving and processing data is happening 24/7/365 world-wide at massive scales that only get larger by the hour. Tools exist to introduce efficiencies in how data can be extracted from sources, transformed through calculations, and loaded into target data repositories. However, on their own, these tools can introduce some restrictions in the processing, especially for the needs of data analytics and data science.
Data Engineering, Data Preparation, dbt, ELT, ETL
- Data Science Learning Roadmap for 2021 (Gold Blog)
Venturing into the world of Data Science is an exciting, interesting, and rewarding path to consider. There is a great deal to master, and this self-learning recommendation plan will guide you toward establishing a solid understanding of all that is foundational to data science as well as a solid portfolio to showcase your developed expertise.
Data Engineering, Data Preparation, Data Science, Data Science Education, Python, Roadmap, SQL
- Data Observability, Part II: How to Build Your Own Data Quality Monitors Using SQL
Using schema and lineage to understand the root cause of your data anomalies.
Data Engineering, Data Quality, Data Science, Data Science Platform, SQL
- Feature Store as a Foundation for Machine Learning
With so many organizations now taking the leap into building production-level machine learning models, many lessons learned are coming to light about the supporting infrastructure. For a variety of important types of use cases, maintaining a centralized feature store is essential for higher ROI and faster delivery to market. In this review, the current feature store landscape is described, and you can learn how to architect one into your MLOps pipeline.
Data Engineering, Data Infrastructure, Data Lake, Feature Engineering, Feature Store, Machine Learning, Metadata, MLOps, Pipeline
- Data Observability: Building Data Quality Monitors Using SQL
To trigger an alert when data breaks, data teams can leverage a tried and true tactic from our friends in software engineering: monitoring and observability. In this article, we walk through how you can create your own data quality monitors for freshness and distribution from scratch using SQL.
Data Engineering, Data Quality, Data Science, Data Science Platform, SQL