URL: https://thenewstack.io/data-modeling-part-1-goals-and-methodology/


Data Modeling: Part 1 — Goals and Methodology

In different techniques, entities and relationships remain central. However, their nature and roles are reinterpreted according to the business goals.
Feb 7th, 2023 10:40am by Riccardo Tommasini
InfluxData sponsored this post.

Data modeling is the process of defining and representing the data elements in a system in order to communicate connections between data points and structures. In his impactful book “Designing Data-Intensive Applications,” Martin Kleppmann describes data modeling as the most critical step in developing any information system.

Understanding which data is relevant for the business and in what form requires communication between functional and technical people. Moreover, allowing data sharing across components within an information system is critical for the good functioning of the system. Quoting Kleppmann, “data models have such a profound effect not only on how the software is written but also on how we think about the problem that we are solving.”

But what exactly is a data model, then?

A data model is a specification that describes the structure of the data stored in the system.

In addition, it may define constraints that guarantee data integrity and standardize how to represent (rules), store (format) or share (protocol) data. In the literature, we typically distinguish between three levels of data modeling (see the pyramid in Figure 1).

Figure 1

  • The Conceptual level defines what the system contains. Business stakeholders typically create a conceptual model, whose purpose is to organize, scope and define business concepts and rules. Definitions matter most at this level, for example, what counts as a product.
  • The Logical level defines how the database management system (DBMS) should be implemented. A logical model is technologically biased and is created with the purpose of developing a technical map of rules and data structures. Relationships and attributes become visible, for instance, product name and price.
  • The Physical level describes how to use a specific technology to implement the information system. The physical model is created with the purpose of implementing the database. The physical level explores the trade-offs in terms of data structures and algorithms.
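
To make the three levels concrete, here is a minimal sketch that follows the product example from the list above. The definition text, the `price_cents` attribute, the `save` helper and the choice of SQLite are all assumptions made for illustration; the article does not prescribe any of them.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: a business definition only, with no attributes or technology.
CONCEPTUAL = {"Product": "An item the business offers for sale."}


# Logical level: attributes become visible (product name and price).
@dataclass(frozen=True)
class Product:
    name: str
    price_cents: int  # hypothetical choice: money stored as integer cents


# Physical level: one specific technology (SQLite here) and its DDL,
# including constraints that guard data integrity.
DDL = """
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT    NOT NULL UNIQUE,
    price_cents INTEGER NOT NULL CHECK (price_cents >= 0)
)
"""


def save(conn, product):
    """Persist a logical Product into the physical table."""
    conn.execute(
        "INSERT INTO product (name, price_cents) VALUES (?, ?)",
        (product.name, product.price_cents),
    )


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
save(conn, Product("widget", 1999))
rows = conn.execute("SELECT name, price_cents FROM product").fetchall()
```

In this sketch, the `CHECK` and `UNIQUE` clauses are one way a physical model can enforce integrity rules: inserting a negative price raises `sqlite3.IntegrityError`.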

During the Beginner Flux training at InfluxDB University, we used the same levels to understand how time series data maps onto the Flux data structure and InfluxDB’s line protocol data model. Here we take this a step further in data modeling for InfluxDB and Flux. Therefore, it is worth recalling that:

  • Conceptually, a time series is an ordered set of timestamped data points described by one — and only one — measurement and a set of tags.
  • Logically, Flux represents multiple series simultaneously, representing different values by a set of key-value pairs named fields. Moreover, tags are key-value pairs that help further partition data for processing.
  • Physically, InfluxDB stores data into a Time-Structured Merge Tree; it is also worth mentioning that tags are both key and value indexed.
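
The mapping in the list above can be sketched in a few lines of Python. The `Point` class and its method names are hypothetical, not part of any InfluxDB client library, and the serialization is a simplified take on line protocol that skips the escaping of spaces, commas and quotes.

```python
from dataclasses import dataclass, field


def format_field(value):
    """Render a field value in simplified line protocol syntax."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    return f'"{value}"'  # strings are double-quoted (escaping omitted)


@dataclass
class Point:
    measurement: str
    tags: dict = field(default_factory=dict)    # key-value pairs that partition data
    fields: dict = field(default_factory=dict)  # key-value pairs holding the values
    timestamp_ns: int = 0

    def series_id(self):
        """Measurement plus sorted tag set, following the conceptual definition."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.measurement},{tag_part}" if tag_part else self.measurement

    def to_line(self):
        field_part = ",".join(
            f"{k}={format_field(v)}" for k, v in self.fields.items()
        )
        return f"{self.series_id()} {field_part} {self.timestamp_ns}"


point = Point(
    measurement="temperature",
    tags={"sensor": "s1", "room": "lab"},
    fields={"value": 21.5},
    timestamp_ns=1675766407000000000,
)
line = point.to_line()
```

Two points with the same measurement and tag set share a `series_id` and therefore belong to the same series; changing any tag value starts a different one.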

A Brief History of Data Modeling Methods

Now that we have clarified what a data model is and what data modeling aims to achieve, we can discuss how to get there. In practice, several methodologies exist in the literature. The most prominent ones, listed below, differ in terms of target information systems and workloads, such as online transaction processing (OLTP) and DBMS; online analytical processing (OLAP) and data warehouses; and big data and data lakes.

  • Relational modeling (RM) focuses on removing redundant information for a model that encompasses the whole enterprise business. RM uses relations (tables) to describe domain entities and their relationships.
  • Dimensional modeling (DM) focuses on enabling complete requirement analysis while maintaining high performance when handling large and complex (analytical) queries. DM aims to optimize the data access; thus, it is tailored for OLAP. The star and snowflake models are notable results of dimensional modeling.

Notably, RM and DM produce significantly different results at the logical and physical levels of abstraction described above. Nonetheless, they share similar concepts and tooling at the conceptual level. Indeed, the entity-relationship (ER) modeling technique and its diagrams underpin both approaches, as well as graph databases and semantic stores. Therefore, it is worth refreshing what ER implies:

  • An entity is an object that exists and is distinguishable from other objects. Entities have a type and descriptive attributes; an entity-set groups entities of the same type. An attribute called the primary key uniquely identifies each entity in a set.
  • A relationship is an association among several entities. The cardinality of a relationship describes the number of entities with which another entity can be associated; we consider one-to-one, one-to-many, many-to-one and many-to-many.
Figure 2
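
As a rough illustration of the ER vocabulary, the sketch below models two entity sets keyed by primary key and one relationship stored as pairs of keys. The customer and order data and the `cardinality` helper are invented for this example. Note that the helper reports the cardinality observed in the data, which is weaker than a cardinality declared in a schema.

```python
from collections import Counter

# Entity sets: each dict maps a primary key to the entity's attributes.
customers = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
orders = {10: {"total": 30}, 11: {"total": 12}, 12: {"total": 99}}

# Relationship "places": pairs of (customer primary key, order primary key).
places = [(1, 10), (1, 11), (2, 12)]


def cardinality(pairs):
    """Classify the cardinality observed in a list of (left, right) key pairs."""
    unique = set(pairs)
    left_counts = Counter(left for left, _ in unique)
    right_counts = Counter(right for _, right in unique)
    # True when some left entity is linked to several right entities.
    left_fans_out = any(n > 1 for n in left_counts.values())
    # True when some right entity is linked to several left entities.
    right_fans_out = any(n > 1 for n in right_counts.values())
    if left_fans_out and right_fans_out:
        return "many-to-many"
    if left_fans_out:
        return "one-to-many"
    if right_fans_out:
        return "many-to-one"
    return "one-to-one"


kind = cardinality(places)  # one customer places many orders
```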

In different techniques, entities and relationships remain central. However, their nature and roles are reinterpreted according to the business goals. For example, RM stresses identifying as many entities as possible to avoid data redundancy. Indeed, redundancy creates maintenance problems over time, which conflict with the user’s need for consistency.
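
The redundancy argument can be seen in a small SQLite sketch. The supplier and product tables and their contents are hypothetical. Because the supplier is its own entity, its city is stored once, and a single update stays consistent for every product that references it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each entity gets its own relation; the city is stored exactly once.
    CREATE TABLE supplier (
        supplier_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)
    );
    INSERT INTO supplier VALUES (1, 'Acme', 'Lyon');
    INSERT INTO product VALUES (1, 'bolt', 1), (2, 'nut', 1);
    """
)

# One update, in one place; no product row needs to be touched.
conn.execute("UPDATE supplier SET city = 'Tartu' WHERE supplier_id = 1")

cities = conn.execute(
    """
    SELECT p.name, s.city
    FROM product AS p
    JOIN supplier AS s USING (supplier_id)
    ORDER BY p.name
    """
).fetchall()
```

Had the city been copied into every product row instead, the same change would require updating several rows, and missing one would leave the data inconsistent.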

Conversely, DM builds around facts that borrow their identity from other entities through their many-to-many relations. Such entities are interpreted as dimensions, that is, descriptive information that gives context to the facts. DM is of primary interest to data warehouse users, whose top concern is analytics. Both modeling techniques mentioned above can, to some extent, represent time.

  • In relational modeling, time is just an attribute. Entities and relationships can be updated, but the conceptual schema does not capture those changes. Temporal extensions of the relational modeling approaches have been proposed. However, they are tailored for temporal databases, which focus on the temporal validity of their entities (as a form of consistency) rather than time series databases (TSDBs) and the history of their time-varying attributes.
  • In dimensional modeling, time is considered an analytical dimension: it represents a possible axis for slicing the data, which produces meaningful aggregates. Dimension tables within the dimensional model do not consider changes at the conceptual level. However, at lower levels, changes may happen. Different approaches to handling such “slowly changing dimensions” have been proposed, including keeping track of their history, which is close to what a TSDB would do.
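
To illustrate time as an analytical dimension, here is a small star schema sketched in SQLite. The table names (`dim_date`, `dim_product`, `fact_sales`) and the sales figures are made up for this example. The fact table takes its identity from the keys of the dimensions it references, and slicing over the date dimension yields per-month aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: descriptive context for the facts.
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        day     TEXT NOT NULL,
        month   TEXT NOT NULL
    );
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- Fact: identified by the combination of its dimension keys.
    CREATE TABLE fact_sales (
        date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (date_id, product_id)
    );
    INSERT INTO dim_date VALUES
        (1, '2023-01-30', '2023-01'),
        (2, '2023-02-06', '2023-02'),
        (3, '2023-02-07', '2023-02');
    INSERT INTO dim_product VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO fact_sales VALUES (1, 1, 5), (2, 1, 3), (2, 2, 4), (3, 2, 1);
    """
)

# Slice over the time dimension: total quantity sold per month.
by_month = conn.execute(
    """
    SELECT d.month, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_date AS d USING (date_id)
    GROUP BY d.month
    ORDER BY d.month
    """
).fetchall()
```

Swapping `d.month` for `d.day`, or joining `dim_product` instead, slices the same facts along a different axis without changing the fact table.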
Riccardo Tommasini is an associate professor (Maître des Conférences) at the Institut National des Sciences Appliquées de Lyon (INSA Lyon), France. Before that, Riccardo was assistant professor of data management at the University of Tartu, Estonia. Riccardo did his PhD...