Data modeling is the process of defining and representing the data elements in a system in order to communicate connections between data points and structures. In his impactful book “Designing Data-Intensive Applications,” Martin Kleppmann describes data modeling as the most critical step in developing any information system.
Understanding which data is relevant for the business and in what form requires communication between functional and technical people. Moreover, allowing data sharing across components within an information system is critical for the good functioning of the system. Quoting Kleppmann, “data models have such a profound effect not only on how the software is written but also on how we think about the problem that we are solving.”
But what exactly is a data model, then?
A data model is a specification that describes the structure of the data stored in the system.
In addition, it may define constraints that guarantee data integrity and standardize how to represent (rules), store (format) or share (protocol) data. In the literature, we typically distinguish between three levels of data modeling: conceptual, logical and physical (see pyramid figure).
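As a rough illustration of a data model that pairs structure with integrity constraints, here is a minimal Python sketch. The `Reading` entity, its fields and the specific rules are hypothetical and chosen only for this example; they do not come from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """Hypothetical entity: one temperature measurement from a sensor."""

    # Structure: which fields exist and what type each one has.
    sensor_id: str
    celsius: float
    taken_at: datetime

    def __post_init__(self):
        # Rules: constraints that protect data integrity.
        if not self.sensor_id:
            raise ValueError("sensor_id must not be empty")
        if self.celsius < -273.15:
            raise ValueError("celsius is below absolute zero")
        if self.taken_at.tzinfo is None:
            raise ValueError("taken_at must be timezone-aware")

    def to_record(self) -> dict:
        # Format: one agreed way to serialize the entity for storage or sharing
        # (here, a plain dict with an ISO 8601 timestamp; an assumption).
        return {
            "sensor_id": self.sensor_id,
            "celsius": self.celsius,
            "taken_at": self.taken_at.isoformat(),
        }


reading = Reading("s1", 21.5, datetime(2023, 1, 1, tzinfo=timezone.utc))
record = reading.to_record()
```

Rejecting invalid values at construction time means every component that receives a `Reading` can rely on the same guarantees.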
During the Beginner Flux training at InfluxDB University, we used the same levels to understand how time series data maps onto the Flux data structure and InfluxDB’s line protocol data model. Here we take this a step further in data modeling for InfluxDB and Flux. Therefore, it is worth recalling that:
Now that we have clarified what a data model is and what data modeling aims to achieve, we can discuss how to get there. In practice, several methodologies exist in the literature. The most prominent ones, listed below, differ in their target information systems and workloads: online transaction processing (OLTP) and DBMSs; online analytical processing (OLAP) and data warehouses; and big data and data lakes.
Notably, relational modeling (RM) and dimensional modeling (DM) produce significantly different results at the logical and physical levels of abstraction described above. Nonetheless, they all share similar conceptualization and tooling at the conceptual level. Indeed, the entity-relationship (ER) modeling technique and its diagrams underpin all the models mentioned above, as well as graph databases and semantic stores. Therefore, it is worth refreshing what ER implies:
Across these techniques, entities and relationships remain central. However, their nature and roles are reinterpreted according to the business goals. For example, RM stresses identifying as many entities as possible to avoid data redundancy. Indeed, redundancy creates maintenance problems over time, which conflict with the user’s need for consistency.
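The maintenance problem caused by redundancy can be sketched in a few lines of Python. The customer and order records below are made up for illustration. The flat layout repeats the customer's city on every order, while the normalized layout stores it once and references it by key.

```python
# Redundant layout: the customer's city is copied onto every order.
orders_flat = [
    {"order_id": 1, "customer": "Ada", "city": "London"},
    {"order_id": 2, "customer": "Ada", "city": "London"},
]

# An update that touches only one copy leaves the data inconsistent.
orders_flat[0]["city"] = "Paris"
cities_flat = {row["city"] for row in orders_flat if row["customer"] == "Ada"}

# Normalized layout: the customer is its own entity, referenced by key.
customers = {"c1": {"name": "Ada", "city": "London"}}
orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c1"},
]

# A single update is visible from every order that references the customer.
customers["c1"]["city"] = "Paris"
cities_normalized = {customers[o["customer_id"]]["city"] for o in orders}
```

After the updates, `cities_flat` holds two conflicting cities for the same customer, whereas `cities_normalized` holds exactly one.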
Conversely, DM builds around facts that borrow their identity from other entities through their many-to-many relations. Such entities are interpreted as dimensions, that is, descriptive information that gives context to the facts. DM is of primary interest to data warehouse users, whose top concerns are analytics. Both modeling techniques mentioned above can, to some extent, represent time.
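To make the fact/dimension split concrete, here is a minimal sketch of a star-like schema using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical. The fact table has no identifier of its own: its primary key is composed of the keys of the dimensions it relates, and a date dimension is one common way to represent time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: descriptive context for the facts.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);

    -- Fact: identified only by the dimension keys it borrows.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL,
        PRIMARY KEY (product_id, date_id)
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, '2023-01-30', '2023-01'),
                                (11, '2023-02-01', '2023-02');
    INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 11, 5.0), (2, 11, 40.0);
    """
)

# Typical analytical query: aggregate the facts along dimension attributes.
rows = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()
```

Grouping by `month` and `category` shows why analytics users favor this layout: every question becomes a join from the fact table to a handful of dimensions, followed by an aggregation.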