When your daily life regularly has databases in it, then the chances are quite high that you’ve already heard the term quite a few times, data modeling. But what is it exactly? And why should you do it?
What is data modeling?
Data modeling is often considered a necessary evil, a form of documentation, which is pushed to the end of the project, only to be ignored and forgotten, due to time or budget constraints, or both. But is that correct? Is it necessary? Is it evil? And is it only documentation, which can be made afterwards?
Let’s take a look at what Wikipedia says about Data Modeling:
“Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques”
Or, if we consider Data Modeling as “creating a data model“, then what is a data model exactly? Again, according to Wikipedia, the definition of a data model:
“A data model (or datamodel) is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.”
This does describe a data model, or data modeling quite accurate, but doesn’t really give much more meaning to it. Is it just another document that needs to be produced? No, it is much more!
We, humans, are visual creatures. We use our eyesight to observe the world around us, and use visuals to remember things. The visual is not our only means of registering and remembering the world, but it is an important one. Words, by means of reading or hearing, are other ones, but it is much harder to describe something in words, than it is to create an image.
There are many examples of this. For instance, when you read a book, you create a visual representation of what you are reading. When you see a movie based on that book, it will nearly always be different from how you imagined it. And even a thousand different movies can be made from this book, but they will never come close to how you visualized the book.
If I tell you to imagine a beautiful place on earth, many people will think of similar places, but no two people will visualize the same place in the same way. Even if I start describing the place, and tell you about the water flowing between the trees covered in colorful fall foliage, and the blue sky with the fluffy clouds, reflected in the water. The outcome of this exercise will render different results based on your experiences and more.
But, when I show you a picture of this setting, there is no (or way much less) discussion about what the scene looks like. If you add spoken or written words to this image, you can reproduce the visual of the scene as close as possible.
This is what a data model does, it is a representation of your data on different levels. Combine the visual representation of the data with the written documentation, and you have a decent start of your analysis, and a good chance of being able to recreate reality, the data flows or the database.
Types of data models
Data models come in different sizes and flavors, but there are some types that can be distinguished:
The Conceptual Data Model:
The conceptual data model describes the data and data flows on a high level, in a specific domain or area of interest. This domain can be the sales division of a company, the students in a school, fish in the sea, or even a whole enterprise.
In a conceptual data model, significant items in the domain are represented by entities or entity classes. For instance, shops, points of sale, classes. Then the relations between the different entities are defined, creating a clear view of the sources of the different data and the flows and the relations of data.
Conceptual data models use non-technical names and terms, and more business – related names, so that management on all levels can read and understand the model. Conceptual data models are typically created by architects.
The Logical Data Model:
The logical data model is derived from the conceptual data model, and describes and groups the data in entities and attributes, or tables and columns, or object-oriented classes. It provides a clear overview of how your data is stored and grouped. Depending on the need of the data model, this can be according to different methodologies, such as the star schema for Kimball based Data Warehouses, or a 3rd Normal Form normalized schema for operational systems, or Data Vault for a data lake.
Logical data models are mostly designed using an ERD, and entity-relationship-diagram, which clearly visualizes the different data entities and their relations. Logical data models should be technology-agnostic, meaning that they should be generic, and not include technical implications. When interpreting a Logical Data Model (LDM), you should not be able to derive the technical implementation on which the system will be running.
Logical Data Models are typically created by analysts, and still use business names and terms instead of technical names, but on a more specific level.
The Physical Data Model:
The physical data model is derived from the Logical Data Model (LDM), and detailing the data model to the physical implementation of storing the data, for instance in a database. This means that the physical data model (PDM) will include a granularity and clarity that allows for creating a database from it. It will contain, for instance, partitions, tablespaces, indexes, keys and more.
The physical data model is mostly created by a technically skilled person, and will use the technical names and terms as used in the DBMS that will be used. These names are then often subject to the technical limitations of the platform.
Why should I data model?
Higher quality through better understanding.
Designing your way through the different levels of data models will increase your understanding of the data and how it is created and flowing. This will result in a better data model, and thus a better implementation of the database.
Reducing project cost and time
Data modeling does cost time, and thus it is typically ignored or minimized to save costs. Yet, that is a wrong idea, and it should be considered an investment. Why? Because a well-designed data model will easily reduce your programming and testing costs by 50%, due to the ability to catch design errors and flaws at an early stage, preventing bug fixing afterwards, or even worse, expensive design changes when in production.
This already results in less project time and cost, but having a good data model method often gives the opportunity to automate tasks further down the workflow, such as automatic deduction of the PDM from the LDM, and automated DDL (Data Definition Language, SQL for instance) generation from the PDM, ready to run on your DBMS.
Besides these already big advantages, the same reduction in cost and time happens when you need to maintain or expand your system.
It is still considered a dirty word by many developers. But in practice, have you ever looked back at code or a database that you created several years ago? Can you honestly say that you still know what is what and why things were created the way they are?
Then consider a colleague creating the database, without decent documentation and a good data model. And then he leaves your company, and a new person comes in. Think of how much easier it will be for a newcomer to dive into the system and comprehend the what and why.
A good data model will visualize the complexity of a system, allowing you to get a clear view on the development needed, and the risks a project will have, and even better, to avoid many of these larger risks and errors that are otherwise bound to be made.
A well-designed database runs many times more efficient than any randomly created database. A good data model takes into account the natural flow of the data, but also the usage of the system for which it is being designed, the way the data will enter the system, but also the way the data will be retrieved from the system. Keeping all these requirements as clear objectives during the design phases of the data model, will result in a database that runs at top speed for what it is supposed to do.
Scoping and Agreeing
Data models on the different levels create a clear vision for people to work with, and to agree upon. The different data models should be clear for the business, management, analysts and developers to see what is expected, what it looks like, what is in scope, and what is not.
Data models can help you visualize your data, and get a feel for the data and how it flows and grouped. It’s a very useful tool when working with data and databases, and will lead your project to success!