Data Modeling in MongoDB: A Complete Guide
In this article, we will learn about data modeling in MongoDB and its different use cases.
So, let's get started...
What is data modeling?
Data modeling is the process of taking the unstructured data generated by a real-world scenario and structuring it into a logical data model in a database, according to a set of criteria.
Different Types of relationships:
There are mainly three types of relationships between data:
1) One to one
In a one-to-one relationship, one field can have only one value.
For example, one movie can have only one name.
2) One to many
In MongoDB data modeling, we can divide this into three sub-relationships:
i) One to Few
For example, one movie can win many awards, but not thousands; there will only be a few.
ii) One to Many
This is the most important relationship and the one most commonly used in MongoDB.
For example, one movie can have many reviews, in the hundreds or thousands.
iii) One to Ton
For example, application logs: if you capture the login activity of your application, the logs can eventually grow to millions of entries when you have a large user base.
3) Many to many
For example, one movie can have many actors, and one actor can also play in many movies.
REFERENCING VS EMBEDDING
1) Referenced/Normalized:
Here we create two separate documents and store the reference ID of one document in the other.
For example, we can create one movie document and separate actor documents, and then connect the movie to its actors by storing the actors' IDs in the movie document. This is also called child referencing.
Pros:
- It's easier to query each document on its own
Cons:
- We need 2 queries to get data from referenced documents
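The referenced layout and its two-step lookup can be sketched in plain Python. This is only an illustration: the dictionaries stand in for MongoDB collections, and the names `movies`, `actors`, and `get_movie_with_actors` as well as the sample data are assumptions made for this example, not part of any MongoDB API.

```python
# Two in-memory "collections" keyed by _id, standing in for MongoDB collections.
actors = {
    "a1": {"_id": "a1", "name": "Actor One"},
    "a2": {"_id": "a2", "name": "Actor Two"},
}
movies = {
    # The movie stores only the ids of its actors (child referencing).
    "m1": {"_id": "m1", "title": "Example Movie", "actors": ["a1", "a2"]},
}


def get_movie_with_actors(movie_id):
    """Hypothetical helper: resolve a movie and its referenced actors."""
    # Lookup 1: fetch the movie document.
    movie = movies[movie_id]
    # Lookup 2: fetch the referenced actor documents by their ids.
    cast = [actors[actor_id] for actor_id in movie["actors"]]
    # Return a copy with the ids replaced by full actor documents.
    return {**movie, "actors": cast}


result = get_movie_with_actors("m1")
print([actor["name"] for actor in result["actors"]])
```

Each actor can still be read on its own through `actors`, which is the advantage listed above, while assembling the full movie takes a second lookup.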
2) Embedded/Denormalized:
Here we embed related documents directly inside the main document. An embedded document holds all of its data in one place, so there is no need to create separate documents.
Pros:
- This can improve performance, as we need fewer queries to get all the data.
Cons:
- We can't query the embedded data on its own. If that is a requirement, use normalized data instead.
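As a rough sketch of the embedded layout, the movie below carries its awards inside the same document, so a single read returns everything. The field names, the sample awards, and the helper `award_names` are assumptions made for illustration.

```python
# One document holds the movie and its embedded awards (denormalized).
movie = {
    "_id": "m1",
    "title": "Example Movie",
    "awards": [
        {"name": "Best Picture", "year": 2020},
        {"name": "Best Director", "year": 2020},
    ],
}


def award_names(movie_doc):
    """Hypothetical helper: read the embedded awards through the parent document."""
    return [award["name"] for award in movie_doc["awards"]]


# A single read of the movie document is enough to get the awards too.
print(movie["title"], award_names(movie))
```

Note that the awards are only reachable through the movie document here, which mirrors the drawback described above.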
When to use Embedding and Referencing:
1) Embedding:
We can use embedding when the following criteria apply:
- One to Few relationships
- Data is mostly read
- Data does not change quickly
- High read/low write ratio
- Datasets really belong together (user + email address)
For example, images of a movie: once they are added, they are not updated regularly.
2) Referencing:
We can use referencing when the following criteria apply:
- One to Ton or One to Many relationships
- Data gets updated a lot
- Low read/high write ratio. For example, movies + reviews: a review can be edited multiple times, and it is also updated whenever a user likes it, dislikes it, or marks it as helpful
- We frequently need to query both datasets on their own. For example, if we often need to fetch only the images, we must use referencing
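The two checklists above can be condensed into a small decision helper. This is only a rough sketch of the criteria listed in this article, not an official MongoDB rule; the function name `suggest_strategy` and its parameters are hypothetical.

```python
def suggest_strategy(relationship, mostly_read, queried_on_its_own):
    """Suggest 'embed' or 'reference' from the criteria listed above.

    relationship: 'few', 'many', or 'ton' (the "one to ..." sub-relationship)
    mostly_read: True for a high read/low write ratio
    queried_on_its_own: True if the related data is often fetched separately
    """
    if relationship not in ("few", "many", "ton"):
        raise ValueError("relationship must be 'few', 'many', or 'ton'")
    # One to Many and One to Ton relationships point to referencing.
    if relationship in ("many", "ton"):
        return "reference"
    # Frequently updated data, or data queried on its own, also points to referencing.
    if queried_on_its_own or not mostly_read:
        return "reference"
    # One to Few, mostly read, and always fetched together: embed.
    return "embed"


print(suggest_strategy("few", mostly_read=True, queried_on_its_own=False))
print(suggest_strategy("many", mostly_read=False, queried_on_its_own=True))
```

Real schemas usually need more nuance than three inputs, so treat the result as a starting point for discussion rather than a verdict.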
Types of Referencing:
i) Child referencing:
Here we store references to other documents as an array in the main parent document.
Best for: 1 to Few
ii) Parent referencing:
Now suppose we use child referencing to store login logs. The array of references can become very large in the future, and since a BSON document is limited to 16MB, the parent document can easily exceed that limit. Child referencing is not ideal here, so in this case we can use parent referencing.
Here we store the parent's reference ID in each child document.
Best for: 1 to Many, 1 to Ton
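Parent referencing for the login-log case might look like the sketch below. The list and dictionary stand in for MongoDB collections, and the names `users`, `login_logs`, `record_login`, and `logins_for`, along with the sample timestamps, are assumptions made for illustration.

```python
# The parent document stays small: it holds no array of log references.
users = {"u1": {"_id": "u1", "name": "Example User"}}

# Each child (log) document stores the id of its parent user.
login_logs = []


def record_login(user_id, timestamp):
    """Hypothetical helper: add one log document that points back to its user."""
    login_logs.append({"user": user_id, "at": timestamp})


def logins_for(user_id):
    """Hypothetical helper: find all log documents that reference the given user."""
    return [log for log in login_logs if log["user"] == user_id]


record_login("u1", "2024-01-01T10:00:00Z")
record_login("u1", "2024-01-02T09:30:00Z")
print(len(logins_for("u1")), users["u1"])
```

However many logins are recorded, the user document itself never grows, which is why this layout avoids the document size limit.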
iii) Two-way referencing:
Best for: Many to Many
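In two-way referencing, each side keeps the ids of the other side. A minimal sketch for the movies and actors example follows; the dictionaries stand in for MongoDB collections, and the helper `link` and the sample data are assumptions made for illustration.

```python
movies = {
    "m1": {"_id": "m1", "title": "Movie One", "actors": []},
    "m2": {"_id": "m2", "title": "Movie Two", "actors": []},
}
actors = {
    "a1": {"_id": "a1", "name": "Actor One", "movies": []},
    "a2": {"_id": "a2", "name": "Actor Two", "movies": []},
}


def link(movie_id, actor_id):
    """Hypothetical helper: record the relationship on both documents."""
    if actor_id not in movies[movie_id]["actors"]:
        movies[movie_id]["actors"].append(actor_id)
    if movie_id not in actors[actor_id]["movies"]:
        actors[actor_id]["movies"].append(movie_id)


link("m1", "a1")
link("m1", "a2")
link("m2", "a1")

print(movies["m1"]["actors"])
print(actors["a1"]["movies"])
```

The relationship can be followed from either side, at the cost of keeping both arrays in sync on every change.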