3 Things to Know When You Switch from SQL to MongoDB

Welcome to the final post in my series on moving from SQL to MongoDB. In the first post, I mapped terms and concepts from SQL to MongoDB. In the second post, I discussed the top four reasons why you should use MongoDB.

Now that we have an understanding of the terminology as well as why MongoDB is worth the effort of changing your mindset, let's talk about three key ways you need to change your mindset.

Your first instinct might be to convert your existing columns and rows to fields and documents and stick with your old ways of modeling data. We've found that people who try to use MongoDB in the same way that they use a relational database struggle and sometimes fail.

We don't want that to happen to you.

Let's discuss three key ways to change your mindset as you move from SQL to MongoDB.

Embrace Document Diversity
Data That is Accessed Together Should be Stored Together
Tread Carefully with Transactions

This article is based on a presentation I gave at MongoDB World and MongoDB.local Houston entitled "From SQL to NoSQL: Changing Your Mindset."
If you prefer videos over articles, check out the recording. Slides are available here.

#Embrace Document Diversity

As we saw in the first post in this series when we modeled documents for Leslie, Ron, and Lauren, not all documents in a collection need to have the same fields.

Users

1 {
2    "_id": 1,
3    "first_name": "Leslie",
4    "last_name": "Yepp",
5    "cell": "8125552344",
6    "city": "Pawnee",
7    "location": [ -86.536632, 39.170344 ],
8    "hobbies": ["scrapbooking", "eating waffles", "working"],
9    "jobHistory": [
10       {
11          "title": "Deputy Director",
12          "yearStarted": 2004
13       },
14       {
15          "title": "City Councillor",
16          "yearStarted": 2012
17       },
18       {
19          "title": "Director, National Parks Service, Midwest Branch",
20          "yearStarted": 2014
21       }
22    ]
23 },
24 
25 {
26    "_id": 2,
27    "first_name": "Ron",
28    "last_name": "Swandaughter",
29    "cell": "8125559347",
30    "city": "Pawnee",
31    "hobbies": ["woodworking", "fishing"],
32    "jobHistory": [
33       {
34          "title": "Director",
35          "yearStarted": 2002
36       },
37       {
38          "title": "CEO, Kinda Good Building Company",
39          "yearStarted": 2014
40       },
41       {
42          "title": "Superintendent, Pawnee National Park",
43          "yearStarted": 2018
44       }
45    ]
46 },
47 
48 {
49    "_id": 3,
50    "first_name": "Lauren",
51    "last_name": "Burhug",
52    "city": "Pawnee",
53    "hobbies": ["soccer"],
54    "school": "Pawnee Elementary"
55 }

For those of us with SQL backgrounds, this is going to feel uncomfortable and probably a little odd at first. I promise it will be ok. Embrace document diversity. It gives us so much flexibility and power to model our data.

In fact, MongoDB has a data modeling pattern specifically for when your documents do not have the same fields. It's called the Polymorphic Pattern. We use the Polymorphic Pattern when documents in a collection are of similar but not identical structures.

Let's take a look at an example that builds on the Polymorphic Pattern. Let's say we decided to keep a list of each user's social media followers inside of each User document. Lauren and Leslie don't have very many followers, so we could easily list their followers in their documents. For example, Lauren's document might look something like this:

1 {
2    "_id": 3,
3    "first_name": "Lauren",
4    "last_name": "Burhug",
5    "city": "Pawnee",
6    "hobbies": ["soccer"],
7    "school": "Pawnee Elementary",
8    "followers": [
9          "Brandon",
10          "Wesley",
11          "Ciara",
12          ...
13    ]
14 }

This approach would likely work for most of our users. However, since Ron built a chair that appeared in the very popular Bloosh Magazine, Ron has millions of followers. If we try to list all of his followers in his User document, it may exceed the 16 megabyte document size limit. The question arises: do we want to optimize our document model for the typical use case where a user has a few hundred followers or the outlier use case where a user has millions of followers?

We can utilize the Outlier Pattern to solve this problem. The Outlier Pattern allows us to model our data for the typical use case but still handle outlier use cases.

We can begin modeling Ron's document just like Lauren's and include a list of followers. When we begin to approach the document size limit, we can add a new has_extras field to Ron's document. (The field can be named anything we'd like.)

1 {
2    "_id": 2,
3    "first_name": "Ron",
4    "last_name": "Swandaughter",
5    "cell": "8125559347",
6    "city": "Pawnee",
7    "hobbies": ["woodworking", "fishing"],
8    "jobHistory": [
9       {
10          "title": "Director",
11          "yearStarted": 2002
12       },
13       ...
14    ],
15    "followers": [
16       "Leslie",
17       "Donna",
18       "Tom"
19       ...
20    ],
21    "has_extras": true
22 }

Then we can create a new document where we will store the rest of Ron's followers.

1 {
2    "_id": 2.1,
3    "followers": [
4       "Jerry",
5       "Ann",
6       "Ben"
7       ...
8    ],
9    "is_overflow": true
10 }

If Ron continues to gain more followers, we could create another overflow document for him.

The great thing about the Outlier Pattern is that we are optimizing for the typical use case but we have the flexibility to handle outliers.

So, embrace document diversity. Resist the urge to force all of your documents to have identical structures just because it's what you've always done.

For more on MongoDB data modeling design patterns, see Building with Patterns: A Summary and the free MongoDB University Course M320: Data Modeling.

#Data That is Accessed Together Should be Stored Together

If you have experience with SQL databases, someone probably drilled into your head that you should normalize your data. Normalization is considered good because it prevents data duplication. Let's take a step back and examine the motivation for database normalization.

When relational databases became popular, disk space was extremely expensive. Financially, it made sense to normalize data and save disk space. Take a look at the chart below that shows the cost per megabyte over time.

The cost has drastically gone down. Our phones, tablets, laptops, and flash drives have more storage capacity today than they did even five to ten years ago for a fraction of the cost. When was the last time you deleted a photo? I can't remember when I did. I keep even the really horribly unflattering photos. And I currently backup all of my photos on two external hard drives and multiple cloud services. Storage is so cheap.

Storage has become so cheap that we've seen a shift in the cost of software development. Thirty to forty years ago storage was a huge cost in software development and developers were relatively cheap. Today, the costs have flipped: storage is a small cost of software development and developers are expensive.

Instead of optimizing for storage, we need to optimize for developers' time and productivity.

As a developer, I like this shift. I want to be able to focus on implementing business logic and iterate quickly. Those are the things that matter to the business and move developers' careers forward. I don't want to be dragged down by data storage specifics.

Think back to the example in the previous post where I coded retrieving and updating a user's profile information. Even in that simple example, I was able to write fewer lines of code and move quicker when I used MongoDB.

So, optimize your data model for developer productivity and query optimization. Resist the urge to normalize your data for the sake of normalizing your data.

Data that is accessed together should be stored together. If you end up repeating data in your database, that's ok—especially if you won't be updating the data very often.

#Tread Carefully with Transactions

We discussed in a previous post that MongoDB supports transactions. The MongoDB engineering team did an amazing job of implementing transactions. They work so well!

But here's the thing. Relying on transactions is a bad design smell.

Ben is shocked by the horrendous smell of Tom's cologne

Why? This builds on our first two points in this section.

First, not all documents need to have the same fields. Perhaps you're breaking up data between multiple collections because it's not all of identical structure. If that's the only reason you've broken the data up, you can probably put it back together in a single collection.

Second, data that is accessed together should be stored together. If you're following this principle, you won't need to use transactions.

Some use cases call for transactions. Most do not. If you find yourself frequently using transactions, take a look at your data model and consider if you need to restructure it.

For more information on transactions and when they should be used, see the MongoDB MongoDB Multi-Document ACID Transactions Whitepaper.

#Wrap Up

Today we discussed the three things you need to know as you move from SQL to MongoDB:

Embrace Document Diversity
Data That is Accessed Together Should be Stored Together
Tread Carefully with Transactions

I hope you enjoy using MongoDB! If you want to jump in and start coding, my teammates and I have written Quick Start Tutorials for a variety of programming languages. I also highly recommend the free courses on MongoDB University.

When you're ready to take MongoDB for a spin, check out MongoDB Atlas, MongoDB's fully managed database-as-a-service. Atlas is the easiest way to get started with MongoDB. With a forever-free tier and promo code LAUREN200 for when you're ready to move beyond the free tier, you're on your way to realizing the full value of MongoDB.

In summary, don't be like Ron. (I mean, don't be like him in this particular case, because Ron is amazing.)

Change your mindset and get the full value of MongoDB.

Ron dances happily with a tiny black hat attached to his head

1	{
2	"_id": 1,
3	"first_name": "Leslie",
4	"last_name": "Yepp",
5	"cell": "8125552344",
6	"city": "Pawnee",
7	"location": [ -86.536632, 39.170344 ],
8	"hobbies": ["scrapbooking", "eating waffles", "working"],
9	"jobHistory": [
10	{
11	"title": "Deputy Director",
12	"yearStarted": 2004
13	},
14	{
15	"title": "City Councillor",
16	"yearStarted": 2012
17	},
18	{
19	"title": "Director, National Parks Service, Midwest Branch",
20	"yearStarted": 2014
21	}
22	]
23	},
24
25	{
26	"_id": 2,
27	"first_name": "Ron",
28	"last_name": "Swandaughter",
29	"cell": "8125559347",
30	"city": "Pawnee",
31	"hobbies": ["woodworking", "fishing"],
32	"jobHistory": [
33	{
34	"title": "Director",
35	"yearStarted": 2002
36	},
37	{
38	"title": "CEO, Kinda Good Building Company",
39	"yearStarted": 2014
40	},
41	{
42	"title": "Superintendent, Pawnee National Park",
43	"yearStarted": 2018
44	}
45	]
46	},
47
48	{
49	"_id": 3,
50	"first_name": "Lauren",
51	"last_name": "Burhug",
52	"city": "Pawnee",
53	"hobbies": ["soccer"],
54	"school": "Pawnee Elementary"
55	}

1	{
2	"_id": 2.1,
3	"followers": [
4	"Jerry",
5	"Ann",
6	"Ben"
7	...
8	],
9	"is_overflow": true
10	}