- Be welcomed
- Learn some database history
- Understand "Why Mongo?"
- Get Mongo Atlas set up
By the end of this Unit, you should understand:
- Types of databases
- Documents
- Collections
- BSON
- BSON Data Types
By the end of this unit, you should be able to:
- Create a MongoDB Atlas cluster
- Create collections and documents via the command-line interface (CLI)
- Execute basic queries
Welcome to everLive's introductory course on MongoDB. MongoDB is a document-based database technology intended to work seamlessly with web applications.
To succeed in this course, you'll need to have some familiarity with:
- JavaScript
- The JSON data format
- Command-line (Terminal) tools
If you haven't had a chance to get up to speed with JavaScript, that's okay; we won't be using that until the end of the Unit, so you have time to catch up.
By the end of this course, you'll be able to use Mongo to store document and collection-based data for use in real-world applications. You'll understand the benefits (and drawbacks) of document-based databases, and when they're the best choice for your application's needs. From there, you'll be ready to tackle building a data-driven application powered by MongoDB.
Much of our work in this class will take place in a terminal emulator, so you'll want to make sure you have one you like. Here are some recommendations:
macOS
Windows
If you're on Linux, you probably already have a Terminal you like 🙂.
We won't be using a text editor that much, but it's always handy to have one around for reading sample data, etc. We recommend Visual Studio Code, but any text editor you've used for other courses will work just fine here.
Because we're going to interact with Mongo through the command line, we'll need to install the Mongo Shell. These pages provide instructions on installation on multiple platforms. We'll also walk through installation together in class.
Lastly, you're going to need a Mongo Atlas account. Atlas is MongoDB's cloud-hosted version of Mongo. We will go over setup in class, but you can follow the Getting Started documentation to set up a basic cluster on your own.
If you've gone through our SQL databases course, you're familiar with the term relational database. If not, don't worry; we're going to go over it now. It's worth understanding what relational databases are to understand how Mongo isn't one.
Say I have a list of superheroes, including the team they're on:
Hero | Team |
---|---|
Wonder Woman | Justice League |
Captain America | Avengers |
Batman | Justice League |
Spider-Man | Avengers |
Black Widow | Avengers |
Green Lantern | Justice League |
Hopefully that satisfies both Marvel and DC fans. Now let's say I wanted to rename the team `Justice League` to `JLA`. I could run a query against all my hero records, updating each one's `Team` column if necessary. But here's the thing: I shouldn't have to. Instead, what we would do is create a second table, called `Teams`, give each one an id, and use it to refer to that team. Then, whenever I reference that team, I can do so by an unchanging ID number, no matter what I make the name. No laborious updates needed.
Heroes
Hero | TeamID |
---|---|
Wonder Woman | 2 |
Captain America | 1 |
Batman | 2 |
Spider-Man | 1 |
Black Widow | 1 |
Green Lantern | 2 |
Teams
ID | Name |
---|---|
1 | Avengers |
2 | JLA |
That's a relation: connecting one set of unique data to another set of unique data. Sounds great, right? It is, but it's not the only way to consider data. One criticism of relational databases comes when we have many sets of data that are deeply interconnected. The queries to join them together can get difficult to write and computationally expensive to process. And when most of that data is static (e.g. we won't be renaming teams or anything else), other approaches might make sense. This idea led to the rise of NoSQL: a tech movement to design database management systems that did not use the old relational approach. MongoDB is one such technology.
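To make the relation concrete, here is a minimal sketch in plain JavaScript (no database involved). The arrays stand in for the two tables, and `heroesWithTeamNames` is a hypothetical helper that plays the role of a join; a real relational database would do this with SQL.

```javascript
// Two "tables" as plain arrays, mirroring the Heroes and Teams tables above.
const teams = [
  { id: 1, name: "Avengers" },
  { id: 2, name: "Justice League" },
];
const heroes = [
  { hero: "Wonder Woman", teamId: 2 },
  { hero: "Captain America", teamId: 1 },
  { hero: "Batman", teamId: 2 },
];

// Hypothetical helper: resolve each hero's team name through its id (a "join").
function heroesWithTeamNames(heroRows, teamRows) {
  const teamNameById = new Map(teamRows.map((team) => [team.id, team.name]));
  return heroRows.map((row) => ({
    hero: row.hero,
    team: teamNameById.get(row.teamId),
  }));
}

// Renaming the team touches exactly one row; no hero rows change.
teams.find((team) => team.id === 2).name = "JLA";

const joined = heroesWithTeamNames(heroes, teams);
console.log(joined);
```

Note that the rename is cheap, but every read now pays for the lookup. That trade-off is what the criticism above is about.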
Here's another way to consider our superhero data.
{"heroes": {
"8a45dc": {
"name": "Wonder Woman",
"team": "Justice League"
},
"75bb5": {
"name": "Captain America",
"team": "Avengers"
}
}}
What are we looking at here? This is JSON, or JavaScript Object Notation. Our parent object, `heroes`, is referred to as a collection in MongoDB. Each hero is given a unique id, and a corresponding object called a document. Now, these documents are fairly simplistic, but let's add some more attributes to them to understand how this is a fairly intuitive way of considering data. What if we included our heroes' powers?
{"heroes": {
"8a45dc": {
"name": "Wonder Woman",
"team": "Justice League",
"powers": [
"strength",
"flight",
"Lasso of Truth",
"gauntlet blast"
]
},
"75bb5": {
"name": "Captain America",
"team": "Avengers",
"powers": [
"strength",
"that shield thing",
"incorruptible moral center"
]
}
}}
Imagine linking all those tables together. Yeesh. Here, when we ask for a hero, we get the necessary information with one query. Pretty elegant.
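As a rough illustration of that single lookup, the sketch below treats the collection as a plain JavaScript object keyed by id. This is only an analogy for the JSON shown above, not how you query MongoDB itself.

```javascript
// The heroes "collection" from above as a plain object keyed by id.
const heroes = {
  "8a45dc": {
    name: "Wonder Woman",
    team: "Justice League",
    powers: ["strength", "flight", "Lasso of Truth", "gauntlet blast"],
  },
  "75bb5": {
    name: "Captain America",
    team: "Avengers",
    powers: ["strength", "that shield thing", "incorruptible moral center"],
  },
};

// One lookup returns the whole document: name, team, and powers together.
const hero = heroes["8a45dc"];
console.log(hero.name);
console.log(hero.powers.length);
```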
From here, we can imagine a separate collection of, say, `villains`. In a document-based database, one way of relating heroes to villains would be to embed the villain document in the `hero` document, once the connection is made. Again, this allows us to ask for a single document with all the information we need.
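Embedding can be sketched in plain JavaScript as follows. The villain document and the `nemesis` key are made up for illustration; any key name would work.

```javascript
const hero = { name: "Wonder Woman", team: "Justice League" };

// Illustrative villain document (hypothetical data).
const villain = { name: "Cheetah", powers: ["speed", "claws"] };

// Embed a copy of the villain document inside the hero document.
const heroWithNemesis = {
  ...hero,
  nemesis: { ...villain, powers: [...villain.powers] },
};

// A single document now carries everything we need.
console.log(heroWithNemesis.nemesis.name);
```

The cost of embedding a copy is that later changes to the original villain document are not reflected in the hero document automatically.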
Not all the information in our document is of the same kind. We'll have text, numbers, dates, oh my! Here's an important point: MongoDB does not actually store data in JSON. Why does that matter? In JSON, we have a very limited number of data types that we can use to store information. `Date` is not one of them; nor are specifically-defined number types like `double`.
Instead, Mongo stores its data in a binary format that is represented like JSON, called BSON. BSON allows us to use many more data types in our documents. These data types have direct analogs in JavaScript, making the integration between these two technologies quite tight.
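You can see JSON's limitation in plain JavaScript, without any database. In this small sketch (the `order` object and its keys are made up for illustration), a `Date` goes into JSON and comes back out as a string:

```javascript
// A JavaScript object holding a real Date value.
const order = { createdAt: new Date("2018-01-23T21:06:49.506Z"), total: 19.99 };

// Round-trip the object through JSON text.
const roundTripped = JSON.parse(JSON.stringify(order));

console.log(order.createdAt instanceof Date);        // true
console.log(typeof roundTripped.createdAt);          // "string"
console.log(roundTripped.createdAt instanceof Date); // false
```

The type information is lost because JSON has no date type to carry it. That is the kind of gap BSON's extra data types are meant to fill.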
If you've followed the directions above or joined us live, you now have MongoDB and the Mongo Shell installed on your computer. Great! That means that you can create collections and documents on your own computer. It also means you have the tools necessary to connect to instances of MongoDB running elsewhere. That's great news, because we'll be running our Mongo instance in the cloud.
Atlas instances are known as clusters. They live on one of three cloud hosting providers: AWS, Google Cloud, or Microsoft Azure.
We'll be following the getting started documentation from Atlas closely, so have it open in another tab. Make sure you've followed the steps there, including:
- Deploy a Free Tier Cluster
- Whitelist Your Connection IP Address
- Create a MongoDB User
- Insert Data into Your Cluster
From there, we are ready to actually work with data. In the last step, we've populated our cluster with several sample databases we can use to explore how Mongo works. Since we've already installed Mongo Shell (right?), we can move on to connecting to our cluster.
From the Clusters section of the Atlas dashboard, click the Connect button to access the full command-line string to connect to your cluster from the terminal. Copy/paste into a Terminal window on your computer. It'll look something like:
mongo "mongodb+srv://cluster0-chphg.mongodb.net/test" --username <your-user-here>
Once you've authenticated, you'll see a long prompt that looks something like:
MongoDB Enterprise Cluster0-shard-0:PRIMARY>
We're finally ready to look at some data!
The organization of a MongoDB cluster goes:
- Cluster
  - Database
    - Collection
      - Document
    - Collection
  - Database
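One way to picture that hierarchy is as nested JavaScript data. This is only a mental model, not how MongoDB stores anything; apart from `sample_supplies` and `sales`, every name and value below is hypothetical.

```javascript
// Cluster -> databases -> collections -> documents, as nested plain data.
const cluster = {
  sample_supplies: {               // a database
    sales: [                       // a collection
      { storeLocation: "Denver" }, // a document (illustrative)
      { storeLocation: "Seattle" },
    ],
  },
  another_database: {              // hypothetical second database
    another_collection: [],
  },
};

const databaseNames = Object.keys(cluster);
const collectionNames = Object.keys(cluster.sample_supplies);
const firstDocument = cluster.sample_supplies.sales[0];
console.log(databaseNames, collectionNames, firstDocument);
```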
We're already in our cluster, but we have multiple databases to choose from. Let's list them out by typing `show dbs` and pressing Enter.
You'll see those sample datasets we've loaded. Let's start simple with the `sample_supplies` database. Select it by typing `use sample_supplies`.
We're now set to use the `sample_supplies` database, which will contain a unique set of collections. As the `help` command will reveal (at any time), `show collections` will list a database's collections, once you've selected a database. Doing so reveals a `sales` collection. To see what we have in there, we are going to query the collection for all its records. This is where our commands start to look like JavaScript:
db.sales.find()
As you'll see, we get TOO MUCH DATA back. That's not at all helpful. So instead, let's just look at a single record to understand what documents in this collection look like.
db.sales.findOne()
This helper method returns only a single document for a query. It is roughly a shortcut for the following command:
db.sales.find().limit(1).pretty()
The `limit()` function modifies the query result from `find()` and truncates it after the given number of documents. If you've done some JavaScript, this concept of function chaining should be rather familiar.
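If chaining is new to you, the toy class below shows the idea in plain JavaScript. `ToyCursor` is a made-up stand-in written for this example; it is not MongoDB's cursor and only mimics the shape of the calls.

```javascript
// Toy stand-in for a query result; NOT MongoDB's implementation.
class ToyCursor {
  constructor(documents) {
    this.documents = documents;
  }

  // Keep at most `count` documents and return a new cursor so calls can chain.
  limit(count) {
    return new ToyCursor(this.documents.slice(0, count));
  }

  // Hand back the remaining documents as a plain array.
  toArray() {
    return [...this.documents];
  }
}

// Illustrative documents (hypothetical data).
const sales = [{ item: "notepad" }, { item: "pens" }, { item: "binder" }];

// Each method returns an object, so the next call hangs off the previous one.
const firstOnly = new ToyCursor(sales).limit(1).toArray();
console.log(firstOnly);
```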
`pretty()` makes the BSON result of our query human-readable. Scroll up to the top of this document to begin exploring its contents.
To start, we see an `"_id"` key. The value is of type `ObjectId`, displayed as a 24-character string of hexadecimal digits. This id is unique to the document, and every document has one. Why not just a simple integer? For one thing, to prevent collisions (accidentally identical ids). For another, non-sequential ids make it harder for nefarious types to guess at ids and attempt to gain access to data they shouldn't have.
After the `"_id"` is a `"saleDate"` of type `ISODate`. While this is represented as a string here, this is in fact a `Date` object with associated methods, like `getMonth()`.
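Here is a small sketch of those `Date` methods in plain JavaScript, using an illustrative timestamp. It uses the UTC variants because `getMonth()` reports the month in your computer's local time zone, which can differ near midnight.

```javascript
// An illustrative timestamp in the same ISO format the shell displays.
const saleDate = new Date("2015-03-23T21:06:49.506Z");

console.log(saleDate.getUTCFullYear()); // 2015
console.log(saleDate.getUTCMonth());    // 2 (months are zero-based, so 2 means March)
console.log(saleDate.getUTCDate());     // 23
```

The zero-based month is a JavaScript quirk worth remembering when you start filtering documents by date.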
The next top-level key of our document is `items`. The `[]` surrounding the value indicate this is an `Array`. Each element of the array is a sub-document, representing a component of the sale.
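Because `items` is an ordinary array of sub-documents, you can work with it using normal JavaScript array methods once a document is in hand. The sketch below assumes each item has `name`, `price`, and `quantity` fields with plain numeric values; the data is made up, so check your own `findOne()` output for the real shape.

```javascript
// Illustrative sale document; the item fields and values are assumptions.
const sale = {
  items: [
    { name: "notepad", price: 10, quantity: 2 },
    { name: "pens", price: 5, quantity: 3 },
  ],
};

// Add up price * quantity across every sub-document in the array.
const total = sale.items.reduce(
  (sum, item) => sum + item.price * item.quantity,
  0
);
console.log(total); // 35
```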
Beneath this large array, we see other keys for `storeLocation`, `customer`, `couponUsed`, and `purchaseMethod`.
When we make a new document in the `sales` collection, all these keys should be present. Now, we likely wouldn't manually produce such a document; instead, our application would connect multiple data components from a user interface or point-of-sale terminal to create this document.
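As a rough sketch of what that application code might look like, the hypothetical helper below gathers the pieces into one object. The function name, the key names, and all the values are assumptions modeled on the keys discussed above; nothing here talks to a database.

```javascript
// Hypothetical helper: combine pieces gathered by an application into one document.
function buildSaleDocument({ items, storeLocation, customer, couponUsed, purchaseMethod }) {
  return {
    saleDate: new Date(), // stamped when the sale is recorded
    items,
    storeLocation,
    customer,
    couponUsed,
    purchaseMethod,
  };
}

// Illustrative inputs, as if collected from a point-of-sale terminal.
const sale = buildSaleDocument({
  items: [{ name: "notepad", price: 10, quantity: 2 }],
  storeLocation: "Denver",
  customer: { age: 42 },
  couponUsed: false,
  purchaseMethod: "In store",
});

console.log(Object.keys(sale));
```

We would omit `_id` on purpose here, since MongoDB generates one for each new document when none is supplied.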
In our next Unit, we'll get busy making our own collections and performing more advanced queries.