Join Dan Sullivan for an in-depth discussion in this video, Explore data with document databases, part of Advanced NoSQL for Data Science.
- [Instructor] Now let's spend some time working with Python and MongoDB together. What I'm going to do is go to the new button and create a new Python notebook using conda root. Now I'm going to increase the size of the fonts here, just so it's a little easier to read on different platforms. Now the first thing I want to do when I'm working with MongoDB and Python is I want to import the pymongo driver. So I'm going to issue the import command and hit shift return to execute. Now we have the MongoDB driver. Next, I want to create a session or a client object.
To do that, I'm going to create an object or a variable called client. I'm going to use a function from the pymongo driver called MongoClient and I'm going to specify the host that the database is running on, which here is my local machine. If you're in production and working on another server, you can put the other server name or IP address there. And I need to tell it the port that MongoDB is listening on. In our case, it's the default 27017. So I'm just going to hit shift return again to execute. And now, I should have a client object. Now, I want to have access to a database within this client or this server.
So I'm going to create another object called DB, and I'm going to work with the client object I just created, and I'm going to ask it to connect to my database called Test Database. Now if the test database doesn't exist, MongoDB will create it for me the first time I store data in it. So now, let's take a look at this DB object or DB variable. We can see that it's of class Database, and there's some information about the host being localhost and the port number. And also toward the end, if you'll notice, it shows the name being Test Database. So far, everything looks good. So now I have a connection to the database.
Well, in document databases, we typically work with collections. So let's get a collection created. Now again, I have a database connection, and it's connected to a particular database. I can use that variable to tell the driver that I want to connect to something called Test Collection. Now once again, if Test Collection doesn't exist, it'll be created for me. Now let's see how many documents we have in this collection. We can do that by using the collection method called Count. That returns zero, which makes a lot of sense since we haven't inserted any documents yet. So let's create some test data that we can insert.
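Getting a collection handle and counting its documents might look like the sketch below. The helper names `open_collection` and `count_all` and the collection name `test_collection` are hypothetical, chosen for illustration. The sketch uses `count_documents` rather than the `count` method mentioned in the video, because `Collection.count()` was deprecated in PyMongo 3.7 and removed in PyMongo 4.0.

```python
def open_collection(db, name="test_collection"):
    """Return a handle to the named collection inside a PyMongo database."""
    return db[name]


def count_all(collection):
    """Count every document in the collection.

    An empty filter {} matches all documents.
    """
    return collection.count_documents({})
```

With the DB object from the walkthrough, `count_all(open_collection(db))` would be expected to return 0 while the collection is still empty.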
What I'm going to do is, I'm just going to make up some very simple data here. We're going to use age as one attribute, and we'll give that a value of 39. We'll have gender as another attribute, and we'll set that to F. And finally, we'll have occupation, and the value for that is software engineer. Executing. Now I have a variable called Test Data, and I'm just going to print that out. Age 39, gender F, occupation software engineer. Just what we expect.
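The test data described here is an ordinary Python dictionary. A minimal sketch, assuming the variable is spelled `test_data` and the keys are lowercase (the transcript does not show the exact spelling):

```python
# Three simple attributes describing one person.
test_data = {"age": 39, "gender": "F", "occupation": "software engineer"}

# Print the data to confirm it looks as expected.
print(test_data)

# Ask Python what type the variable is; a dict maps naturally onto a document.
print(type(test_data))
```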
Now, if you're familiar with Python, you'll know this is a dictionary. But sometimes it's useful to explicitly ask Python, what type is a particular variable? In this case, if I ask, what is Test Data? I'll see that it's labeled as a dictionary. Now I have some data to insert. So I want to tell my collection that I want to insert one piece of data. And it's called Test Data. Now I could execute this command as is, and that would insert the data. But then the function would return some information, and I want to save that. So I'm going to scroll over, and I'm going to create a variable called Insert Result, and I'm going to save that information.
Then I'll execute. Now I'm going to take a look at Insert Result. Okay, I can see it's an object of type pymongo.results.InsertOneResult. Okay, that's what I expected. One of the things I'd like to do is check to ensure that this insert occurred correctly. So I'm going to look at my results, and I'm going to look for the acknowledged attribute. That's a Boolean. Okay, great, it's true. Now at this point, I'd also like to know what the ID is for this object that I just created. To do that, I can type Insert Result, and type Inserted.
Now I don't want to type it all out, so I'm going to hit the tab, and it will tab complete for me. In this case, it gives me some options. I'm going to select Inserted ID, and I'm going to look up the inserted ID for this object. And it returns this long, hexadecimal number, which MongoDB created for me. So it looks like we've successfully inserted an object. We have an ID for it. Let's check the collection and make sure that there's actually something in there. So I'll issue the Collection Count function again. And perfect, we have a one, which is just what we'd expect. Now at this point, we've connected to the MongoDB server, we've created a database, we've created a collection within that database, and we've inserted a record.
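The insert and the two checks on its result could be wrapped up as in the sketch below. The helper name `insert_and_check` is hypothetical; `insert_one`, `acknowledged`, and `inserted_id` are the PyMongo names the walkthrough refers to. The `collection` argument is assumed to be the collection handle from the walkthrough.

```python
def insert_and_check(collection, document):
    """Insert one document and return the _id that was assigned to it.

    Raises RuntimeError if the server did not acknowledge the write.
    """
    # insert_one returns an InsertOneResult describing the operation.
    # It also adds an "_id" key to `document` if one is not already present.
    insert_result = collection.insert_one(document)

    # acknowledged is a Boolean; True means the server confirmed the write.
    if not insert_result.acknowledged:
        raise RuntimeError("insert was not acknowledged by the server")

    # inserted_id is the identifier generated for the new document.
    return insert_result.inserted_id
```

Calling `insert_and_check(collection, test_data)` against a live server would be expected to return an `ObjectId`, which is the long hexadecimal value shown in the video.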
So let's just delete the record. And I'm going to once again save the results of this operation. And I'm using the Collection again, because I'm deleting from a collection, and I'm using the command Delete One. Now I want to give it some kind of criteria. Let's say I just want to delete people who are age 39. I can enter a key value pair within the curly brackets, because it's like a document, that's how we specify criteria. And I'll execute. Now that should have deleted things. Let's look at Collection Count one more time. And there it is. It's zero. I just want to point out that in that previous step where I used the Delete One command, that deleted just the first person of age 39.
If I wanted to delete all of them, I would have used the Delete Many command. Now that we have a handle on basic MongoDB and Python operations, let's look at a more substantial amount of data.
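The difference between deleting the first match and deleting every match can be sketched as follows. The helper name `delete_by_age` and its `delete_all` flag are hypothetical; `delete_one`, `delete_many`, and `deleted_count` are PyMongo names. The `collection` argument is assumed to be the collection handle from the walkthrough.

```python
def delete_by_age(collection, age, delete_all=False):
    """Delete documents whose "age" field equals `age`.

    Returns the number of documents that were removed.
    """
    # Criteria are written like a document: a key-value pair in curly brackets.
    criteria = {"age": age}

    if delete_all:
        # delete_many removes every document that matches the criteria.
        result = collection.delete_many(criteria)
    else:
        # delete_one removes only the first matching document it finds.
        result = collection.delete_one(criteria)

    return result.deleted_count
```

With a single matching document in the collection, `delete_by_age(collection, 39)` would be expected to return 1, after which the collection count goes back to zero.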
The course begins with an introduction to NoSQL, and then delves into the specifics of document, wide-column, and graph databases. Learn key details for performing data preparation, exploration, and extraction for each type of NoSQL database. Review case studies that show how to use various NoSQL databases with popular data science tools, including the document database MongoDB, the wide-column database Cassandra, and the graph database Neo4j.
- NoSQL compared to traditional relational databases
- Performing common data science tasks
- Preparing data with document databases
- Manipulating data in NoSQL
- Preparing, exploring, extracting, and model building
- Working with document, wide-column, and graph databases
- Reviewing case studies using MongoDB, Cassandra, and Neo4j