From the course: Customer Insights and Consumer Analytics for Organizations: Tools and Analysis

Explore customer behavior with Python

- One of the things you often want to do is explore your data to help identify patterns of customer behavior. This is exploratory analysis. In this demonstration, I'll show you how to use Python to investigate the data that you're working with. I already have Jupyter running and a new notebook set up here, so we can jump right in.
The first thing we can do is import a package that you'll use quite a lot if you use Python a lot. It's called pandas. So I'm going to import pandas as pd, and then run that.
I have our exercise files on the desktop, so let's move into our data directory. I'll do cd data, and then we can see the path of that directory printed there. It might be a little bit different on your machine, but if you're using the exercise files, there is a data file provided there. To see the name of that file, I'll type in ls and run that command, and you can see that the name of the file is ci dash data dot csv.
The working directory is set by where you launch Jupyter Notebook. When you run that command from the terminal, run it from within the exercise files directory, or wherever you have your files. I already did that when I launched Jupyter Notebook.
Now we can connect to our data. Basically, we give it a variable name. I'm going to call it data to assess, and then the command looks something like pd dot read underscore csv, then parens, then single quotes, and I'm going to copy the name of the file that we just listed into those single quotes and run that. So now we've accessed our data, we've connected to our data.
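
The import and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the course's notebook: the variable name `data_to_assess` is a guess at how "data to assess" was typed, and the rows and column names are made up so the example runs without the exercise files. Also note that `cd` and `ls` work inside a notebook cell because Jupyter treats them as shell-style shortcuts; they are not plain Python statements.

```python
import io

import pandas as pd

# Stand-in for the exercise file: invented rows with hypothetical column names.
# With the real exercise files you would pass the file's path to pd.read_csv
# instead of this in-memory text buffer.
sample_csv = io.StringIO(
    "id,purchase_count,purchase_sum,web_visits\n"
    "1,3,120.50,14\n"
    "2,1,35.00,2\n"
    "3,5,410.25,27\n"
)

# Load the CSV into a DataFrame and give it a variable name.
data_to_assess = pd.read_csv(sample_csv)

# In a notebook, putting the variable name alone on the last line of a cell
# displays the table; print() does the same job in a plain script.
print(data_to_assess)
```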
If we want to get a look at that data, all we need to do is take the variable name that we just assigned to it and run that. Then you'll get a readout of the data itself. We're going to be working with this data throughout the demonstrations in this course, so I'll run through it quickly so you have a sense of what we're working with.
You can think of each one of these rows as a customer record. We have a unique identifier for each record. We've got a purchase count and the purchase sum for each record. There's an attribution score that we're working with, and we also list the number of times they visited the website, a classification for website behavior, and an overall experience score for that user. We then have a number of other variables that describe different characteristics, along with some remarks that the customer entered at some point and that have been anonymized. Ultimately, your data is going to be unique to you, so let's keep moving, but that gives you a sense of what we're working with.
The other thing you can do while you're in this data exploration mode, if you just want to get a sense of the names of those columns, is enter that variable name, then dot columns, and read that out. So we can see, here are the names that we just went over. If you've got a lot of columns, that gives you a readout of the names and makes them a little easier to work with. And then if you want to know how many records we're dealing with, you can run the length command, which is len, and again just paste in that variable name.
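
A small sketch of those two checks, listing the column names and counting the records. The DataFrame here is invented for illustration and its column names are hypothetical; the real data set has more columns and more rows.

```python
import pandas as pd

# Invented stand-in for the course data; column names are hypothetical.
data_to_assess = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "purchase_count": [3, 1, 5],
        "purchase_sum": [120.50, 35.00, 410.25],
        "web_visits": [14, 2, 27],
    }
)

# .columns is an attribute (no parentheses) holding the column labels.
column_names = list(data_to_assess.columns)
print(column_names)

# len() on a DataFrame returns the number of rows, that is, the record count.
record_count = len(data_to_assess)
print(record_count)
```

If you want rows and columns in one call, `data_to_assess.shape` returns both as a tuple.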
And if we run that, you can see that we've got 400 records that we're working with.
Sometimes you really just want to get a sense of the overall shape of the data. One of the things you can do is run the describe method. What I just did is paste in that variable name, then dot describe, and then empty parens. If we run that, it gives us a count for each of the numeric columns. It'll give us the mean, it'll give us the minimum, and it really just gives us a sense of the shape of the data that we're working with. So when you're getting into a new data set for your customer analytics, you'll want to spend some time running that command and mulling over what you're working with, so you can get a sense of where you might take your analysis.
Sometimes you also want to get a sense of how complete the data is. One of the methods you can use is isnull. Basically, what I just did is paste in that variable name again, then dot isnull, which is the name of that method, and then I type in dot values dot any and a set of parens. If I run that, it's telling me that I don't have any null data within this overall data set. So it gives you a sense of whether you've got some data that's going to need cleaning up or not.
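
The summary and completeness checks from this part of the demonstration might look like the sketch below. The data is invented and the column names are hypothetical. The per-column null count at the end is an extra step that is not in the video; it is included because it shows where any gaps are, not just whether they exist.

```python
import pandas as pd

# Invented stand-in for the course data; column names are hypothetical.
data_to_assess = pd.DataFrame(
    {
        "purchase_count": [3, 1, 5],
        "purchase_sum": [120.50, 35.00, 410.25],
        "web_visits": [14, 2, 27],
    }
)

# describe() summarizes each numeric column: count, mean, std, min,
# quartiles, and max.
summary = data_to_assess.describe()
print(summary)

# isnull() marks every missing cell; .values.any() collapses that to a single
# True/False answer for the whole table.
has_nulls = data_to_assess.isnull().values.any()
print(has_nulls)

# For contrast, knock out one value in a copy and check again.
with_gap = data_to_assess.copy()
with_gap.loc[0, "purchase_sum"] = float("nan")
gap_found = with_gap.isnull().values.any()

# Not shown in the video: count the missing values column by column.
nulls_per_column = with_gap.isnull().sum()
print(nulls_per_column)
```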
