Learn how to create statistical charts in Plotly.
- [Narrator] Let's look at how to build statistical plots in Plotly. I'm going to show you how to build histograms, boxplots and scatterplots. More specifically, I'm going to show you how to build a simple histogram chart from a series object. Also, I'll show you how to create a multiple histogram chart plotted from a data frame, and subplot histograms. The one thing I want to point out before going into the demonstration is that if you're creating a scatterplot, you always need to set the mode parameter to markers.
That's because by default Plotly will draw lines between data points. So, if you want the points with no lines you need to designate plot mode as markers. I'll show you what I mean in the demonstration. For this demonstration we're going to be using NumPy and pandas, so we'll import those. And since we're going to be plotting off of pandas objects we need to import cufflinks as cf. We'll also import our standard Plotly tools.
And in this demonstration we're going to use scikit-learn for its StandardScaler. So, we'll import that. Import sklearn. And then from sklearn.preprocessing we'll import StandardScaler. Run that, and we'll have our libraries. Next, we need to set our credentials for Plotly so that we can access the Plotly platform.
I showed you how to find those in the last video. The first thing we're going to look at is how to create a histogram, and we'll do that first from a series object. So, we're going to pull in our data from the mtcars data set like we have been throughout this course. So, I'll copy and paste that in. And then we want to isolate our mpg variable. So, we'll say mpg is equal to cars.mpg and this will be a series object.
To generate a histogram off of the series object we just call the iplot method off of it and pass in the argument kind equals histogram. Let me show you. mpg.iplot and then say kind equals histogram. And, again, we always need to specify a file name. Here, let's call this simple-histogram-chart and then we can run this.
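For reference, the call described here is mpg.iplot(kind='histogram', filename='simple-histogram-chart'). As a rough sketch of the kind of figure that call produces, here is a plain Plotly figure dictionary with a single histogram trace. The mpg values are a short made-up list standing in for the series, and the dictionary illustrates Plotly's general figure structure; it is not the exact object cufflinks builds.

```python
# Hypothetical stand-in values; in the demonstration mpg is the cars.mpg series.
mpg = [21.0, 22.8, 18.7, 14.3, 24.4, 30.4, 15.2, 19.2]

# A Plotly figure is a nested dictionary with "data" and "layout" entries.
simple_histogram = {
    "data": [
        {"type": "histogram", "x": mpg, "name": "mpg"},
    ],
    "layout": {"xaxis": {"title": "mpg"}},
}

# With mpg as a pandas Series and cufflinks imported, the one-line call
# from the demonstration would be:
#   mpg.iplot(kind='histogram', filename='simple-histogram-chart')
print(simple_histogram["data"][0]["type"])
```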
And we've got a beautiful little histogram. It's interactive and has all the core benefits of a Plotly chart. Now, let me show you how to create a histogram from a data frame. Let's just create a quick subset from our original cars data frame, we'll pull out the mpg variable, displacement and horsepower, and we'll call this subset cars_data. We'll use the special indexer and we'll select columns with the index values one, three and four and access those values.
Next, we'll call scikit-learn's StandardScaler to scale the data, and then the fit_transform method off of it to carry out the actual transformation. So, fit_transform, then we pass in our cars_data object. And we'll call this cars_data_std. Let's then convert this to a data frame and we'll call that cars_select, so we'll call the data frame constructor and pass in our cars_data_std object, and let's name the columns in cars_select.
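To make the scaling step concrete, here is a minimal sketch of the arithmetic behind standardization: each column is shifted to mean zero and divided by its standard deviation. scikit-learn's StandardScaler uses the population standard deviation, which is what statistics.pstdev computes. The column values and the standardize helper are hypothetical, chosen only for illustration, and plain lists stand in for the data frame.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical columns standing in for mpg, displacement and hp.
cars_data = {
    "mpg": [21.0, 22.8, 18.7, 14.3, 24.4],
    "displacement": [160.0, 108.0, 360.0, 360.0, 146.7],
    "hp": [110.0, 93.0, 175.0, 245.0, 62.0],
}

# Standardize every column independently.
cars_select = {name: standardize(col) for name, col in cars_data.items()}

for name, col in cars_select.items():
    print(name, [round(z, 2) for z in col])
```

In the demonstration the same result comes from calling StandardScaler().fit_transform(cars_data) and wrapping the returned array in a data frame.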
So, you write the name of our data frame and say .columns and then create a list with the names of our variables: mpg, displacement and hp. Then, to plot a histogram from this, all we have to do is call the iplot method off of it. So, our object is cars_select and then we call iplot and we pass in kind equal to histogram and we need to pass in a file name, which we'll call multiple-histogram-chart.
Just remember, no spaces in your file names. We run this. I forgot a comma in the first line; the comma is what specifies that we're selecting columns, not rows. And then we'll run the code. It's sort of cool how Plotly automatically makes the histograms transparent and then overlays them on top of each other. And it's definitely cool that it automatically produces these hover-over labels in the legend, because that makes it a lot easier to interpret the histogram.
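The overlay behavior described above can be sketched by hand as a plain figure dictionary: one histogram trace per column, partial opacity, and the layout's barmode set to overlay. This is an assumption about how to reproduce the look manually, not a dump of what cufflinks generates; the data values and the 0.75 opacity are made up.

```python
# Hypothetical standardized values standing in for the cars_select columns.
cars_select = {
    "mpg": [0.15, 0.45, -0.23, -0.96, 0.72, -0.13],
    "displacement": [-0.57, -0.99, 1.04, 1.04, -0.68, 0.16],
    "hp": [-0.54, -0.78, 0.41, 1.43, -1.23, 0.71],
}

multiple_histogram = {
    "data": [
        # One semi-transparent histogram trace per column.
        {"type": "histogram", "x": values, "name": name, "opacity": 0.75}
        for name, values in cars_select.items()
    ],
    # "overlay" draws the histograms on top of each other instead of stacking.
    "layout": {"barmode": "overlay"},
}

print([trace["name"] for trace in multiple_histogram["data"]])
```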
But that said, it's challenging to make sense of these histograms because they're on top of each other. So, I want to show you how to plot them out in separate subplots instead. To do that, we'll reuse this code we just wrote to create this histogram. And all we need to do is pass in an argument that says subplots equal to true and change the file name. We'll call this subplot-histograms and then run it.
Okay, nice, this is a lot easier to interpret. But if you wanted to see the histogram subplots plotted out in a vertical stack instead, that's really simple to do as well. We'll reuse this code and we just add in another parameter, and that's shape. We'll set shape equal to three rows and one column. We run it, and great, so we have our histograms in three rows and one column. If you wanted to see them plotted out in three columns and one row, you would just change this to one and three.
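To illustrate how the shape argument is read, the sketch below assigns each histogram to a (row, column) cell. The grid_cells helper is hypothetical and only mirrors the rows-then-columns convention described here; it is not part of cufflinks or Plotly, and filling the grid row by row is an assumption.

```python
def grid_cells(names, shape):
    """Map each name to a 1-based (row, col) cell in a rows x cols grid."""
    rows, cols = shape
    if len(names) > rows * cols:
        raise ValueError("grid is too small for the number of plots")
    # Fill the grid row by row (an assumption for this sketch).
    return {name: (i // cols + 1, i % cols + 1) for i, name in enumerate(names)}


columns = ["mpg", "displacement", "hp"]

vertical = grid_cells(columns, (3, 1))    # three rows, one column
horizontal = grid_cells(columns, (1, 3))  # one row, three columns

print(vertical)
print(horizontal)
```

In the demonstration, the corresponding arguments passed to iplot are subplots set to True and shape set to three rows and one column, most likely written as a (3, 1) tuple.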
Cool, huh? Now I'm going to show you how to create a boxplot. We just use the iplot method and we pass in an argument that says kind equal to box. So, let's write the name of our object, that's cars_select and then we call the iplot method. We pass in kind equal to box and then we'll say file name, and give it the file name box-plots. Run this.
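As a rough illustration of what a box plot summarizes, this sketch computes the median and quartiles for one column with the standard library and builds a one-trace figure dictionary of type box. The values are hypothetical, and statistics.quantiles may use a different quartile convention than Plotly does, so the numbers need not match the chart exactly.

```python
from statistics import median, quantiles

# Hypothetical standardized hp values standing in for cars_select.hp.
hp = [-0.54, -0.78, 0.41, 1.43, -1.23, 0.71, -0.10, 0.33]

# Three cut points: lower quartile, median, upper quartile.
q1, q2, q3 = quantiles(hp, n=4)
print("median:", round(median(hp), 3), "IQR:", round(q3 - q1, 3))

# Plain figure dictionary with a single box trace drawn along the y-axis.
box_plot = {
    "data": [{"type": "box", "y": hp, "name": "hp"}],
    "layout": {},
}

# With cars_select as a data frame and cufflinks imported, the call from
# the demonstration would be:
#   cars_select.iplot(kind='box', filename='box-plots')
```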
Beautiful. So, this is all pretty simple, but making scatterplots is a bit more complicated. Let me show you how. The single most important thing you need to know about creating scatterplots with Plotly is that you always have to set the mode parameter to markers. By default, Plotly will draw lines between data points. So, if you want the points with no lines then you need to make sure to set plot mode as markers. Just like in the examples from the last segment, we need a fig object. It should be a nested dictionary object and it should contain two dictionaries: one for data and one for layout.
In the data dictionary we'd define two sets of x and y variables to be plotted. In both plots the y variable will be the same, displacement. But we will plot mpg along the x-axis in the first data object, and plot hp along the x-axis in the second. This will allow us to compare how mpg and hp relate with respect to displacement. So, first we'll plot the scatter chart of mpg versus displacement. Create a variable named fig and then add curly braces, and for our data dictionary, remember we're going to have two objects.
We're going to have mpg versus displacement and hp versus displacement. For the first one, we set x to cars_select.mpg and then y to cars_select.displacement. And here's where we set the mode to markers. We just write mode: and then markers. And let's name this mpg.
Now, we'll create our second data object hp versus displacement, and we'll say x is going to be cars_select.hp and y is going to be cars_select.displacement. We'll set our mode as markers and the name as hp. We also need that layout dictionary. Since we're plotting two different variables along the x-axis, we don't want to assign that axis a name.
We'll leave it empty. But for the y-axis, let's name it Standardized Displacement. So, in the layout we'll specify the x-axis and its title, and since we're leaving the title blank we'll pass an empty string. Then for our y-axis, we write y-axis and we pass in the title as Standardized Displacement.
After defining the layout settings we're ready to plot the figure. We do that by calling the iplot function and passing in the fig object. We also need to specify a file name, and we'll call it grouped-scatter-plot. So, I'll say py.iplot, pass in fig, and then say the file name is equal to grouped-scatter-plot. And then run it. Nice, and so, what we can see here is that for the cars in the cars data set, as hp increases, miles per gallon decreases.
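Putting the pieces of the scatterplot walkthrough together, the fig object might look like the sketch below: a nested dictionary with two data entries that share the same y values, both with mode set to markers, plus a layout with an empty x-axis title. The numeric values are hypothetical stand-ins for the cars_select columns.

```python
# Hypothetical standardized values standing in for the cars_select columns.
mpg = [0.15, 0.45, -0.23, -0.96, 0.72, -0.13]
hp = [-0.54, -0.78, 0.41, 1.43, -1.23, 0.71]
displacement = [-0.57, -0.99, 1.04, 1.04, -0.68, 0.16]

fig = {
    "data": [
        # mode="markers" draws points only, with no connecting lines.
        {"x": mpg, "y": displacement, "mode": "markers", "name": "mpg"},
        {"x": hp, "y": displacement, "mode": "markers", "name": "hp"},
    ],
    "layout": {
        "xaxis": {"title": ""},  # two variables share the x-axis, so no title
        "yaxis": {"title": "Standardized Displacement"},
    },
}

# In the demonstration this figure is rendered with:
#   py.iplot(fig, filename='grouped-scatter-plot')
print([trace["mode"] for trace in fig["data"]])
```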
That makes sense. That's all for creating statistical plots in Plotly. Next, I'm going to show you how to create maps.