Learn how to create standard line, bar, and pie plots.
- [Instructor] Standard chart graphics are excellent tools for conveying simple data insights in a way that anyone can understand. For example, imagine you're an e-commerce business analyst for a company that just made some major changes to its website layout. You're reporting on site usability. To convey findings to your manager, you use a line chart, a bar chart, and a pie chart. You use the line chart to show how the total number of items purchased per day has increased since the change.
You use the bar chart to show that the number of purchases increased for customers in the 18-to-25-year-old category, but decreased for customers in the over-44-year-old category. Lastly, you use a pie chart to show which categories of products generate the greatest proportion of sales and how that proportion differed before and after the site changes. To summarize, you used a line chart to show changes over time, a bar chart to show changes in categorical data, and a pie chart to show categorical data as proportions of the whole.
Now let's look a little closer at these charts. The line chart shows the change in value of a variable with respect to an x variable, which is often time. You can use line charts to visually compare the values of several related attributes. In contrast, bar charts represent data attribute values within a particular data category by using bars of different heights. Bar charts represent observation counts within categories. Lastly, we have the pie chart. Pie charts represent data attribute values using a circle and the slices that make it up.
A whole set of categorical data is represented by the complete circle, and the proportions of observations that fall into each category are represented by proportionate pie slices. In this course, I'm going to show you two methods for building plots in Python. The first one is the functional method. This method involves building plots by calling the plotting function on a variable or a set of variables. The second is the object-oriented method. With the object-oriented method, you build a plot by first generating a blank figure, and then populating that object with plots and plot elements.
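To make the contrast between the two methods concrete, here is a minimal sketch. The sample values and the variable names are placeholders chosen for illustration, and the Agg backend line is only an assumption so the snippet also runs as a plain script outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt

# Placeholder data, chosen only for illustration.
x = [1, 2, 3]
y = [2, 4, 1]

# Functional method: call the plotting function directly on the variables.
functional_lines = plt.plot(x, y)

# Object-oriented method: generate a blank figure first...
fig = plt.figure()
# ...then populate it with an axes object (left, bottom, width, height
# given as fractions of the figure) and draw the plot on that axes.
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
oo_lines = ax.plot(x, y)
```

Both calls draw the same line. The difference is that the second keeps explicit handles (fig and ax) that can be customized afterwards.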
I'm going to show you the object-oriented method in the next section. In the coding demonstration to come, I'm going to show you the most popular data visualization libraries in Python: matplotlib and Seaborn. I just wanted to take a second to point out the different figure styles you can use in Seaborn. These are very convenient for setting up default styles for your data visualizations, and you can apply them across an entire Jupyter notebook, so when you print out your plots they come out formatted and beautiful, like the ones you're seeing here.
There are five different styles you can choose: darkgrid, whitegrid, dark, white, and ticks. For this course, I'm going to be using the whitegrid style. Let me show you how to create some standard data visualizations in Python. The first thing you need to do, especially if you used the Anaconda install that we discussed in the "What You Should Know" video at the beginning of this course, is a pip install. This is how you install external libraries. In this case, we're going to be installing Seaborn.
It's not automatically installed when you get Anaconda. So we do it now just by typing that, and I've already installed it so it doesn't need to run. The message is just saying that this requirement's already satisfied, but on your machine, if you've just set up your Jupyter notebooks, you would need to install it. Okay, and then we're going to import our basic NumPy and pandas libraries that we discussed in chapter one. For this demo, for the data visualization, we're going to be using matplotlib, so we'll say import matplotlib.pyplot and we'll call this plt. We also want to import the rcParams variable from matplotlib, so we'll say from matplotlib import rcParams, and lastly, we'll import Seaborn.
And I would like to import Seaborn as sb. Okay, we execute this code and we've got the libraries that we need. When you write %matplotlib inline, this tells matplotlib to print the data visualization within the Jupyter notebook instead of opening it in an external graphical user interface. Matplotlib allows you to make global customizations that are applied to every data visualization produced within a Jupyter notebook.
By making changes to matplotlib's rcParams variable, you can make changes to the style sheet that underlies matplotlib. More specifically, in this course I decided to almost always set the figure size, in other words figure.figsize, to five inches wide and four inches high. To do that, I write rcParams, specify figure.figsize, and then I set that equal to five inches wide and four inches high.
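The rcParams setting just described can be sketched as follows. The Agg backend line is an assumption so that the snippet runs outside a notebook; in a notebook, %matplotlib inline takes its place.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Global default: every new figure is 5 inches wide and 4 inches high.
rcParams["figure.figsize"] = 5, 4

# Any figure created afterwards picks up the new default size.
fig = plt.figure()
width, height = fig.get_size_inches()
```

Because this is a global setting, it only needs to run once, near the top of the notebook.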
The last thing I want to show you here is that throughout this course I consistently set the Seaborn figure style to whitegrid. It's really easy to do that: you just call the set_style function, sb.set_style, and pass in the string 'whitegrid'. And when we execute this, we've got our Jupyter notebook set up for data visualization. Let's start by creating a simple line chart.
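As a small sketch of that style call, assuming Seaborn is already installed (for example with pip install seaborn) and using the sb alias from the transcript:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt
import seaborn as sb

# Apply the whitegrid style to every plot drawn from here on.
sb.set_style("whitegrid")

# Seaborn styles work by updating matplotlib's rcParams, so the effect
# is visible there: the whitegrid style switches the axes grid on.
grid_enabled = plt.rcParams["axes.grid"]
```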
First we need to create some array objects for Python to draw. So we'll create an x variable and a y variable. We'll set x equal to the range of numbers from one to nine, and we'll set y equal to a list of numbers: one, two, three, four, zero, four, three, two, one. Then we tell Python to generate a line chart from them by calling matplotlib's plot function.
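The x and y variables just described, and the plot call on them, might look like this. The Agg backend line is an assumption so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt

# x holds the integers 1 through 9 (range excludes its upper bound).
x = range(1, 10)
# y holds one value for each x position.
y = [1, 2, 3, 4, 0, 4, 3, 2, 1]

# plt.plot draws a line through the (x, y) points and returns
# a list containing the Line2D object it created.
lines = plt.plot(x, y)
```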
All you have to do is call the function on the variables you want plotted and let matplotlib work its magic. Let's try it out together. So we'll say plt.plot and then we'll pass in our two variables, and print. And here we have it, a nice handy-dandy little line chart. You'll definitely be using the plot function all the time. Let me show you some other ways to generate line charts. You can generate line charts from pandas objects, and in order for me to show you how to do that, we need to import some data, so let's just bring in our mtcars dataset that you saw in chapter one.
And I also want to isolate the mpg variable to use as a series for plotting. So we'll say cars and specify mpg. So now we have a dataframe called cars and we have a series called mpg. I run that, and then to plot just the mpg variable, you'd say mpg.plot, and here we have the mpg variable plotted out.
If you want to plot several variables, you can also do that. So let's make a small subset of the cars dataframe. We'll call it df, and we'll include cylinders, weight, and mpg as the variables in this small subset. And then let's just plot that. We'll say df.plot and easy-peasy. Now we have a line chart with several variables plotted out.
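Plotting from pandas objects can be sketched like this. The dataset file itself is not shown in the transcript, so the small dataframe below is a hypothetical stand-in with placeholder values, and the column names cyl, wt, and mpg are assumptions for the cylinders, weight, and mpg variables.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the cars dataframe (placeholder values only).
cars = pd.DataFrame({
    "mpg": [21.0, 22.5, 18.0, 24.5, 15.0],
    "cyl": [6, 4, 8, 4, 8],
    "wt": [2.6, 2.3, 3.4, 2.2, 3.6],
    "hp": [110, 95, 175, 65, 245],
})

# Isolate one column as a series and plot it as a line chart.
mpg = cars["mpg"]
plt.figure()  # start a fresh figure, as a new notebook cell would
ax_series = mpg.plot()

# Subset several columns and plot them together, one line per column.
df = cars[["cyl", "wt", "mpg"]]
ax_frame = df.plot()
```

The x axis in both charts is the dataframe index, so no separate x variable is needed.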
Creating bar charts is just as easy. We'll just reuse the variables we created earlier and call plt.bar, the bar function, on those variables, and there we go. We've got a bar chart instead of a line chart. If you look back up here, you can see this is a line chart that's plotting those same x and y variables, and this is them in a bar chart. You can also create bar charts from pandas objects. To do that with the mpg variable, we just write mpg.plot and pass in the parameter kind='bar'. Print that and we get a vertical bar chart.
If you want to create a horizontal bar chart instead, that's really easy. You use the same plot function. So you would write mpg.plot and then pass in kind="barh", for horizontal. And there you have it, we've got a horizontal bar chart. Let me point out here that I used a double quote, but if I wanted to pass in the same string, I could also use a single quote and it would work in the same way.
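The three bar chart calls from this part of the demo can be sketched together. The mpg series here is a hypothetical stand-in with placeholder values, since the dataset file is not shown in the transcript, and the plt.figure() calls are an assumption that mimics running each chart in its own notebook cell.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt
import pandas as pd

x = range(1, 10)
y = [1, 2, 3, 4, 0, 4, 3, 2, 1]

# Bar chart from plain Python variables: one bar per (x, y) pair.
bars = plt.bar(x, y)

# Hypothetical stand-in for the mpg series (placeholder values only).
mpg = pd.Series([21.0, 22.5, 18.0, 24.5, 15.0], name="mpg")

# Vertical bar chart from a pandas series.
plt.figure()
ax_vertical = mpg.plot(kind="bar")

# Horizontal bar chart: same plot method, kind='barh'.
# Single and double quotes are interchangeable for the string.
plt.figure()
ax_horizontal = mpg.plot(kind='barh')
```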
You can use double quotes or single quotes. Lastly, let me show you how to create a pie chart. Let's just create a simple list. We'll call it x and we'll pass in the numbers one, two, three, four, and 0.5. To generate a pie chart, we just call plt.pie and pass in the x variable. You also need to say plt.show to print it, and there you go, you've got a pie chart.
Really easy, yeah? There was one other thing that I wanted to show you before closing this demonstration, and that's how to save your data visualizations as image files. So if you want to save this pie chart as a JPEG, all you have to do is say plt.savefig and then specify the file name you want, so piechart.jpeg, and then plt.show.
What this does is it saves your pie chart as a JPEG in your working directory. And in case you don't know where that is, you can just say %pwd, and Python will print that out. Now that you know how to generate line charts, pie charts, and bar charts in Python, next I'm going to show you how to define plot elements.
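Putting the pie chart and the save step together gives roughly the following. The %pwd command only works inside a notebook, so this sketch uses os.getcwd() as a plain-Python equivalent, which is a swapped-in assumption, as is the Agg backend line.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs as a script
import matplotlib.pyplot as plt

# One pie slice per value, sized in proportion to the total.
x = [1, 2, 3, 4, 0.5]
result = plt.pie(x)
wedges = result[0]  # the first item returned holds the wedge patches

# Save before calling plt.show(): in a script, show() can leave the
# current figure empty once its window is closed.
plt.savefig("piechart.jpeg")

# The file lands in the current working directory.
saved_path = os.path.join(os.getcwd(), "piechart.jpeg")
```

The file extension passed to savefig determines the image format, so changing the name to piechart.png would write a PNG instead.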
- Getting started with Jupyter Notebooks
- Visualizing data: basic charts, time series, and statistical plots
- Preparing for analysis: treating missing values and data transformation
- Data analysis basics: arithmetic, summary statistics, and correlation analysis
- Outlier analysis: univariate, multivariate, and linear projection methods
- Introduction to machine learning
- Basic machine learning methods: linear and logistic regression, Naïve Bayes
- Reducing dataset dimensionality with PCA
- Clustering and classification: k-means, hierarchical, and k-NN
- Simulating a social network with NetworkX
- Creating Plot.ly charts
- Scraping the web with Beautiful Soup