Learn how to set standard chart graphics—legend, annotation, and labels.
- [Lecturer] Labels and annotation add a deeper layer of context to a plot, enabling the plot to convey extra meaning to its viewers. In this section I'm going to show you how to label plot features, add a legend to your plot and annotate features on your plot. But before that, let me give you an example where this comes in handy. Data journalists often add a lot of context and annotation to their data visualizations in order to add and augment the story they're telling. For example, imagine you're a data journalist covering a story of tourism in Central Florida.
You'd use a simple line chart to show the number of travelers over time, but if you're telling a story about the success of a grand opening of a new theme park you may want to add some text about how many visitors came to the grand opening. You then also may want to add a pointer that ties that text into the date of the park's grand opening. There are two methods for labeling and annotating. Again, the functional method and the object-oriented method. I'm going to show you both of those in the demonstration to come. But before that, let me give you some information about the methods we'll use. We'll use the .annotate method, and within that we're going to specify the parameters xy, that's the location we want to annotate, xytext, the location where we want the text added, and arrowprops, which defines the particulars of the arrow we'll use.
I'm going to show you all of this, so don't worry too much. Also, adding legends to data graphics is of course very useful, so we'll use the .legend method and pass in a label parameter for the variables or categories we're labeling, and a location parameter, loc, for where we want that legend to be added. Let's get to work doing this in Python. We're going to use NumPy and pandas, as usual, so we'll import those libraries, and again we're going to be using matplotlib and seaborn, so we'll bring those in as well.
Let's set our standard settings for the data visualizations in our Jupyter Notebook and we run all of these so that our Notebook is set up for plotting. First I want to show you the functional method for labeling features in the plot. So let's create some objects to plot, and first we'll create an x variable and a y variable, x, again, will be a series of numbers between one and nine and y will be a list of numbers.
And I'll just add these in real quick, one, two, three, four, 0.5, four, three, two, one, and go back through, add the commas and let's just plot these out as a bar chart, so we'll call a bar function and we'll pass in x and y. In order to label the x-axis you just say plt.xlabel and then pass in a string that defines the text you want labeled on the axis, so we'll just call this, your x-axis label.
To label the y-axis, it's very similar, except for the function is plt.ylabel and then we'll call that, your y label. Then we can print this out, and you see now we have our x and y axes labeled. Labeling a pie chart is a teeny bit different so let me show you that, we'll create a new object called z and just add some numbers for the values we want plotted. Now we're going to create some labels for each element in this pie chart.
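The functional labeling steps just described for the bar chart can be sketched roughly as follows. This is a minimal sketch, assuming x is built with range(1, 10) and y holds the values read out in the demonstration; the label strings are the placeholders the lecturer types in.

```python
import matplotlib.pyplot as plt

# x: the integers 1 through 9; y: the values read out in the demonstration
x = range(1, 10)
y = [1, 2, 3, 4, 0.5, 4, 3, 2, 1]

# Draw the bars, then label each axis with a plain string
bars = plt.bar(x, y)
plt.xlabel("your x-axis label")
plt.ylabel("your y label")
plt.show()
```

In a Jupyter Notebook set up for inline plotting, the chart appears under the cell with both axis labels in place.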
We'll call these labels a vehicle type and it's going to be a list object, so we'll say veh_type, it's a list. The first element is going to be a bicycle, the second element a motorbike, the third a car, next a van, and then a stroller. Then we'll call our plt.pie function, pass in the object that we want to have plotted, z, add the labels parameter and set labels equal to veh_type, the list that contains the labels we want. Then we'll say plt.show to plot it out.
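As a rough sketch of the pie chart labeling just described: the values in z are placeholders chosen for illustration, since the demonstration does not spell them out, while veh_type holds the five labels named above.

```python
import matplotlib.pyplot as plt

# Placeholder values (assumed); one number per slice of the pie
z = [1, 2, 3, 4, 0.5]

# One label per slice, in the same order as the values in z
veh_type = ["bicycle", "motorbike", "car", "van", "stroller"]

# The labels parameter writes each string next to its slice
plt.pie(z, labels=veh_type)
plt.show()
```

The list of labels needs to be the same length as the list of values, since each label is matched to a slice by position.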
Here we go, we have a pie chart with some labels added. Okay, so to show you how to create labels using the object-oriented method we're going to use the cars dataset that we've been using throughout this course, so we want to load that data. Then we're going to isolate the mpg variable from that dataset by saying mpg is equal to cars.mpg. Like we discussed earlier, for object-oriented plotting we need to first create a blank figure object using the figure function and then add some axes to it, so we call the fig.add_axes method and pass in the location where we want the axes to be added. Then we're going to plot out our mpg variable, but when we plot it out we want to have a series of tick marks.
In order to do that we're going to use the set_xticks method and we'll call that off of our ax object, so set_xticks, and then we're going to generate a series of numbers between zero and 31, so we'll say range 32. Now, to add custom string labels to these tick marks you call the set_xticklabels or set_yticklabels method off of the axes object and then you pass in the list or series object that contains the strings you want to be used as labels on your chart.
If you wanted to label tick marks on the x-axis you could just call the set_xticklabels method. Let me show you, we'll say ax.set_xticklabels and we'll access our cars dataset and we'll say that we want the car_names variable to be used as labels in our chart. I'm also going to specify a rotation for the labels, so in order to do that we pass in a rotation argument, and we'll say 60 for 60 degrees, and make the font size medium.
The argument for that, the parameter, is fontsize, and we'll say fontsize is equal to medium. I also want to add a title to the chart, so in order to do that we use the set_title method and our object is ax, so we say ax.set_title and then we pass in a string that's got the title we want to use for our chart. So we'll say, Miles per Gallon of Cars in mtcars, for our dataset.
Okay, and then also let's just really quickly label the x and y axes. So, with this approach you use the set_xlabel and set_ylabel methods, so we'd say ax.set_xlabel, and pass in the string for the name of the x-axis, which is car names, and ax.set_ylabel, and the y label, we'll make that miles/gallon.
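The object-oriented labeling steps above might look something like this. It is only a sketch: the small cars data frame is a stand-in for the course's cars dataset (five rows instead of 32, with assumed column names car_names and mpg), so set_xticks uses range(len(cars)) rather than range(32), and the rectangle passed to add_axes is an assumed position.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the course's cars data frame (assumed column names)
cars = pd.DataFrame({
    "car_names": ["Mazda RX4", "Datsun 710", "Hornet 4 Drive",
                  "Valiant", "Toyota Corolla"],
    "mpg": [21.0, 22.8, 21.4, 18.1, 33.9],
})
mpg = cars.mpg

# Blank figure, then axes at [left, bottom, width, height] (assumed values)
fig = plt.figure()
ax = fig.add_axes([0.1, 0.3, 0.8, 0.6])

# Plot mpg on those axes
mpg.plot(ax=ax)

# One tick per record, labeled with the car names
ax.set_xticks(range(len(cars)))
ax.set_xticklabels(cars.car_names, rotation=60, fontsize="medium")

# Title and axis labels
ax.set_title("Miles per Gallon of Cars in mtcars")
ax.set_xlabel("car names")
ax.set_ylabel("miles/gallon")
plt.show()
```

The number of tick positions and the number of labels have to match, which is why the tick range is tied to the length of the data frame here.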
And then we print out. And that looks pretty nice, right? This is our mpg variable plotted out with each of the car names, so there's a car name label for each record in the mtcars dataset. And you'll notice here that our axes are labeled as well and a title has been added. Now let's look at how to add legends to a plot. We'll return to our pie chart example, so we'll say plt.pie and pass in the z object. You can add a legend by calling the legend function and passing in the list that defines the legend entries you want labeled.
To set the position of a legend you pass in a location argument and you specify one of the many options. Options include best, upper right, upper left, lower right, and more, you can see that in the documentation. Let's add a legend to this pie chart. To do that we'll say plt.legend, pass in veh_type, and here we'll just put loc equal to best, that's how you set your location parameter. And plt.show to print out our pie chart.
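A minimal sketch of the pie chart legend described here. The values in z are again assumed placeholders, and veh_type is the list of vehicle labels from earlier in the demonstration.

```python
import matplotlib.pyplot as plt

z = [1, 2, 3, 4, 0.5]  # placeholder values (assumed)
veh_type = ["bicycle", "motorbike", "car", "van", "stroller"]

# Draw the pie without slice labels, then list the labels in a legend
plt.pie(z)
legend = plt.legend(veh_type, loc="best")
plt.show()
```

Swapping "best" for another location string such as "upper left" pins the legend to that corner instead of letting matplotlib choose.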
Now you can see that the labels we used before for labeling each segment in the pie chart have now been used in the legend. To show you how to add a legend using the object-oriented method I'll reuse the plot we created from the mpg variable of the cars data frame. The first thing we always need to do is add a figure object, so plt.figure and then add our axes, add_axes method, specify the position for the axes and then move on.
Now we're going to plot out our mpg variable, again, but this time let's add a legend. So let's take the same labels that we used from the plot earlier and then I'll just show you how to add a legend to this plot. Copy and paste them down, and then to add the legend you just call the legend method off of the ax object and pass in loc equal to best to tell Python to position the legend in the place that looks best on the chart.
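The object-oriented legend could be sketched like this. As before, the cars data frame is a small stand-in for the course dataset and the add_axes rectangle is an assumed position. The sketch relies on pandas labeling the plotted line with the series name, mpg, which is what the legend then picks up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the course's cars data frame (assumed column names)
cars = pd.DataFrame({
    "car_names": ["Mazda RX4", "Datsun 710", "Hornet 4 Drive",
                  "Valiant", "Toyota Corolla"],
    "mpg": [21.0, 22.8, 21.4, 18.1, 33.9],
})
mpg = cars.mpg

fig = plt.figure()
ax = fig.add_axes([0.1, 0.3, 0.8, 0.6])  # assumed position
mpg.plot(ax=ax)

# Same tick labels, title, and axis labels as the earlier plot
ax.set_xticks(range(len(cars)))
ax.set_xticklabels(cars.car_names, rotation=60, fontsize="medium")
ax.set_title("Miles per Gallon of Cars in mtcars")
ax.set_xlabel("car names")
ax.set_ylabel("miles/gallon")

# Legend built from the line's label, placed wherever fits best
legend = ax.legend(loc="best")
plt.show()
```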
So in this instance it's been positioned in the upper-right corner and you see how simple it is to add a legend. The last thing I want to show you in this demonstration is how to annotate your plot. We're going to add some annotation to call out the max value of the mpg variable. Of course, to do that, we first need to find out what that max value is and then we can work through the rest of the example. So in order to find the max value we call the max method off of the mpg variable and we see that the max value is 33.9, and I'm going to copy and paste down the code we used to generate this chart so we don't have to rewrite it all.
For this example I want to leave a little more space in the chart so that I have some room to add an annotation. To create this space let's increase the max limit of our y-axis to 45. We do that by calling the set_ylim method off of the ax object, so set_ylim and then we'll say zero to 45 for our limits. Now let's look at how to annotate this. Again, the annotation method is .annotate, so we'll call the .annotate method off of our ax object, and the first thing we want to pass in is the string for the label that we want to be used as annotation.
So let's label the Toyota Corolla car. You pass in the xy argument to mark the location you're annotating. So here we'll say xy equal to, and we'll name the position, 19, which is the record number of the max value, and 33.9. You use the xytext parameter to specify the location where you want the annotation text to be placed. So let's place our annotation text at x value of 21, y value of 35, we do this by saying, xytext equal to 21, 35, and let's also add an arrowprops argument.
This specifies the properties of the arrow that we're drawing from the text to the point we're annotating. So we'll write arrowprops and create a dictionary, say facecolor equals black and shrink equal to 0.05. And print this out. And you can see now that annotation has been added to our graph to identify the max value as a Toyota Corolla. Now that you know how to add labels and annotation into your chart, I'm going to show you a really simple way to plot time series in Python.
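The annotation steps from this demonstration can be sketched as follows. To keep the sketch self-contained, the mpg values of the standard mtcars dataset are typed in directly rather than loaded from the course's data file, and the add_axes rectangle is an assumed position. The shrink value of 0.05 is also an assumption: matplotlib treats shrink as the fraction of the arrow's length trimmed from both ends, so a value as large as 0.5 would leave little or no arrow.

```python
import matplotlib.pyplot as plt
import pandas as pd

# mpg column of the standard mtcars dataset, typed in so the sketch
# runs without the course's data file
mpg = pd.Series(
    [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4,
     22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4,
     14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
     19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4],
    name="mpg",
)

# Largest value and the record number where it occurs
max_mpg = mpg.max()
max_position = mpg.idxmax()

fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])  # assumed position
mpg.plot(ax=ax)

# Raise the top of the y-axis to leave room for the annotation text
ax.set_ylim([0, 45])

# xy is the point being annotated, xytext is where the text goes,
# arrowprops styles the arrow drawn from the text to the point
annotation = ax.annotate(
    "Toyota Corolla",
    xy=(19, 33.9),
    xytext=(21, 35),
    arrowprops=dict(facecolor="black", shrink=0.05),
)
plt.show()
```

Hard-coding the position follows the demonstration; passing xy=(max_position, max_mpg) instead would keep the arrow on the peak if the data changed.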