Join Michele Vallisneri for an in-depth discussion in this video, Doing math with arrays, part of Python: Data Analysis.

- In this video, we're going to look at doing mathematics with NumPy arrays. We will learn how to apply simple mathematical operations to an array and between two arrays. I will also show you how to plot one-dimensional arrays. And in case you need it, I will show you how to do some simple linear-algebra operations with them. Let's go back to the IPython notebook and open the exercise file for video 0403. Again, we need to import numpy, and we'll call it np, and we will also import matplotlib.pyplot and call it pp.

We will also instruct the IPython notebook to keep plots inline in the notebook itself. Let's start by generating a NumPy array of the numbers between zero and 10. We'll use linspace, which lets us specify how many we want. Let's say 40. We can then try to apply a simple trigonometric function to this array. For instance, a sine. For this, we cannot use the standard math.sin function of Python. We need to use the NumPy version, which can take a full array as an argument.

These NumPy functions are known as universal functions for this reason. So we'll assign the result of applying numpy.sin to x to the variable sinx. Let's have a look. The best way to see the result of the operation is actually to plot it. So we'll call the matplotlib function plot and give it the arrays x and sinx as arguments. Here we go. As you can see, the IPython notebook kept this plot inline.
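The steps so far might look roughly like the following sketch. It assumes the aliases np and pp used in the video; the name math_sin_accepts_arrays is made up here to record what happens when the standard math.sin is given a whole array.

```python
import math

import numpy as np
import matplotlib.pyplot as pp

# 40 evenly spaced values between 0 and 10 (both endpoints included)
x = np.linspace(0, 10, 40)

# math.sin expects a single number, so passing the whole array fails
try:
    math.sin(x)
    math_sin_accepts_arrays = True
except TypeError:
    math_sin_accepts_arrays = False

# np.sin is a universal function: it is applied to every element of x
sinx = np.sin(x)

# plot sinx against x; in a notebook with inline plotting the figure
# appears right below the cell
lines = pp.plot(x, sinx)
```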

If you save this notebook, the figure will be saved with it. We can also plot a couple of functions together. So let's get the cosine, as well as the sine. I'm copying the plot instruction from the cell above and repeating it with a cosine. Here we go. As in MATLAB, we can modify the style of the plot. For instance, we could give different symbols to the two lines. There are many, many options in matplotlib, and you can look at the documentation if you want to learn more.
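A minimal sketch of plotting both curves with different symbols follows. The format strings 'x' and 'o' are an assumption chosen for illustration; the transcript does not say which symbols the video uses.

```python
import numpy as np
import matplotlib.pyplot as pp

x = np.linspace(0, 10, 40)
sinx = np.sin(x)
cosx = np.cos(x)

pp.figure()
# the third argument is a format string that selects the marker
pp.plot(x, sinx, 'x')  # crosses for the sine
pp.plot(x, cosx, 'o')  # circles for the cosine
```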

Just like we can apply a unary function, like sine, to a NumPy array, we can also do arithmetic between arrays. For instance, let's take the product of the sine and cosine arrays, and let's take a slightly more complicated function: the difference of their squares. And let's again plot them. Both times, we will be using the array x as the horizontal axis. We can also add a legend to this plot, so that we can tell the two curves apart.
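The element-by-element arithmetic described here could be sketched as follows. The names y and z, the legend text, and the order of the difference (cosine squared minus sine squared) are assumptions; the transcript does not spell them out.

```python
import numpy as np
import matplotlib.pyplot as pp

x = np.linspace(0, 10, 40)
sinx = np.sin(x)
cosx = np.cos(x)

# arithmetic between arrays of the same shape works element by element
y = sinx * cosx          # product of sine and cosine
z = cosx**2 - sinx**2    # difference of their squares

pp.figure()
pp.plot(x, y)
pp.plot(x, z)
# one label per curve, in the order the curves were plotted
pp.legend(['sin(x) cos(x)', 'cos(x)^2 - sin(x)^2'])
```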

This is done with the matplotlib function legend. Normally, mathematical operations are applied to arrays element by element. However, if you want to do linear algebra, that's not the case. For instance, you may want to take the inner product of two vectors, that is, the sum of the element-by-element products. You can do this in NumPy with the function dot. This will treat the two one-dimensional arrays as vectors. We could also take the outer product, which builds the product of every possible pair of elements from the two vectors.
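As a small illustration with made-up three-element vectors (not the arrays from the video), the inner product can be checked against the sum of element-by-element products. The transcript does not name the function used for the outer product; np.outer is one way to compute it.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# inner product: 1*4 + 2*5 + 3*6 = 32
inner = np.dot(u, v)

# the same number, computed as the sum of element-by-element products
inner_by_hand = np.sum(u * v)

# outer product: outer[i, j] = u[i] * v[j], a 3-by-3 array
outer = np.outer(u, v)
```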

The result is a matrix. NumPy always tries to be helpful and to guess what you want to do. So it has some broadcasting rules, with which it will try to make sense of operations between arrays of different shapes. For instance, if you have a one-dimensional vector, and you add a number to it, just a single number, the number will be added to every element. Let me fix this simple typo, it's linspace, not space.
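Broadcasting a single number over a vector can be seen in a short sketch like this one; the five-element vector is an arbitrary choice for illustration.

```python
import numpy as np

v = np.linspace(0, 10, 5)   # [0, 2.5, 5, 7.5, 10]

# the scalar 1 is broadcast, so it is added to every element of v
w = v + 1                   # [1, 3.5, 6, 8.5, 11]
```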

Broadcasting also applies if we try to add a one-dimensional array to a two-dimensional array. So let's see what happens if I add a one-dimensional array of size n to a two-dimensional array of size n-by-n. The result is a two-dimensional array where the one-dimensional array has been added to every row. If instead we wanted to add it to every column, we'd first have to turn it into a proper n-by-one column vector, by adding a dimension with numpy.newaxis.
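The row and column cases can be compared in a minimal sketch, assuming n = 3 and an all-zero matrix so that the broadcast array is easy to read off in the result.

```python
import numpy as np

n = 3
a = np.arange(n)          # one-dimensional array [0, 1, 2], shape (3,)
m = np.zeros((n, n))      # two-dimensional array, shape (3, 3)

# a is added to every row: each row of by_rows is [0, 1, 2]
by_rows = m + a

# a[:, np.newaxis] has shape (3, 1), a column vector,
# so a is added to every column: each column of by_cols is [0, 1, 2]
column = a[:, np.newaxis]
by_cols = m + column
```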

###### Released

11/12/2015

- Writing and running Python in IPython
- Using Python lists and dictionaries
- Creating NumPy arrays
- Indexing and slicing in NumPy
- Downloading and parsing data files into NumPy and Pandas
- Using multilevel series in Pandas
- Aggregating data in Pandas

###### Skill Level **Intermediate**

#### Q: The course shows how to download files from FTP and web servers using Python 3.X. How do I do the same thing with Python 2.7?

A: First import urllib, then use urllib.urlretrieve(URL, filename). For instance, to download the stations.txt file used in the chapter 5 video “Downloading and parsing data files,” you’d do urllib.urlretrieve('ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt', 'stations.txt').
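One way to write the download so that it runs under both Python 2.7 and Python 3 is sketched below. It assumes that urllib.request.urlretrieve is an acceptable Python 3 counterpart (the exact call used in the course is not quoted here), and the helper name download_stations is hypothetical.

```python
try:
    # Python 3: urlretrieve lives in urllib.request
    from urllib.request import urlretrieve
except ImportError:
    # Python 2.7: urlretrieve lives directly in urllib
    from urllib import urlretrieve

STATIONS_URL = 'ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt'


def download_stations(filename='stations.txt'):
    """Download the station list to a local file and return its name."""
    urlretrieve(STATIONS_URL, filename)
    return filename
```

Calling download_stations() needs network access to the FTP server, so nothing is downloaded until the function is called.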

**Q. What are the issues with DataFrame.sort()?**

A: Since Pandas version 0.18, the DataFrame method sort() was removed in favor of sort_values(). Unlike sort(), the new method does not sort records in place unless it is given the option "inplace=True". The following lines of code in the video need changing:

- In Chapter 6: Introduction to Pandas/DataFrames in iPandas
  - twoyears = twoyears.sort('2015', ascending=False) -> twoyears = twoyears.sort_values('2015', ascending=False)

- In Chapter 7: Baby names with Pandas/A yearly top ten
  - allyears_indexed.loc['M',:,2008].sort_values('number', ascending=False).head()
  - pop2008 = allyears_indexed.loc['M',:,2008].sort_values('number', ascending=False).head()
  - def topten(sex,year):
  - simple = allyears_indexed.loc[sex,:,year].sort_values('number', ascending=False).reset_index()

- In Chapter 7: Baby names with Pandas/Name Fads
  - [in addition to lines above, which are used to initialize the "name fads" computation]
  - spiky_common = spiky_common.sort_values(ascending=False)
  - spiky_common = spiky_common.sort_values(ascending=False); spiky_common.head(10)

- In Chapter 7: Baby names with Pandas/Solution
  - [in addition to lines above, which are used to initialize the "name fads" computation]
  - totals_both = totals_both.sort_values(ascending=False)
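The behavior of sort_values() with and without "inplace=True" can be seen on a toy DataFrame. The data below is made up for illustration and only borrows the name twoyears and the column '2015' from the exercise.

```python
import pandas as pd

# made-up values; only the variable and column names follow the exercise
twoyears = pd.DataFrame({'2014': [10, 30, 20], '2015': [12, 25, 31]},
                        index=['A', 'B', 'C'])

# sort_values returns a new, sorted DataFrame...
by_2015 = twoyears.sort_values('2015', ascending=False)

# ...and leaves the original order untouched
order_before = list(twoyears.index)

# with inplace=True the original DataFrame itself is reordered
twoyears.sort_values('2015', ascending=False, inplace=True)
order_after = list(twoyears.index)
```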

**Q. What are the issues with Pandas categorical data?**

A. Since version 0.6, seaborn.load_dataset converts certain columns to Pandas categorical data (see http://pandas.pydata.org/

#### Q. **What are the issues with matplotlib.pyplot.stackplot?**

A. In recent versions of matplotlib, the function matplotlib.pyplot.stackplot now throws an error if given the keyword argument "label". This problem occurs in the "Baby names with Pandas/Name popularity" exercise file, and it can be ignored. In the video, matplotlib does not complain, but nevertheless shows no legend for the plot. The tutorial moves on to show how to make a legend using matplotlib.pyplot.text.
