Learn about NumPy and SciPy, Python libraries for numeric processing and scientific programming.
- [Instructor] When we worked with pandas, you probably noticed that we imported another library called NumPy. You might be wondering what NumPy is all about. There's also a library we often use alongside NumPy, called SciPy. Vanilla Python originally did not include an array data type suited to numeric work and used lists instead. Python lists make it easy and cheap to insert, delete, and otherwise manipulate data. However, lists aren't an ideal data structure for numeric processing, especially for the large arrays we see in linear algebra and similar kinds of computing.
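One quick way to see the difference is to compare how a list and a NumPy array respond to arithmetic. Here is a minimal sketch that uses a made-up three-element list. Multiplying a list by 2 repeats it. Multiplying an array by 2 doubles every element.

```python
import numpy as np

values = [1, 2, 3]

# A list treats * as repetition, not arithmetic
repeated = values * 2            # [1, 2, 3, 1, 2, 3]

# An array applies the arithmetic to every element at once
doubled = np.array(values) * 2   # array([2, 4, 6])

print(repeated)
print(doubled)
# Arrays store a single fixed numeric type (the exact integer size is platform dependent)
print(np.array(values).dtype)
```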
Many of you might be familiar with MATLAB. It's a commercial product that's often used in numeric-intensive applications, and in analytics it's something of a competitor to Python. It excels at matrix computation and is good for linear algebra, machine learning, and other tasks heavily focused on matrix manipulation. NumPy lets us do MATLAB-style processing in Python. Let's see how we can use NumPy. First, we import NumPy as np; this is the standard convention.
Then we're going to create a NumPy array. Notice that np.array takes a list as its argument, not a series of separate numbers. Something like this won't work, but this is correct. Now, what if we want a sequence of numbers? We can use the arange function for this, and you see that we get an array consisting of zero through nine. How would we multiply that range by a scalar? Here you can see we're taking arange of 10 and multiplying it by pi.
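The steps above can be sketched as follows. The calls are standard NumPy (np.array, np.arange, np.pi), but the variable names and the sample list are illustrative only. The commented-out line shows the mistake the narration warns about.

```python
import numpy as np  # the conventional alias

# np.array takes ONE sequence (here a list) as its data argument
a = np.array([1, 2, 3, 4])
# np.array(1, 2, 3, 4) fails: the extra numbers are not treated as data

# np.arange works like range() but returns an ndarray
r = np.arange(10)            # 0 through 9

# Multiplying by a scalar applies to every element (no loop needed)
scaled = np.arange(10) * np.pi

print(a)
print(r)
print(scaled[:3])            # first few values: 0, pi, 2*pi
```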
And you can see the values go from zero to 3.14, then 6.28, and so on. Arrays can be multidimensional. One way to build one is to create a one-dimensional array and then change its shape. Here, we've created a two-by-three 2D array this way. NumPy also supports matrices, as well as plain arrays. A matrix is a 2D array, but a special one, with its own operations.
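Here is a sketch of turning a flat array into a 2×3 grid. It uses `reshape`, which is one standard way to change an array's shape. The video may use a different but equivalent call. The data (0 through 5) is illustrative.

```python
import numpy as np

flat = np.arange(6)          # one-dimensional: 0, 1, 2, 3, 4, 5
grid = flat.reshape(2, 3)    # same six values laid out as 2 rows x 3 columns

print(grid)
print(grid.shape)            # the shape attribute reports (rows, columns)
```

`reshape` does not copy the data when it doesn't have to. The total number of elements must stay the same, so 6 elements can become 2×3 or 3×2 but not 4×2.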
It's actually a subclass of NumPy's array type, ndarray. Now, why wouldn't we just use a 2D array? Actually, you can use a 2D array, and for some people that's preferred; NumPy's own documentation now recommends regular arrays over the matrix class. Matrices are almost the same, but they add some syntactic sugar for matrix multiplication. Here, we're going to make two-by-two matrices. With matrix objects, the star operator performs matrix multiplication; with plain arrays, star is elementwise, and you'd use the @ operator or np.dot for a matrix product. Here you see the result of our matrix multiply, A1 times A2.
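The sketch below contrasts the two behaviors of the star operator. The matrix values are illustrative and not taken from the video. With `np.matrix`, `*` is a matrix product. With plain arrays, the same product uses `@`, and `*` multiplies element by element.

```python
import numpy as np

# np.matrix is an ndarray subclass where * means matrix multiplication
A1 = np.matrix([[1, 2], [3, 4]])
A2 = np.matrix([[5, 6], [7, 8]])
matrix_product = A1 * A2          # [[19, 22], [43, 50]]

# Plain 2D arrays: @ is the matrix product, * is elementwise
B1 = np.array([[1, 2], [3, 4]])
B2 = np.array([[5, 6], [7, 8]])
array_product = B1 @ B2           # same values as matrix_product
elementwise = B1 * B2             # [[5, 12], [21, 32]]

print(matrix_product)
print(array_product)
print(elementwise)
```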
If we want to convert an array to a matrix, that's easy to do with the np.mat function, an alias for np.asmatrix (NumPy 2.0 removed np.mat, so use np.asmatrix in current versions). Sometimes we want something called a sparse matrix, which is a matrix in which most of the values are zero. If the matrix is very large, it would be wasteful to store all of those zeros. So what should we do? NumPy does not have a sparse array type, but the companion package SciPy does, in its scipy.sparse module.
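Here is a minimal sketch of the conversion using `np.asmatrix`, which still exists in current NumPy. The small 2×2 array is an illustrative assumption. Once the data is a matrix, `*` becomes matrix multiplication.

```python
import numpy as np

arr = np.arange(4).reshape(2, 2)   # ordinary 2D ndarray: [[0, 1], [2, 3]]
m = np.asmatrix(arr)               # view the same data as an np.matrix

squared = m * m                    # matrix product, since m is a matrix
print(type(m).__name__)            # matrix
print(squared)
```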
Let's take a look at this. Here, we're going to define an array of 100,000 elements and make it 50% sparse, so about half of the values will be zero. Let's see what that gives us. Notice that the result is a one-by-100,000 sparse matrix with just over 50,000 stored elements, so about half are filled in and half are not. In many applications, our matrices could be much sparser than this.
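The video's exact code isn't shown in the transcript, so here is one possible way to build a roughly 50% sparse 1×100,000 matrix. It assumes random values in [0, 1), with every value below 0.5 set to zero, and then converts the result to SciPy's compressed sparse row format (`scipy.sparse.csr_matrix`). The seed is an arbitrary choice made so that the result is reproducible.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(seed=0)   # arbitrary seed for reproducibility

# A 1 x 100,000 row of random values in [0, 1)
dense = rng.random(100_000).reshape(1, -1)

# Zero out roughly half the entries (those below 0.5)
dense[dense < 0.5] = 0.0

# CSR stores only the nonzero values plus their positions
sparse = csr_matrix(dense)

print(sparse.shape)   # (1, 100000)
print(sparse.nnz)     # number of stored elements, close to 50,000
```

Because the zeros are chosen at random, the stored count lands near 50,000 rather than exactly on it.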
Now, what if we need to load a NumPy array from a file? Only the most trivially sized arrays would be written inline. Here you see we're parsing a CSV file and then appending its rows onto our array, and here you see the results of that. So that's NumPy. It gave Python something it arguably should have had in the first place: fast number crunching on large arrays and matrices. So what do we do with all this number-crunching goodness? That's where SciPy comes in.
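Here is a sketch of two ways to load a small numeric CSV. The file name and its contents are hypothetical and are written to a temporary directory so that the example runs on its own. The first approach parses rows with Python's `csv` module, collects them in a list, and converts once. This avoids `np.append`, which builds a new array on every call. The second approach lets `np.loadtxt` parse the file directly.

```python
import csv
import os
import tempfile

import numpy as np

# Hypothetical example file so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Approach 1: parse with csv, collect rows in a list, convert once
rows = []
with open(path, newline="") as f:
    for row in csv.reader(f):
        rows.append([float(value) for value in row])
data = np.array(rows)

# Approach 2: let NumPy parse the file directly
data2 = np.loadtxt(path, delimiter=",")

print(data)
print(np.array_equal(data, data2))   # both approaches give the same array
```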
SciPy gives Python the basic building blocks for numeric and scientific computing. We just used SciPy for sparse matrices, but there are many other parts of SciPy as well. Let's take a common task for matrices and arrays: linear algebra, which lets us manipulate vectors and matrices. SciPy and NumPy make this easy. One thing we can do here is use SciPy to solve a system of linear equations. Here we get the solution vector as the result. To check the answer, the last statement takes the dot product of the original matrix with the solution vector, which should reproduce the right-hand side and confirm that the system is solved.
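Here is a minimal sketch of solving a system Ax = b with `scipy.linalg.solve` and checking the result with a dot product. The particular A and b are illustrative values, not the ones used in the video. This system has the exact solution x = [2, -2, 9].

```python
import numpy as np
from scipy import linalg

# Coefficient matrix A and right-hand side b for the system A x = b
A = np.array([[3.0,  2.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0,  5.0, 1.0]])
b = np.array([2.0, 4.0, -1.0])

x = linalg.solve(A, b)     # solution vector
print(x)

# Check: A dot x should reproduce b, up to floating-point rounding
print(np.allclose(A.dot(x), b))
```

`np.allclose` is used instead of `==` because floating-point arithmetic can leave tiny rounding differences.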
SciPy has a number of other mathematical and scientific functions that we won't have time to get into today. However, SciPy forms the basis of many of the things that we will be looking at in the next discussions, especially scikit-learn.