Learn how to plot a time series.
- [Instructor] Time series plots convey how an attribute value changes over time. Using statistical methods like autoregressive integrated moving average (ARIMA), you can predict, or forecast, the demand for a particular retail product based on historical time series data on previous sales of that product. Before forecasting from time series, though, you first need to know how to handle and plot time series in Python. Working with time series in Python can get really tricky, but pandas makes it simple. Before showing you how to use time series in pandas, let me just show you what a time series looks like.
These four plots all show time series. The first one, in the upper left, is a constant time series. Basically, you don't see any trend or change in the variable over time. A trended time series, like the chart in the upper right, is where you see a net increase or decrease in the time series variable over time. In the lower left corner, you'll see an untrended time series. This is an untrended seasonal time series: the variable increases and decreases according to the seasons of the year, but there's no net change in the average value of the time series, so we call it untrended.
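The four charts from the video aren't reproduced here, but a minimal sketch can generate the same four patterns from synthetic monthly data. Everything below (the 48-month range, the trend slope, the 12-month sine wave used as "seasonality") is an illustrative assumption, not the data behind the original plots.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a Jupyter notebook
import matplotlib.pyplot as plt

# Four years of month-start timestamps (hypothetical, for illustration only)
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
t = np.arange(len(idx))
season = 10 * np.sin(2 * np.pi * t / 12)  # repeating 12-month cycle
trend = 0.5 * t                            # steady net increase over time

examples = {
    "Constant": pd.Series(50.0, index=idx),
    "Trended": pd.Series(50 + trend, index=idx),
    "Untrended seasonal": pd.Series(50 + season, index=idx),
    "Trended seasonal": pd.Series(50 + trend + season, index=idx),
}

# Lay the four series out in a 2 x 2 grid, in the same order as the video
fig, axes = plt.subplots(2, 2, figsize=(10, 6), sharex=True)
for ax, (name, series) in zip(axes.flat, examples.items()):
    series.plot(ax=ax, title=name)
fig.tight_layout()
```

In the untrended seasonal series, each full year averages out to the same level, while in the trended seasonal series the yearly average keeps climbing.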
In contrast, the lower right corner shows a trended seasonal time series. Here the variable rises and falls with the seasons, but there's also a net gain in the variable over time. Like I said, working with time series in Python can get really tricky. Luckily, pandas has some functionality for automatically parsing dates. Here I'm going to show you how to use pandas for simple time series plotting. As usual, the first thing we need to do is import our libraries, so we're going to import NumPy and pandas, and then we'll also import matplotlib and seaborn.
I'll add these and run the cell to import the libraries. I also want to set the parameters for our data visualizations, like we've been doing, so I'll do that. Before plotting a time series, we need to read in some data to plot. Let's use some customer sales data from a local file, which we'll load with the read_csv function. You can get this file in the download for the course; you'll need to specify the address of wherever you've saved it on your computer.
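The setup step might look roughly like this. The exact plotting parameters used in the course aren't shown in the transcript, so the 5 x 4 inch default figure size below is an assumption, and seaborn is wrapped in a guard so the sketch still runs where it isn't installed. In Jupyter you would typically also run `%matplotlib inline` in the same cell.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

try:
    import seaborn as sns
    sns.set_style("whitegrid")  # optional styling; seaborn is imported in the video
except ImportError:
    sns = None  # plots still work with plain matplotlib

# Default size (width, height in inches) for every figure in the session.
# 5 x 4 is an assumed value, not necessarily the course's own setting.
plt.rcParams["figure.figsize"] = (5, 4)
```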
We're going to use the read_csv function, so we'll say pd.read_csv and then pass in our address. Next, we pass in the index column parameter. We want to set the index column to our order date variable, which tells pandas to use the order date column as the row labels for this data frame. So we'll say index_col, that's the parameter, because we want the rows to be indexed by order date.
The last parameter we need to specify here is parse_dates. It tells pandas to parse the index as a set of dates, so we'll say parse_dates=True, and we'll call this whole thing df, for data frame. Let's print out the first few records by calling the head method. Oh, I misspelled address here, so I'll add an R. You can see this is what our data frame looks like.
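Put together, the read step could look like the sketch below. The course's file path isn't given in the transcript, so an in-memory buffer with three made-up rows stands in for it; the column names "Order Date", "Order Quantity", and "Sales" are assumptions based on the labels used later in the video.

```python
import io
import pandas as pd

# Made-up stand-in for the course's sales file (assumed column names).
# With the real file, pass its path instead of this buffer.
csv_text = """Order Date,Order Quantity,Sales
2012-01-03,6,100.0
2012-01-05,49,250.0
2012-01-07,27,120.0
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Order Date",  # use the order date column as the row labels
    parse_dates=True,        # parse that index into datetime values
)
print(df.head())
print(type(df.index).__name__)  # DatetimeIndex
```

Because the index is now a `DatetimeIndex` rather than plain strings, pandas can place each row on a real time axis when plotting.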
There are quite a few columns in this data frame. Because the order date variable was set as the data frame index and parsed as dates, you can easily plot the time series by calling the plot method off of the column you want plotted. Let's look at changes in order quantity over time. We do that by selecting the order quantity column and calling the plot method off of it. Yikes, that's a huge mess. You can't see anything, because there are too many records in the data frame.
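To see why the full plot is so hard to read, here is a sketch with a hypothetical table of 8,000 orders placed on random dates. Selecting one column gives a Series, and `.plot()` draws every record against the date index, so thousands of overlapping points and connecting lines pile up into a blur. The row count, date range, and values are all invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a Jupyter notebook
import matplotlib.pyplot as plt

# Hypothetical stand-in for a large sales table: 8,000 orders on random days
# over four years, with a parsed DatetimeIndex named "Order Date".
rng = np.random.default_rng(0)
offsets = pd.to_timedelta(rng.integers(0, 4 * 365, 8000), unit="D")
dates = pd.DatetimeIndex(pd.Timestamp("2009-01-01") + offsets, name="Order Date")
df = pd.DataFrame({"Order Quantity": rng.integers(1, 51, 8000)}, index=dates)

# One column -> one Series -> one line through all 8,000 records
ax = df["Order Quantity"].plot()
```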
But that's OK. Let's just take a random sample of 100 records and plot that instead. To do that, we'll call the sample method off of our data frame object and say n=100, telling pandas we want 100 samples. We'll set our seed with random_state=25, and tell pandas that we want it to sample rows with axis=0.
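The sampling call itself is a one-liner with the parameters named above. The daily data here is a hypothetical stand-in. One extra step that isn't in the walkthrough is `sort_index()`: `sample()` returns rows in random order, and sorting them by date keeps a line plot moving left to right instead of zigzagging back and forth.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data (1,000 rows) standing in for the sales frame
idx = pd.date_range("2010-01-01", periods=1000, freq="D", name="Order Date")
df = pd.DataFrame({"Order Quantity": np.arange(1000) % 50 + 1}, index=idx)

# 100 random rows; random_state fixes the seed so the draw is repeatable,
# and axis=0 samples rows rather than columns.
df2 = df.sample(n=100, random_state=25, axis=0)

# Extra step (not in the video): put the sampled rows back in date order
df2 = df2.sort_index()
```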
We'll call this df2. Now, for our plot, let's add x- and y-axis labels. With plt.xlabel, we'll set the x-axis label to Order Date, just so we know what we're looking at. Then, with plt.ylabel, we'll set the y-axis label to Order Quantity. Finally, we'll make a title with plt.title and call it Superstore Sales.
Once your dates are set as the data frame index and parsed, you can plot a time series by calling the .plot method off of the column you want plotted. Let's do that. Let's plot our Order Quantity variable from the df2 data frame. We'll select Order Quantity by its column label and then call the .plot method off of it. Oh, much clearer. Now we've plotted a simple time series.
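Here is a sketch of the finished plot, again using hypothetical daily data in place of the course file. It calls the same `plt.xlabel`, `plt.ylabel`, and `plt.title` functions as the video, but after `.plot()` rather than before; calling them afterward ensures they aren't overwritten by any defaults pandas applies while drawing.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a Jupyter notebook
import matplotlib.pyplot as plt

# Hypothetical daily data standing in for the sales frame, sampled as before
idx = pd.date_range("2010-01-01", periods=1000, freq="D", name="Order Date")
df = pd.DataFrame({"Order Quantity": np.arange(1000) % 50 + 1}, index=idx)
df2 = df.sample(n=100, random_state=25, axis=0).sort_index()

# Select the column by its label and plot it against the date index
ax = df2["Order Quantity"].plot()

# Label the current axes (the ones pandas just drew on)
plt.xlabel("Order Date")
plt.ylabel("Order Quantity")
plt.title("Superstore Sales")
```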
That was easy. Like I said, pandas makes it really, really easy to visualize time series, but if you get into using some of the other libraries in Python, it gets a lot more complicated, so you might want to keep this in mind.