- Our last task in this project is to identify name fads. That is, popular names that appear suddenly and then fade away quickly. As we do so, we will see how to group data with pandas groupby, how to compute aggregations, and how to combine Boolean masks. Let's go to the IPython notebook. Let's select the 07_05 fads begin exercise file. We will continue with our work from the last videos. Let's select Cell and run all the cells.
Let us look at this plot for the popularity of the top six girl names between 1985 and 1995. Most of these names were only popular for a relatively short period. This prompts the question of how we can identify a name fad. A fad will have a certain spikiness to the plot, more like Britney here than Elizabeth. What we need to do is to compute a single number for each name that will tell us how spiky the plot will be.
However, the number should be insensitive to the total number of appearances for a given name. After all, a small, not very popular fad is still a fad. It turns out that the trick to computing the spikiness…
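The transcript excerpt ends before the trick is explained, so the following is only a minimal sketch of one possible spikiness measure, not necessarily the one used in the video. It assumes a table with columns `name`, `year`, and `number`; the data, the thresholds, and the variable names `spiky`, `totals`, and `fads` are made up for illustration. Each name's yearly counts are divided by that name's total, which makes the result insensitive to overall popularity, and the squared shares are summed: a flat curve gives a small value, a single sharp peak gives a value close to 1.

```python
import pandas as pd

# Hypothetical yearly counts for four made-up names (not real data).
years = range(1990, 1995)
counts = {
    "Steady": [200, 200, 200, 200, 200],   # popular, flat curve
    "Spike": [10, 960, 20, 5, 5],          # popular, sharp peak
    "SmallSpike": [2, 192, 4, 1, 1],       # same shape, five times rarer
    "Rare": [0, 3, 0, 0, 0],               # peaked, but only three babies
}
rows = [(name, year, n) for name, ns in counts.items() for year, n in zip(years, ns)]
df = pd.DataFrame(rows, columns=["name", "year", "number"])


def spikiness(numbers):
    """Sum of squared yearly shares; dividing by the total removes the scale."""
    shares = numbers / numbers.sum()
    return (shares ** 2).sum()


# Group by name, then aggregate each group's counts to a single number.
grouped = df.groupby("name")["number"]
totals = grouped.sum()
spiky = grouped.agg(spikiness)

# Combine two Boolean masks: spiky enough AND common enough (assumed thresholds).
mask = (spiky > 0.5) & (totals >= 100)
fads = spiky[mask].sort_values(ascending=False)
print(fads)
```

In this sketch "Spike" and "SmallSpike" receive the same score because only the shape of the curve matters, while "Rare" is excluded by the second mask even though its curve is as peaked as a curve can be.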
Released 11/12/2015
- Writing and running Python in IPython
- Using Python lists and dictionaries
- Creating NumPy arrays
- Indexing and slicing in NumPy
- Downloading and parsing data files into NumPy and Pandas
- Using multilevel series in Pandas
- Aggregating data in Pandas
Skill Level Intermediate
Q: The course shows how to download files from FTP and web servers using Python 3.X. How do I do the same thing with Python 2.7?
A: First import urllib, then use urllib.urlretrieve(URL, filename). For instance, to download the stations.txt file used in the chapter 5 video "Downloading and parsing data files," you'd do urllib.urlretrieve('ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt', 'stations.txt').
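As a sketch of how the same notebook could run under both interpreters, the import can be wrapped in a try/except so that `urlretrieve` resolves to `urllib.request.urlretrieve` on Python 3 and to `urllib.urlretrieve` on Python 2.7. The helper name `download` and the constant `STATIONS_URL` are hypothetical; the URL is the one quoted in the answer above and may no longer be reachable.

```python
try:
    from urllib.request import urlretrieve  # Python 3.x location
except ImportError:
    from urllib import urlretrieve  # Python 2.7 location

# URL quoted in the answer above; availability is not guaranteed.
STATIONS_URL = "ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt"


def download(url, filename):
    """Save the resource at `url` to `filename` and return the local path."""
    local_path, _headers = urlretrieve(url, filename)
    return local_path


# Uncomment to fetch the file (needs network access):
# download(STATIONS_URL, "stations.txt")
```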
Q. What are the issues with DataFrame.sort()?
A: Since Pandas version 0.18, the DataFrame method sort() was removed in favor of sort_values(). Unlike sort(), the new method does not sort records in place unless it is given the option "inplace=True". The following lines of code in the video need changing:
- In Chapter 6: Introduction to Pandas/DataFrames in iPandas
  - twoyears = twoyears.sort('2015', ascending=False) -> twoyears = twoyears.sort_values('2015', ascending=False)
- In Chapter 7: Baby names with Pandas/A yearly top ten
  - allyears_indexed.loc['M',:,2008].sort_values('number', ascending=False).head()
  - pop2008 = allyears_indexed.loc['M',:,2008].sort_values('number', ascending=False).head()
  - def topten(sex,year):
  - simple = allyears_indexed.loc[sex,:,year].sort_values('number', ascending=False).reset_index()
- In Chapter 7: Baby names with Pandas/Name Fads
  - [in addition to lines above, which are used to initialize the "name fads" computation]
  - spiky_common = spiky_common.sort_values(ascending=False)
  - spiky_common = spiky_common.sort_values(ascending=False); spiky_common.head(10)
- In Chapter 7: Baby names with Pandas/Solution
  - [in addition to lines above, which are used to initialize the "name fads" computation]
  - totals_both = totals_both.sort_values(ascending=False)
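The pattern behind all of these changes can be illustrated with a small sketch. The variable name `twoyears` and the column `'2015'` come from the lines above, but the numbers and the index labels here are invented. The point is that sort_values() returns a new, sorted DataFrame, so the result has to be assigned back (or "inplace=True" has to be passed) for the original variable to change.

```python
import pandas as pd

# Invented data standing in for the course's two-year table.
twoyears = pd.DataFrame(
    {"2014": [3, 1, 2], "2015": [10, 30, 20]},
    index=["a", "b", "c"],
)

# sort_values() returns a sorted copy and leaves the original untouched.
ranked = twoyears.sort_values("2015", ascending=False)
order_after_copy_sort = list(twoyears.index)  # still ['a', 'b', 'c']

# Passing inplace=True modifies the DataFrame itself and returns None.
twoyears.sort_values("2015", ascending=False, inplace=True)
order_after_inplace_sort = list(twoyears.index)  # now ['b', 'c', 'a']

print(ranked)
```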
Q. What are the issues with Pandas categorical data?
A. Since version 0.6, seaborn.load_dataset converts certain columns to Pandas categorical data (see http://pandas.pydata.org/
Q. What are the issues with matplotlib.pyplot.stackplot?
A. In recent versions of matplotlib, the function matplotlib.pyplot.stackplot now throws an error if given the keyword argument "label". This problem occurs in the "Baby names with Pandas/Name popularity" exercise file, and it can be ignored. In the video, matplotlib does not complain, but nevertheless shows no legend for the plot. The tutorial moves on to show how to make a legend using matplotlib.pyplot.text.
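As an alternative to the text-based legend used in the tutorial, stackplot accepts a plural "labels" argument (a list with one entry per stacked series), and a regular legend can then be drawn from it. The sketch below uses invented names and counts and is not the course's code; the non-interactive "Agg" backend is selected only so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

# Invented yearly counts for two made-up names.
years = [1990, 1991, 1992, 1993]
counts_a = [1, 2, 3, 4]
counts_b = [4, 3, 2, 1]

fig, ax = plt.subplots()
# Note the plural keyword: one label per stacked series.
ax.stackplot(years, counts_a, counts_b, labels=["Name A", "Name B"])
legend = ax.legend(loc="upper left")

legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```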
Video: Name fads