Learn about Pandas and the features it has. You can learn how it handles real-world heterogeneous data, its effective use of time-series data, and how we can use it as “Excel on steroids” in Python.
- [Instructor] Pandas is a library for what I call real-world data: data that is messy, incomplete, and of various types. When Pandas was released, it was adopted by the Python scientific community as the main tool for working with data. Pandas is packed with features; let's look at some of the main ones. The first step in your workflow is usually to input data. Pandas supports reading from and writing to various data sources, including CSV files, HDF5 files, Excel, databases, and even the clipboard.
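As a rough illustration of the input and output step described above, here is a minimal sketch that round-trips CSV text through Pandas. The sample rows and column names are made up for this example, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for illustration only.
# Note the missing age in the last row: real-world data is often incomplete.
csv_text = """name,age,city
Ada,36,London
Grace,45,New York
Linus,,Helsinki
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# The empty age field is read as a missing value (NaN).
missing_ages = int(df["age"].isna().sum())

# Write the table back out as CSV text; index=False omits the row labels.
out = df.to_csv(index=False)

print(df)
print("missing ages:", missing_ages)
```

The same pattern applies to the other sources mentioned, through functions such as `pd.read_excel`, `pd.read_hdf`, `pd.read_sql`, and `pd.read_clipboard`, though some of these need extra packages installed.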
NumPy is great for working with matrices where every element has the same type: integers, floats, et cetera. But data in the real world is often composed of mixed types, also called heterogeneous data. Some can be numeric, age for example. Some can be textual, name for example. Some can be time, year of birth for example, et cetera. Pandas supports mixed types with ease. Since Pandas' author, Wes McKinney, was working in finance at the time, Pandas also excels at dealing with time-series data.
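To make the idea of heterogeneous data concrete, here is a small sketch of a table with a textual column, a numeric column, and a time column. The names, ages, and dates are placeholder values chosen for this example.

```python
import pandas as pd

# Placeholder values for illustration: one textual, one numeric, one time column.
people = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 28],
        "born": pd.to_datetime(["1815-12-10", "1906-12-09", "1969-12-28"]),
    }
)

# Each column keeps its own type.
print(people.dtypes)

# Pick out only the numeric columns.
numeric_cols = people.select_dtypes(include="number").columns.tolist()

# Time columns expose date parts through the .dt accessor.
birth_years = people["born"].dt.year.tolist()
```

A NumPy array would have to force all three columns into one common type, while the DataFrame stores each column separately.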
You can easily select subsets of data by time ranges, and also change the frequency of events to suit your needs. Pandas is efficient at handling large amounts of data. I've worked with tens of millions of rows on a regular PC with ease. We often need to look at data from different points of view. Pandas makes it easy to select data, group it, reshape it, join it, and more. Pandas also includes many statistical functions right out of the box, such as quantiles, moving averages, and more. When it comes to displaying data, Pandas leverages matplotlib and offers many plotting functions on top of it.
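The time-range selection, frequency changes, and built-in statistics mentioned above can be sketched roughly as follows. The daily readings are synthetic (just the numbers 0 through 27), and the variable names are only for this example.

```python
import pandas as pd

# Synthetic daily readings for illustration: 0.0, 1.0, ... over four weeks.
# 2024-01-01 is a Monday, so each calendar week is fully covered.
idx = pd.date_range("2024-01-01", periods=28, freq="D")
readings = pd.Series(range(28), index=idx, dtype="float64")

# Select a subset by time range; both endpoints are included.
second_week = readings.loc["2024-01-08":"2024-01-14"]

# Change the frequency: daily values to weekly means (weeks end on Sunday).
weekly = readings.resample("W").mean()

# Built-in statistics: a 3-day moving average and the median (0.5 quantile).
moving_avg = readings.rolling(window=3).mean()
median = readings.quantile(0.5)

print(weekly)
```

The first two entries of `moving_avg` are missing values, because a 3-day window is not yet full at that point.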
It's usually much easier to work with Pandas plotting than to deal directly with matplotlib. Apart from all this, and much more, Pandas is one of the best-documented open-source projects out there. Okay, enough talking. Let's start working with some real-world data.
- Working with Jupyter notebooks
- Using code cells
- Extensions to the Python language
- Markdown cells
- Editing notebooks
- NumPy basics
- Broadcasting, array operations, and ufuncs
- Folium and Geo
- Machine learning with scikit-learn
- Plotting with matplotlib and bokeh
- Branching into Numba, Cython, deep learning, and NLP
Skill Level Intermediate
NumPy Data Science Essential Training with Charles Kelly, 3h 54m, Intermediate
1. Scientific Python Overview
2. The Jupyter Notebook
3. NumPy Basics
Manage environments (5m 11s)
6. Folium and Geo
7. NY Taxi Data
10. Other Packages
11. Development Process
Next steps (1m 33s)