Get an overview of the scientific Python ecosystem and why it's so popular in the data science field, including some of the most popular and useful packages, such as NumPy, SciPy, pandas, and matplotlib.
- [Instructor] Data science is a hot trend. It enables businesses to base their decisions on data rather than gut feelings. As W. Edwards Deming said, "In God we trust. All others, bring data." Data science requires a combination of several skills, mainly math and statistics, computer science, and domain knowledge. Note that in the middle, where all three skills overlap, we have the unicorn. People who excel in all three fields are very rare. The good news for non-unicorns is that with some knowledge from every field, you can still be productive and effective with data.
Python is a very mature and popular language. Version 1.0 was released in 1994, before Java, and Python has seen a great adoption rate in the last four or five years. Python has become a big player in the data science scene, and you can find a lot of data science-related work in the Python community, from libraries and frameworks to user meetups and data-related talks and conventions. I'd say that scientific Python is one of Python's killer apps. The great thing about using Python for data science is that you can use the same language for both research and production.
Data scientists can train a sophisticated model and use it in production with ease. Python is also a great general-purpose language, and since about 80% of our work as data scientists involves getting data from various sources and cleaning it, Python is a great fit. Let's have a look at the scientific Python ecosystem. The base of almost everything is NumPy, which provides super-efficient multidimensional arrays (matrices). It also includes many utilities for working with these arrays. NumPy leverages highly optimized C libraries to do its job and is very, very fast.
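To make the NumPy description concrete, here is a minimal sketch, assuming NumPy is installed. The array values are arbitrary and chosen only for illustration; the example shows element-wise arithmetic, a couple of built-in utilities, and a single vectorized call that replaces a Python-level loop.

```python
import numpy as np

# A 3x3 matrix of 64-bit floats, stored in one contiguous block of memory
m = np.arange(9, dtype=np.float64).reshape(3, 3)

# Element-wise arithmetic runs in compiled loops, not Python loops
scaled = m * 2 + 1

# A couple of the utilities NumPy provides for working with arrays
col_means = m.mean(axis=0)  # mean of each column
product = m @ m.T  # matrix multiplication with the transpose

# Sum of squares of 0..999,999: a Python-level loop vs. one vectorized call
n = 1_000_000
loop_total = sum(i * i for i in range(n))
vec_total = int(np.sum(np.arange(n, dtype=np.int64) ** 2))
```

Both totals are the same number; the vectorized version simply pushes the loop down into NumPy's compiled code, which is where the speed comes from.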
Adding to what NumPy offers is SciPy. SciPy is a collection of packages that add various math and science capabilities: probability distributions, sparse matrices, Fourier transforms, linear algebra, and more. After we're done processing the data, we'd like to display our results. Matplotlib is the library most people use for visualization, although we'll talk about several others later. Matplotlib is very mature and offers many types of visualization, such as line charts, bar charts, scatter plots, and more.
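Here is a small sketch touching each SciPy area mentioned above: sparse matrices, linear algebra, Fourier transforms, and probability distributions. It assumes a reasonably recent SciPy (the `scipy.fft` module was added in version 1.4); the matrices and the test signal are made up for illustration.

```python
import numpy as np
from scipy import fft, linalg, sparse, stats

# Sparse matrices: CSR format stores only the non-zero entries
dense = np.array([[0, 0, 3], [4, 0, 0], [0, 5, 0]])
sparse_m = sparse.csr_matrix(dense)
nnz = sparse_m.nnz  # number of stored non-zero values

# Linear algebra: solve the system A x = b
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(a, b)

# Fourier transform: a cosine with 4 cycles over 64 samples
n = 64
t = np.arange(n)
signal = np.cos(2 * np.pi * 4 * t / n)
spectrum = np.abs(fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))  # frequency bin with the most energy

# Probability distributions: CDF of the standard normal at 0
p = stats.norm.cdf(0.0)
```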
Matplotlib has many configuration options for colors, sizes, labels, and more. Most MATLAB users find that the combination of NumPy, SciPy, and matplotlib covers everything they used MATLAB for. However, the scientific Python ecosystem offers much more. One of the most widely used libraries is pandas. It offers heterogeneous, matrix-like data structures whose columns can hold different types, labeled indices, time series functionality, and much more. Pandas will be the main tool we work with, and by the end of this course, you will be able to slice, dice, and clean data with ease.
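Here is a minimal pandas sketch of the features listed above: mixed column types, a labeled (date) index, slicing, cleaning a missing value, grouping, and time series resampling. The sensor readings are hypothetical data invented for this example, and filling with the column mean is just one of several possible cleaning strategies.

```python
import pandas as pd

# Hypothetical readings: a text column and a numeric column, rows labeled by date
df = pd.DataFrame(
    {
        "sensor": ["a", "b", "a", "b"],
        "temp": [20.0, 24.0, 22.0, None],  # None becomes NaN (a missing value)
    },
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Slice by label: rows from Jan 2 through Jan 3 (both ends inclusive)
middle = df.loc["2024-01-02":"2024-01-03"]

# Clean: fill the missing temperature with the mean of the known values
df["temp"] = df["temp"].fillna(df["temp"].mean())

# Dice: average temperature per sensor
per_sensor = df.groupby("sensor")["temp"].mean()

# Time series: downsample the daily readings into 2-day buckets
two_day = df["temp"].resample("2D").mean()
```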
Two additional libraries that we'll use are scikit-learn, which offers many machine learning algorithms, and Jupyter, which offers enhanced shells and web-based notebooks. We'll spend most of our time with these two. And there are many, many other libraries. We don't have time to cover them all, but if you're doing image processing, deep learning, or any other data-related task, chances are there's a good library for it.
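scikit-learn models share a common fit and predict pattern. As a minimal sketch, assuming scikit-learn is installed, here is a linear regression fitted to a tiny made-up dataset that follows y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: one feature per row, targets follow y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X, y)  # learn the slope (coef_) and intercept (intercept_)

# Predict for an unseen input; the result should be close to 2 * 10 + 1
pred = model.predict(np.array([[10.0]]))
```

The same fit and predict calls apply to most scikit-learn estimators, so trying a different model usually changes only the constructor line.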
- Working with Jupyter notebooks
- Using code cells
- Extensions to the Python language
- Markdown cells
- Editing notebooks
- NumPy basics
- Broadcasting, array operations, and ufuncs
- Folium and Geo
- Machine learning with scikit-learn
- Plotting with matplotlib and Bokeh
- Branching into Numba, Cython, deep learning, and NLP
Skill level: Intermediate