Get an overview of the scientific Python ecosystem and why it is so popular in the data science field, including some of the most popular and useful packages such as NumPy, SciPy, pandas, and matplotlib.
- [Instructor] Data science is a hot trend. It enables businesses to base their decisions on data rather than gut feelings. As W. Edwards Deming said, "In God we trust. All others, bring data." Data science requires a combination of several skills, mainly math and statistics, computer science, and domain knowledge. Note that in the middle, we have the unicorn. People who excel in all of these fields are very rare. The good news for non-unicorns is that with some knowledge from every field, you can still be productive and effective with data.
Python is a very mature and popular language. Version 1.0 was released in 1994, before Java, and Python has seen a great adoption rate in the last four or five years. It has become a big player in the data science scene, and you can find a great deal of data science-related work in the Python community. From libraries and frameworks to user meetings and the number of data-related talks and conventions, I'd say that scientific Python is one of Python's killer apps. The great thing about using Python for data science is that you can use the same language for both research and production.
Data scientists can train a sophisticated algorithm and use it in production with ease. Python is also a great general-purpose language, and since about 80% of our work as data scientists involves getting data from various sources and cleaning it, Python is a great fit. Let's have a look at the scientific Python ecosystem. The base of almost everything is NumPy, a library of super-efficient matrices (n-dimensional arrays). It also includes several utilities for working with these matrices. NumPy leverages highly optimized C libraries to do its job and is very, very fast.
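To make the NumPy description concrete, here is a minimal sketch, not taken from the course itself. The array values and variable names are made up for illustration; it only shows that arithmetic applies to a whole matrix at once and that helpers such as `mean` and the transpose come with the array.

```python
import numpy as np

# A 2-D array (matrix) of hypothetical sample values
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Element-wise arithmetic runs in compiled code, with no Python-level loop
doubled = matrix * 2

# Built-in utilities: per-column mean and transpose
column_means = matrix.mean(axis=0)
transposed = matrix.T

print(doubled)
print(column_means)
print(transposed.shape)
```

The same expressions work unchanged on arrays with millions of elements, which is where the speed advantage over plain Python lists shows up.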
Building on what NumPy offers is SciPy. SciPy is a collection of packages that add various math and science capabilities: random distributions, sparse matrices, Fourier transforms, linear algebra, and more. After we're done processing the data, we'd like to display our results. Matplotlib is the library most people use for visualization, although we'll talk about several others later. Matplotlib is very mature and offers many types of visualization, such as line charts, bar charts, scatter plots, and more.
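As a rough illustration of two of the SciPy capabilities mentioned above, sparse matrices and Fourier transforms, the sketch below uses `scipy.sparse` and `scipy.fft`. The matrix contents, the sample rate, and the 5 Hz test signal are assumptions chosen only for the example.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.sparse import csr_matrix

# Sparse matrix: store only the non-zero entries of a mostly-zero matrix
dense = np.array([[0, 0, 3],
                  [0, 0, 0],
                  [4, 0, 0]])
compact = csr_matrix(dense)
print(compact.nnz)  # number of stored non-zero values

# Fourier transform: find the dominant frequency of a sampled sine wave
sample_rate = 64                       # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)     # hypothetical 5 Hz test signal
spectrum = np.abs(rfft(signal))
freqs = rfftfreq(sample_rate, d=1 / sample_rate)
dominant = freqs[np.argmax(spectrum)]
print(dominant)
```

Both parts operate directly on NumPy arrays, which is what "building on NumPy" means in practice.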
It has many configuration options for colors, sizes, labels, and more. Most MATLAB users find that the combination of NumPy, SciPy, and matplotlib covers everything they needed from MATLAB. However, the scientific Python ecosystem offers much more. One of the most widely used libraries is pandas. It offers heterogeneous matrix-like data structures, labeled indices, time series functionality, and much more. Pandas will be the main tool we work with, and by the end of this course, you will be able to slice, dice, and clean data with ease.
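The three pandas features named above (heterogeneous columns, labeled indices, and time series functionality) can be sketched in a few lines. The ride records, column names, and timestamps below are invented for illustration and are not the course's data.

```python
import pandas as pd

# Hypothetical ride records: text and numeric columns in one table
rides = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo"],
        "distance_km": [5.0, 12.5, 7.5],
    },
    # Labeled index: timestamps instead of positions 0, 1, 2
    index=pd.to_datetime(
        ["2017-07-01 08:00", "2017-07-01 09:30", "2017-07-02 10:15"]
    ),
)

# Slice by label: all rows from one calendar day
first_day = rides.loc["2017-07-01"]

# Dice by value: only the rides in one city
oslo_rides = rides[rides["city"] == "Oslo"]

# Time series functionality: total distance per day
daily_total = rides["distance_km"].resample("D").sum()

print(first_day)
print(oslo_rides)
print(daily_total)
```

Selecting by a date string works here because the index holds timestamps; with a plain integer index the same `loc` call would look up a label instead.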
Two additional libraries that we'll use are scikit-learn, which offers many machine learning algorithms, and Jupyter, which offers enhanced shells and web-based notebooks. We'll spend most of our time with these two. And there are many, many other libraries. We don't have time to cover them all, but if you're doing image processing, deep learning, or any other data-related task, you can be sure there's a good library for it.
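For a sense of what using a scikit-learn algorithm looks like, here is a minimal sketch. The choice of the bundled iris dataset, the k-nearest-neighbors classifier, and the split parameters are assumptions made for this example, not something the course prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small example dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to check the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a k-nearest-neighbors classifier and measure its accuracy
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same `fit` and `score` pattern, so swapping in a different algorithm usually means changing only the line that creates `model`.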
Released 7/18/2017
- Working with Jupyter notebooks
- Using code cells
- Extensions to the Python language
- Markdown cells
- Editing notebooks
- NumPy basics
- Broadcasting, array operations, and ufuncs
- Pandas
- Conda
- Folium and Geo
- Machine learning with scikit-learn
- Plotting with matplotlib and bokeh
- Branching into Numba, Cython, deep learning, and NLP
Skill Level Intermediate
Video: Ramp up with Scientific Python