Learn how to use Python for data science. This video covers data visualization, machine learning, and Python in data engineering.
- [Lillian] In this section, we're going to talk about why to use Python for data science. If you're following along in this course series, you'll remember that in Part One we covered why to use Python for working with data. Now we're going to dig into the specific benefits of using Python within data science. As you may recall, the Python programming language is a high-level, interpreted, open-source coding language that's useful in a wide variety of applications, and it's an official language of Google.

Now, why use Python for working with data? Python is extremely versatile, so you can use it in data engineering, machine learning, and data visualization. You can also use it for web development, application development, game development, and of course data science, which is the subject of this course. Python is also applicable in professional as well as academic pursuits. In this section, we're going to look at using Python for data visualization, machine learning, and data engineering.

In terms of Python for data visualization, let's look at the libraries and the functions. The main libraries for data visualization are Matplotlib, Seaborn, ggplot, GraphX, and Plotly. We're actually covering all of these libraries in this course. In terms of functions, you can use Python data visualization for exploratory data analysis, data storytelling, decision-support dashboard design, and public education, such as news media and data journalism. In this course series, you're learning how to use Python for exploratory data analysis and for decision-support dashboard design.

In terms of using Python for machine learning, there are three main libraries that I want to highlight here, although many others are available to you as well: scikit-learn, TensorFlow, and PyTorch. In this course, you're going to learn how to use scikit-learn.
TensorFlow is reserved for deep learning, and we're not really getting into deep learning in this course, although you will be learning how to create a Perceptron. In terms of the functionality you can get by using Python for machine learning, according to different use cases, you can use it for regression, clustering, dimension reduction, association rules, deep learning, instance-based learning, decision trees, Bayesian methods, ensemble methods, and regularization algorithms. In this course, we're going to cover regression, clustering, and dimension reduction, along with association rules, instance-based learning, decision trees, Bayesian methods, and ensemble methods. We are going to touch on deep learning, but only in the sense that we're creating the Perceptron, like I just mentioned.

Now, let's really quickly look at Python for data engineering. You can use Python in data engineering to build simple MapReduce jobs without using Java, so you could build those natively in Python. You could write data processing jobs within Spark without having to know the Scala language, you could use Python to program an IoT device like a Raspberry Pi, and you could use Python with an application called Airflow to build extract, transform, and load (ETL) processes. All of these would be considered data engineering tasks.

So you can see how adaptable Python is within the data space. You can use it in analytics, you can use it in data engineering, and you can use it in data science. All of this lends itself well to making Python the most adaptable, widely used language across the data space. Now let's look at where AI fits in this picture.
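To make the MapReduce point from the data engineering discussion a little more concrete, here is a minimal sketch of the map, shuffle, and reduce pattern in plain Python, using a word count as the example. It is not taken from the course: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and it runs in a single process rather than on a Hadoop cluster, so treat it as an illustration of the idea only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, only for demonstration.
    sample = ["python for data science", "python for data engineering"]
    print(word_count(sample))
```

On a real cluster, the mapper and reducer would typically run as separate distributed tasks, but the three-stage structure is the same.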
- Why use Python for data science
- Machine learning 101
- Linear regression
- Logistic regression
- Clustering models: K-means and hierarchical models
- Dimension reduction methods
- Association rules
- Ensemble methods
- Introduction to neural networks
- Decision tree models