This video provides an introduction to machine learning.
- [Instructor] It's time to start talking about machine learning. Let me give you an introduction. You can think of machine learning as the brains behind AI technologies, and AI technologies do the actions. More technically, machine learning is the process of applying algorithmic analytical models to preprocessed data in iterations to facilitate the discovery of hidden patterns or trends that are useful for making predictions. As far as what you can do with machine learning, you can do things like sales forecasting, customer segment analysis, insurance claim fraud detection, and hedge fund classification.
There are many different types of machine learning algorithms, and so, I'll leave it to you to just kind of review this list, but more specifically, in this course, we're going to cover regression models, Bayesian modeling, dimension reduction, instance-based classification, and clustering. As far as regression methods, you're going to learn how to do linear regression, both simple and multivariate, and logistic regression. For Bayesian models, you're going to learn the Naive Bayes method. For dimension reduction, you're going to learn how to do principal component analysis, and for instance-based learning, you're going to learn the k-nearest neighbor method.
As far as clustering, you're going to see k-means clustering, hierarchical clustering, and DBSCAN. Stay tuned because I'm going to teach you the differences between these types of machine learning methods and how you can do them for yourself, but before that, I need to give you a little background on the vocabulary of machine learning. You're going to hear about features. A feature is the same thing as a variable, column, attribute, or field, depending on your professional background. There are also instances. An instance is just another word for a row, data point, value, or case.
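To make the row/column vocabulary concrete, here is a minimal sketch that stores a small table as plain Python lists. The housing numbers and the `column` helper are hypothetical, made up only for illustration: each inner list is one instance (a row), and each position within it is one feature (a column).

```python
# Hypothetical housing data (made-up numbers, for illustration only).
# Each inner list is one instance (row); each position is one feature (column).
feature_names = ["rooms", "sqft", "acres"]
instances = [
    [3, 1400, 0.25],
    [4, 2000, 0.50],
    [2, 900, 0.10],
]


def column(rows, name):
    """Return every value of one feature (a column) across all instances."""
    idx = feature_names.index(name)
    return [row[idx] for row in rows]


n_instances = len(instances)     # number of rows
n_features = len(feature_names)  # number of columns
sqft_values = column(instances, "sqft")

print(n_instances, n_features, sqft_values)  # 3 3 [1400, 2000, 900]
```

In the course itself you would more likely hold this table in a pandas DataFrame, but the mapping of instances to rows and features to columns is the same.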
Also, an instance is the same thing as an observation in statistics. You're also going to hear about targets. A target variable is synonymous with the terms predictand and dependent variable in statistics. Another term you're going to hear is data, and data refers to the predictor variable, or set of predictor variables, that you use to make a prediction. In machine learning, you usually break your data into training and test sets. To do this, you randomly sample the instances and assign each one to either the training set or the test set.
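Here is a minimal sketch of separating the target from the predictors and then randomly splitting the instances into training and test sets. The records are hypothetical, and `train_test_split` here is a hand-written helper (not the scikit-learn function of the same name); it assumes a two-thirds training fraction, a common rule of thumb, and a fixed seed so the shuffle is reproducible.

```python
import random

# Hypothetical housing records (made-up numbers, for illustration only).
records = [
    {"rooms": r, "sqft": s, "price": p}
    for r, s, p in [
        (2, 900, 150_000), (3, 1400, 210_000), (4, 2000, 305_000),
        (3, 1600, 230_000), (5, 2600, 390_000), (2, 1100, 170_000),
        (4, 2200, 320_000), (3, 1500, 220_000), (5, 2400, 360_000),
    ]
]

predictors = ["rooms", "sqft"]  # the "data" used to make a prediction
target = "price"                # the target (dependent) variable


def train_test_split(rows, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of rows at random and cut it into training and test sets."""
    shuffled = rows[:]                     # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # random sampling without replacement
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(records)

# Separate predictors (X) from the target (y) within each set.
X_train = [[row[c] for c in predictors] for row in train]
y_train = [row[target] for row in train]
X_test = [[row[c] for c in predictors] for row in test]
y_test = [row[target] for row in test]

print(len(train), len(test))  # 6 3
```

Shuffling before cutting is what makes the split a random sample. Splitting an unshuffled file could put, say, all of the largest houses in the test set.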
A general rule of thumb is to put 2/3 of your dataset in the training set and use it to train your model. The remaining 1/3 of the data goes into the test set, which you use to check how well the model performs. You need to be aware of the difference between supervised and unsupervised methods. Put simply, supervised methods learn from labeled data to make predictions, whereas unsupervised methods find structure, such as groups or patterns, in unlabeled data. Let me give you a simple example. For supervised machine learning, think about spam detection.
You go into your inbox, and you get an email that you know is spam, so you mark it as spam. After you do that several times, your email service provider starts making predictions for you, based on the characteristics of incoming emails, and moving them into the spam box. For a second example, say you have a house, and you want to put it on the market, but you're not exactly sure what it would sell for. You could take some old historical records and, based on key features like the number of rooms, the square footage, and the acreage of the property, see what price those houses sold for. Then, by comparing those key characteristics with the features of your house, you could predict what your house might sell for. This is also supervised learning, because the historical sale prices act as labels. Unsupervised learning, by contrast, works without labels; customer segment analysis, for example, groups customers by similarity without being told in advance which segment each one belongs to.
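The house-pricing idea can be sketched as a k-nearest-neighbor average, one of the methods covered later in the course. Because the historical sale prices serve as labels, this is a prediction from labeled data. The sales records and the `predict_price` helper are hypothetical, and rescaling each feature by its range is an assumption chosen so that square footage does not swamp rooms and acreage in the distance.

```python
import math

# Hypothetical historical sales: (rooms, square feet, acres) -> sale price.
# The known sale prices are the labels the prediction learns from.
history = [
    ((3, 1400, 0.25), 210_000),
    ((4, 2000, 0.50), 305_000),
    ((2, 900, 0.10), 150_000),
    ((3, 1600, 0.30), 230_000),
    ((5, 2600, 1.00), 390_000),
]


def predict_price(house, records, k=2):
    """Average the sale prices of the k most similar past houses (k-NN regression)."""
    # Range of each feature, used to rescale distances (1 guards against a zero range).
    cols = list(zip(*(feats for feats, _ in records)))
    spans = [max(c) - min(c) or 1 for c in cols]

    def distance(a, b):
        # Euclidean distance on range-scaled features.
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, spans)))

    nearest = sorted(records, key=lambda rec: distance(rec[0], house))[:k]
    return sum(price for _, price in nearest) / k


my_house = (3, 1500, 0.25)
print(predict_price(my_house, history))  # 220000.0
```

Here the two closest past sales are the 1,400 and 1,600 square-foot houses, so the estimate is the mean of their prices. A larger `k` smooths the estimate at the cost of including less similar houses.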
Next, I'm going to show you factor analysis. Technically, factor analysis isn't really machine learning, but it's a great segue into machine learning. So I'm going to give you that introduction, and then, I'm going to follow that up with a similar method called principal component analysis.
- Getting started with Jupyter Notebooks
- Visualizing data: basic charts, time series, and statistical plots
- Preparing for analysis: treating missing values and data transformation
- Data analysis basics: arithmetic, summary statistics, and correlation analysis
- Outlier analysis: univariate, multivariate, and linear projection methods
- Introduction to machine learning
- Basic machine learning methods: linear and logistic regression, Naïve Bayes
- Reducing dataset dimensionality with PCA
- Clustering and classification: k-means, hierarchical, and k-NN
- Simulating a social network with NetworkX
- Creating Plot.ly charts
- Scraping the web with Beautiful Soup