- [Instructor] Let's build on our introduction to machine learning. So the machine learning algorithm's task is to learn the weights for the model. The weights describe the likelihood that the patterns the model is learning reflect actual relationships in the data. A machine learning algorithm consists of a loss function and an optimization technique. The loss is the penalty that is incurred when the predicted value of the machine learning model does not equal the actual value. A loss function quantifies this penalty as a single value.
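As a quick illustration (this is not from the course or the service itself, just an assumed squared-error loss on a made-up prediction), here is how a loss function turns the gap between a predicted and an actual value into a single penalty number:

```python
def squared_loss(predicted, actual):
    """Penalty for a single prediction: zero when predicted equals actual,
    growing quadratically as the prediction drifts away from the truth."""
    return (predicted - actual) ** 2

# Made-up example: the model predicts 7.5 when the actual value is 10.0.
print(squared_loss(7.5, 10.0))   # 6.25 -- the penalty the optimizer tries to shrink
print(squared_loss(10.0, 10.0))  # 0.0  -- no penalty when the prediction is exact
```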
The purpose of the optimization technique is to minimize this loss. In Amazon Machine Learning, we use three loss functions, one for each of the three types of prediction problems. The optimization technique used in Amazon Machine Learning is online Stochastic Gradient Descent, or SGD for short. Stochastic Gradient Descent makes sequential passes over the training data, and during each pass it updates the feature weights one example at a time, with the aim of approaching the optimal weights that minimize the loss.
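To make that concrete, here is a minimal sketch of online SGD for a linear model with squared loss, written in plain Python. The toy data, learning rate, and number of passes are all assumptions for illustration; this is not the service's actual implementation.

```python
# Online SGD sketch: one weight update per training example, on each pass.
# Toy data that roughly follows y = 2*x + 1 (illustrative values only).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0              # feature weight and bias, starting at zero
learning_rate = 0.05
num_passes = 20

for _ in range(num_passes):      # sequential passes over the training data
    for x, y in data:            # one example at a time
        prediction = w * x + b
        error = prediction - y   # gradient of squared loss w.r.t. the prediction (up to a factor of 2)
        w -= learning_rate * error * x   # nudge the weight toward lower loss
        b -= learning_rate * error

print(round(w, 2), round(b, 2))  # approaches w ≈ 2, b ≈ 1 for this toy data
```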
Amazon Machine Learning uses the following three learning algorithms. For binary classification problems, Amazon Machine Learning uses logistic regression; so that's a logistic loss function with Stochastic Gradient Descent. For multiclass classification, Amazon Machine Learning uses multinomial logistic regression; so that's multinomial logistic loss with Stochastic Gradient Descent. And finally, for regression, Amazon Machine Learning uses linear regression, which is the squared loss function with Stochastic Gradient Descent.
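For reference, here is an illustrative from-scratch rendering of those three loss functions for a single example (standard textbook formulas, not the service's code):

```python
import math

def logistic_loss(predicted_prob, actual):
    """Binary classification: actual is 0 or 1, predicted_prob is in (0, 1)."""
    return -(actual * math.log(predicted_prob) + (1 - actual) * math.log(1 - predicted_prob))

def multinomial_logistic_loss(predicted_probs, actual_class):
    """Multiclass classification: penalize the log-probability given to the true class."""
    return -math.log(predicted_probs[actual_class])

def squared_loss(predicted, actual):
    """Regression: squared difference between the predicted and actual values."""
    return (predicted - actual) ** 2

print(logistic_loss(0.9, 1))                          # small penalty: confident and correct
print(multinomial_logistic_loss([0.1, 0.2, 0.7], 2))  # small penalty: true class got probability 0.7
print(squared_loss(4.5, 5.0))                         # 0.25
```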
Hyperparameters are options that you can choose for a machine learning model. If you want to be more prescriptive about the parameters used for a specific machine learning model, Amazon Machine Learning allows you to do this by selecting Custom when creating your model. Amazon's default option when creating a model has these hyperparameters predefined for you. The learning rate is a constant value used in the Stochastic Gradient Descent algorithm. The learning rate affects the speed at which the algorithm reaches, or converges to, the optimal weights.
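To see why the learning rate controls convergence speed, here is a tiny illustrative comparison (all numbers are made up): gradient descent on a one-dimensional squared loss, run with two different learning rates.

```python
def steps_to_converge(learning_rate, target=10.0, tolerance=0.01):
    """Count gradient-descent steps on the squared loss (w - target)**2
    until the weight w is within tolerance of the target."""
    w, steps = 0.0, 0
    while abs(w - target) > tolerance:
        gradient = 2 * (w - target)
        w -= learning_rate * gradient
        steps += 1
    return steps

print(steps_to_converge(0.01))  # small learning rate: hundreds of tiny steps
print(steps_to_converge(0.1))   # larger learning rate: converges in roughly a tenth of the steps
```

A learning rate that is too large can overshoot the optimum instead of converging, which is why it is treated as a tunable hyperparameter.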
Another hyperparameter is model size. Large models have practical implications, such as requiring more RAM to hold the model while training and when generating predictions. The Stochastic Gradient Descent algorithm makes sequential passes over the training data, and the number of passes parameter controls how many passes the algorithm makes. More passes result in a model that fits the data better, provided the learning rate is not too large, but this benefit diminishes as the number of passes increases.
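As a hedged sketch of what setting these options programmatically might look like, the snippet below uses the boto3 machinelearning client's create_ml_model call; the IDs are placeholders, and the parameter keys ('sgd.maxPasses', 'sgd.maxMLModelSizeInBytes') are my assumption of the Amazon Machine Learning training-parameter names, so check the service documentation before relying on them.

```python
import boto3

# Hedged sketch: creating a model with custom training parameters through the
# Amazon Machine Learning API. All IDs and values below are placeholders.
client = boto3.client('machinelearning')

client.create_ml_model(
    MLModelId='ml-example-model',                 # placeholder model ID
    MLModelName='Custom binary model',
    MLModelType='BINARY',                         # BINARY, MULTICLASS, or REGRESSION
    TrainingDataSourceId='ds-example-training',   # placeholder datasource ID
    Parameters={                                  # values are passed as strings
        'sgd.maxPasses': '20',                    # number of passes over the training data
        'sgd.maxMLModelSizeInBytes': '33554432',  # cap the model at 32 MiB
    },
)
```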
The Stochastic Gradient Descent algorithm is influenced by the order of the rows in the training data. Shuffling your training data results in better machine learning models because it helps the Stochastic Gradient Descent algorithm avoid solutions that are optimal for the first type of data it sees but not for the full range of data. Regularization helps prevent linear models from overfitting the training data by penalizing extreme weight values. What do we mean by overfitting? This is when our model is good at predicting values similar to the training data, but gives poor results on new, more general data.
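Both of those ideas can be seen as small tweaks to the earlier SGD sketch; in this illustrative version (the L2 penalty amount is made up), the examples are shuffled on every pass and an L2 penalty nudges the weight back toward zero:

```python
import random

# Illustrative tweaks to the earlier SGD sketch: shuffle each pass, and add an
# L2 regularization penalty that discourages extreme weight values.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = 0.0, 0.0
learning_rate, l2_amount = 0.05, 0.01

for _ in range(20):
    random.shuffle(data)                 # avoid fitting to the original row order
    for x, y in data:
        error = (w * x + b) - y
        w -= learning_rate * (error * x + l2_amount * w)   # loss gradient plus L2 penalty
        b -= learning_rate * error

print(round(w, 2), round(b, 2))          # close to w ≈ 2, b ≈ 1, slightly shrunk by the penalty
```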
In the next video, we will look at the six steps in Amazon's Machine Learning process.