From the course: Cognitive Technologies: The Real Opportunities for Business

Supervised learning

- In this lecture, we're going to talk about supervised learning. And Eric Nyberg is here to help. Supervised learning is like learning by example. An agent is given pairs of information, and each pair consists of an input and an example of an appropriate corresponding output. Given enough examples, the agent learns to produce appropriate outputs, even for inputs it hasn't seen before. Let's take a simple example of supervised learning. The history of home sales in an area could be expressed as a collection of number pairs: the square footage of each home and its sales price. From these pairs of numbers, an agent can learn a function, or a model, that calculates the expected sales price given the square footage. Now, that's a simple model, one input variable and one output. Square footage is only one factor that influences the price of a home, of course. In New York City, where I live, the prices of apartments are also influenced by attributes such as what floor an apartment is on, the neighborhood it's in, whether the building has a doorman or not, and so on. Given a sales history containing those attributes, plus sales prices, you can use machine learning to learn a more complex and realistic model that takes all of these attributes as inputs and predicts a sales price. Now, what makes this supervised learning is that we used labeled data to train the model. In this example, the labels are prices. Now, machine learning experts talk about how important feature engineering is in developing machine learning models. Eric, can you tell us what this means and why it's so crucial? - Sure, David. If you think about the example you gave before, you mentioned that it might be very difficult to predict the sale price of a home just using the square footage, because other features, like the neighborhood or recent crime statistics, might also have a big influence on whether or not a house sells for a certain price. So we need to keep these things in mind in feature engineering, which usually involves finding the most discriminating features of the pairs of information that we're using in the training data. - So by discriminating features, you mean the features that are most likely related somehow to the labeled data. - That's correct. - Got it. Now, supervised learning can perform two main tasks, each with different applications: classification and regression. Classification is when the output is one of a set of discrete values, such as the name of the type of animal in a photograph. And regression is when the output is a number, such as price in our real estate example. The applications of supervised learning are many, including sales forecasting, pricing, image recognition, handwriting recognition, text classification, and so on. Now, what are some of the challenges in getting supervised learning to work? - Well, I think you mentioned already this idea of feature engineering, but even if we know the right features for building a machine learning system, we also have to work through all the challenges in creating the training data. So if we're going to have a human label the expected outputs for a set of inputs as training data, we have to assume that that's a fairly easy computation for that human to do and that we're not going to get a lot of disagreement between two or three different humans who would be applying labels to the same data.
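To make the house-price discussion above concrete, here is a minimal sketch of a supervised regression model, assuming Python with scikit-learn; the attributes, training pairs, and prices are invented for illustration and are not from the course.

```python
# A minimal supervised-regression sketch for the house-price example.
# Assumes Python with scikit-learn; all numbers are invented for illustration.
from sklearn.linear_model import LinearRegression

# Labeled training data: each input is [square_footage, floor, has_doorman],
# and each label is the observed sale price. The labels are what make this
# "supervised" learning.
X_train = [
    [600, 2, 0],
    [850, 5, 1],
    [1200, 10, 1],
    [950, 3, 0],
]
y_train = [450_000, 780_000, 1_250_000, 690_000]

# Fit a model to the input/label pairs.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict a price for an apartment the model has never seen.
print(model.predict([[1000, 7, 1]]))
```

A linear model is only one possible choice here; the point is the fit-then-predict pattern on labeled pairs, and the choice of which attributes to include is exactly the feature engineering Eric describes.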
The other challenge is that building a sufficiently large data set for training a supervised model might require a lot of time and money, because it might require a lot of effort from a lot of humans to get the job done. - And are there application areas of emerging importance where you see supervised learning being applied? - I do. I think there are a lot of domains where there's already a large set of very rich training data that can be used to train classification and regression models. For example, in the domain of personal health, I think we can look forward to a future where predictive analytics can tell you a lot about what's likely to happen with your health, based not only on your own personal health history but also on predictive models built over all of the individuals whose records have been used as training data. - Great, so let's wrap this one up. Supervised learning uses labeled training data to learn a model or a function which is able to produce the correct output given an input. Supervised learning can be used for classification tasks and regression tasks. Feature engineering, identifying the characteristics in your input data that are most likely related to the output, is crucial. And applications of supervised learning include forecasting, image and handwriting recognition, text classification, and so on.
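As a companion to the wrap-up's distinction between regression and classification, here is a similarly minimal classification sketch, again assuming Python with scikit-learn; the labeled texts and category names are made up purely for illustration.

```python
# A minimal supervised-classification sketch: the labels are discrete
# categories rather than numbers. Assumes Python with scikit-learn;
# the labeled examples are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Labeled training data: short texts paired with a category label.
texts = [
    "charming two bedroom with doorman",
    "spacious loft near the park",
    "quarterly sales exceeded forecast",
    "revenue growth slowed this quarter",
]
labels = ["real_estate", "finance"][0:1] * 2 + ["finance"] * 2

# Turning raw words into counts is a simple form of feature engineering:
# it decides which characteristics of the input the classifier can use.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Classify a sentence the model has not seen before.
print(model.predict(["doorman building with park views"]))
```

The same train-on-labeled-pairs pattern applies as in the regression sketch; only the type of output changes, from a number to one of a fixed set of categories.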
