From the course: Machine Learning and AI Foundations: Value Estimations

Standard conventions for naming training data

- [Instructor] Our home sales dataset has 19 fields for each house. The first 18 fields describe the house itself: how large it is, where it's located, and so on. These 18 fields are called features. Features are the values that feed into a prediction model. The last field, sale_price, is special. This is the value we are trying to predict. When we use supervised learning to solve a problem, we'll always have the same setup: features feed into a supervised learning algorithm, which returns one or more target values. To make it easy to communicate with other programmers, there are some standard conventions for naming these. The set of features we feed into the algorithm is called X. The value or values we are trying to predict are called y. When you read the scikit-learn documentation or look at machine learning code, you'll see this naming convention used nearly everywhere. We'll also use it for the rest of the course.
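As a rough illustration of the naming convention, the sketch below splits a tiny made-up dataset into features and target. Only the sale_price field name comes from the transcript; the other field names and all of the values are hypothetical, and plain Python lists stand in for whatever table or array type the course actually uses. The scikit-learn documentation writes the feature matrix as a capital X and the target as a lowercase y, so the sketch follows that casing.

```python
# Hypothetical miniature home sales dataset. Only "sale_price" is a field
# name taken from the course; the other fields and all values are invented.
houses = [
    {"sq_feet": 1500, "num_bedrooms": 3, "has_garage": 1, "sale_price": 250000},
    {"sq_feet": 2200, "num_bedrooms": 4, "has_garage": 1, "sale_price": 340000},
    {"sq_feet": 900, "num_bedrooms": 2, "has_garage": 0, "sale_price": 150000},
]

# The field we want to predict; every other field is treated as a feature.
target_name = "sale_price"
feature_names = [name for name in houses[0] if name != target_name]

# X: one row of feature values per house (a 2-D table, hence the capital letter).
X = [[house[name] for name in feature_names] for house in houses]

# y: one target value per house, in the same order as the rows of X.
y = [house[target_name] for house in houses]
```

Row i of X and element i of y describe the same house, so the two must always be kept in the same order.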
