From the course: Python: Working with Predictive Analytics

Differentiate data types - Python Tutorial

Differentiate data types

- [Narrator] To make good predictions, you need to understand the types of data you are working with. Looking at our roadmap, we are starting the Data Understanding step. Data can be either numerical or categorical. Numerical data can be expressed on an interval or a ratio scale. Categorical data can be broken down into nominal or ordinal. I'll explain each of these with an example.

Let's say we own a landscaping business, and we need to gather data on all the different locations where we work. For our employees to know how much tree-removal equipment to take with them, we need to know how many trees are at each location. One home with a small yard may have three trees, and another, larger home might have eight trees. This is numerical data. We also need to know where each place is located. For this, we need information like the street or the city. This is categorical data.

Starting with the first type of categorical data, let's discuss the nominal scale. For our landscaping business, let's say we have data on the exterior color of each house: blue, red, or green. On a nominal scale, we can only check whether two values are equal or not equal. We cannot order, add or subtract, or multiply or divide nominal data. For example, blue is not larger than red, and green cannot be divided by blue.

Categorical data can also be on an ordinal scale, and for this, maybe we have the address of each house. Door numbers can be 1210, 1211, 1212, and so forth. In this case, we can decide whether the values are the same or not, and we can order the houses by their numbers, but we cannot add or subtract, or multiply or divide ordinal data.

Numerical data, on the other hand, can be expressed on an interval or ratio scale. Let's start with the interval scale. For our houses, maybe we need to know the temperature reading at each home to know what kinds of plants will work there. Our homes can be at 23, 24, and minus three degrees Celsius.
We can decide whether these values are the same or not, we can order the houses by temperature, and we can add and subtract the temperatures. For example, the red house is 27 degrees Celsius warmer than the green one. But the interval scale does not have a true zero point. For example, zero degrees Celsius is still a temperature reading. It is not the absence of temperature. So we cannot meaningfully multiply or divide this data.

Let's look at the other class of numerical data, which is ratio. The ratio scale is very similar to the interval scale, with the difference that it has a true zero point. This scale is commonly used for values that are measured in numbers, such as length, height, weight, or monetary values like cost and revenue. For example, each of our houses has a square footage, and because the ratio scale has a true zero point, it doesn't make sense to say a house has minus 400 square feet. We can do all the operations on ratio data. We can make equal and unequal comparisons, we can order the homes from largest to smallest, and we can add, subtract, multiply, and divide values on the ratio scale.

In this chart, we see a summary of the mathematical operations we can perform with each data type. Remember the interval scale example where we had temperature readings at each house? Well, yes, we can compare this data, like one house's temperature is not equal to another's, and yes, we can determine if one house is warmer than another house. And yes, we can add or subtract the readings, and say that one house is 27 degrees warmer than the other house. But on an interval scale, it does not make sense to multiply or divide this data; the result would not tell us anything useful.

When working with prediction models, it's important for you to know that most of them cannot process categorical data directly. They need numbers. So, in order to work with predictive analytics, we will need to find ways to convert categorical data into numerical data.
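
One common way to convert nominal data into numbers is one-hot encoding, where each category becomes its own 0/1 column. The sketch below is only an illustration, not the course's own code: the `houses` records, the `one_hot` helper, and the pairing of door numbers with colors are assumptions made up for the example. The temperatures follow the narration (the red house at 24 degrees, the green one at minus three).

```python
# Hypothetical landscaping records. Which door number has which color
# is an assumption for illustration; temperatures follow the narration.
houses = [
    {"door_number": 1210, "color": "blue", "temp_c": 23},
    {"door_number": 1211, "color": "red", "temp_c": 24},
    {"door_number": 1212, "color": "green", "temp_c": -3},
]


def one_hot(records, column):
    """Replace a nominal column with one 0/1 indicator column per category."""
    # Sorting only makes the column order reproducible; it does not
    # imply any ranking among the categories.
    categories = sorted({record[column] for record in records})
    encoded = []
    for record in records:
        # Copy every field except the nominal column being encoded.
        row = {key: value for key, value in record.items() if key != column}
        for category in categories:
            row[f"{column}_{category}"] = 1 if record[column] == category else 0
        encoded.append(row)
    return encoded


encoded_houses = one_hot(houses, "color")
for row in encoded_houses:
    print(row)
```

Each house ends up with exactly one indicator set to 1, so no artificial order such as blue < red is introduced. In practice, a library routine such as pandas `get_dummies` is a typical way to do the same thing.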
