From the course: Data Science Foundations: Data Assessment for Predictive Modeling

Reviewing basic concepts in the level of measurement

- [Instructor] Okay, you're about to look at a data set for the very first time. What do you look for? The very first thing on my mind, other than the obvious, the number of rows and the number of columns, is the level of measurement of each of the variables. Level of measurement has been around for 75 years, but it's still fundamental to everything we do. It determines which descriptive statistics, which statistical tests, which charts, and which machine learning algorithms make sense for the data. It drives almost everything we do during data understanding and then later during the modeling phase.

Stevens was a Harvard psychologist, and he wrote "On the Theory of Scales of Measurement" way back in 1946, but we're still under its influence today. He even had this little table in the paper with the four levels and which statistics were appropriate for each, like median for some, but mean for others. The four terms, nominal, ordinal, interval, and ratio, are still used. We'll define them, but first, for our purposes, we can combine the last two. The only distinction between ratio and interval is having a true zero. The nerdy example that's often used is that the Fahrenheit and Celsius temperature scales do not have a true zero. It might be cold at zero degrees on either scale, but it's not the complete absence of heat. Kelvin does meet this criterion, because zero on the Kelvin scale is absolute zero. It's a fun fact to know, but not a distinction that we need to worry about. So we'll combine them into one category called either scale or continuous. You'll hear them both.

However, the distinction between nominal, ordinal, and scale will be very important to us. Examples of nominal include things like marital status, payment method, whether you own or rent your home, or perhaps your occupation. Nominal variables are separate and distinct categories that are not meaningfully ranked. Ordinal variables are separate and distinct categories that are meaningfully ranked, so they include things like degree status (bachelor's, master's, PhD, et cetera), age when reported in categories, income in categories, and many others. Scale variables, the classic being things like height and weight, are variables where taking an average makes sense, where you're measuring things on a continuum. This could also include age and income, but only when reported in years or dollars. It could also include things like miles per gallon or the number of individuals living in a household. I know it seems basic, but level of measurement is critical to the efficient analysis of your variables.
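
To make the link between level of measurement and descriptive statistics concrete, here is a minimal Python sketch. It is not from the course: the sample values, the `DEGREE_ORDER` ranking, and the `summarize` helper are all hypothetical, and the pairing used (mode for nominal, median for ordinal, mean for scale) is one common convention rather than the only defensible choice.

```python
from statistics import mean, median_low, mode

# Hypothetical toy data, invented purely for illustration.
payment_method = ["card", "cash", "card", "check", "card"]       # nominal
degree = ["bachelor", "masters", "bachelor", "PhD", "masters"]   # ordinal
height_cm = [162.0, 175.5, 180.0, 168.5, 171.0]                  # scale

# Assumed ranking of the ordinal categories, lowest to highest.
DEGREE_ORDER = ["bachelor", "masters", "PhD"]


def summarize(values, level, order=None):
    """Return a central-tendency summary suited to the level of measurement."""
    if level == "nominal":
        # Categories have no ranking, so only the most frequent value makes sense.
        return mode(values)
    if level == "ordinal":
        # Categories are ranked, so convert to ranks and take the middle one.
        if order is None:
            raise ValueError("ordinal data needs an explicit category order")
        ranks = [order.index(v) for v in values]
        return order[median_low(ranks)]
    if level == "scale":
        # Values lie on a continuum, so an average is meaningful.
        return mean(values)
    raise ValueError(f"unknown level of measurement: {level}")


print(summarize(payment_method, "nominal"))
print(summarize(degree, "ordinal", DEGREE_ORDER))
print(summarize(height_cm, "scale"))
```

Note that the ordinal branch needs the ranking supplied explicitly. Sorting the category labels alphabetically would put "PhD" before "bachelor", which is exactly the kind of mistake that checking level of measurement up front is meant to prevent.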
