From the course: Learning Data Science: Understanding the Basics

Find a correlation

- We've talked a bit about descriptive statistics in analytics. Another idea to consider is correlation. Many companies use it to guess the types of products you'll buy. It's also used to connect you to friends and acquaintances. If you've used a service like Netflix, you've probably been amazed at how well the website can guess what movies you'll like. Amazon has been using correlation for years to recommend different products. Correlation is a statistical measure of the degree to which two things are related. It's usually measured on a scale from negative one to positive one. If there's a correlation of one, then the two things are strongly correlated. If there's a correlation of zero, the two things have no linear relationship. The sign tells you the direction of the relationship. A negative one is an inverse correlation, or anticorrelation. A positive correlation might be something like height and weight. The taller someone is, the more likely they are to weigh more. As height goes up, so does weight. There are even more straightforward examples. The higher the temperature is outside, the more likely that people will buy ice cream. As the temperature goes up, ice cream sales go up. A negative correlation might be something like cars and gasoline. The heavier the car, the fewer miles per gallon you'll probably get. As the weight of the car goes up, the gas mileage goes down. They have an inverse relationship. If you're a runner, then you might notice that you run slower as you go uphill. That's also a negative correlation. The higher the incline, the slower you'll run. As the incline goes up, your speed goes down. Both positive and negative correlations are a great way to see the relationship between two things. An inverse correlation isn't bad. It's just another way of figuring out a relationship. A data science team will look for correlations in their data. They'll try to fine-tune any relationship they find.
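To make the positive and negative examples concrete, here's a minimal sketch of how a correlation could be calculated by hand. It assumes the Pearson correlation coefficient, which is the most common choice, though the course doesn't say which one it has in mind. The function name pearson and all of the numbers are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of each pair's deviations from the means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return co_deviation / math.sqrt(spread_x * spread_y)

# Invented data: warmer days, more ice cream sold
temperature_f = [60, 65, 70, 75, 80, 85, 90]
ice_cream_sales = [20, 24, 31, 35, 42, 50, 55]

# Invented data: heavier cars, fewer miles per gallon
car_weight_lb = [2200, 2600, 3000, 3400, 3800, 4200]
miles_per_gallon = [38, 34, 30, 27, 23, 20]

r_ice_cream = pearson(temperature_f, ice_cream_sales)
r_mileage = pearson(car_weight_lb, miles_per_gallon)

print(f"temperature vs. ice cream sales: {r_ice_cream:+.2f}")  # close to +1
print(f"car weight vs. gas mileage:      {r_mileage:+.2f}")    # close to -1
```

The first result comes out close to positive one and the second close to negative one, because both made-up data sets follow an almost straight line. Real data is usually much noisier.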
Fortunately, software tools can handle a lot of the mathematics behind calculating a correlation. One formula they will typically use is the correlation coefficient. You won't typically get a nice, neat round number. Instead, you'll see a 0.5 correlation or a negative 0.75. This will show a stronger or weaker correlation. The closer you are to one or negative one, the stronger the relationship. One interesting data science challenge was LinkedIn's People You May Know feature. The company wanted a way to figure out which professionals knew each other. Data science teams worked with LinkedIn data and looked for correlations between connections. Then they tried to figure out why those people were connected. This can be because of the schools they've attended or the jobs they've shared. It may even be groups and interests they share. The data science team would look for positive and negative correlations. The data might show that you're interested in a job. Someone else is interested in the same job, and they worked at the same company you did. The data science team knows what jobs you look for and knows where you've worked. That's enough to establish a correlation between the two people. There might be a strong, positive correlation between people who worked in the same building and are interested in the same jobs. So the website might recommend that you make a connection. The data science team could also make a correlation between your connections and other people's connections. If you're connected with one person, and they're connected to someone who has similar skills, then you might make a good connection. If you think about it, this makes a lot of sense. You're much more likely to know people who work in the same office building. You're also much more likely to be connected with people who have similar interests. As the number of similar interests increases, the likelihood of you knowing that person also increases.
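The idea that more shared interests means a higher chance of knowing someone can be sketched in a few lines. This is not LinkedIn's actual method, which the course doesn't describe. It's a toy example that scores overlap with Jaccard similarity, which is a simple similarity measure and not a correlation coefficient. The profiles, the attribute labels, the suggest function, and the threshold are all hypothetical.

```python
def jaccard(a, b):
    """Share of attributes two people have in common (0 = none, 1 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical member profiles; every name and attribute here is invented
profiles = {
    "you": {"school:State U", "company:Acme", "skill:SQL", "interest:analytics jobs"},
    "pat": {"school:State U", "company:Acme", "skill:SQL", "interest:analytics jobs",
            "skill:Python"},
    "sam": {"company:Acme", "skill:Design"},
    "lee": {"school:Other College", "skill:Welding"},
}

def suggest(member, profiles, threshold=0.1):
    """Rank other members by attribute overlap, keeping scores at or above the threshold."""
    scores = {
        other: jaccard(profiles[member], attrs)
        for other, attrs in profiles.items()
        if other != member
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(score, 2)) for name, score in ranked if score >= threshold]

suggestions = suggest("you", profiles)
print(suggestions)
```

With this made-up data, the member who shares a school, a company, a skill, and a job interest ranks first, the one who only shares a company ranks second, and the one with nothing in common isn't suggested at all.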
Correlation also has the power to help your team question its assumptions. You might assume that the people who spend the most on your website will also be your happiest customers. That might not be the case. In fact, there might be a negative correlation between the two. Maybe the people who spend the most actually have the most unrealistic expectations. They're easy to disappoint and more likely to leave negative feedback. As a data science team, you'll be using correlation to test these assumptions. You might look for strategies to get your happiest people to spend more. You might also look for a way to manage your high spenders' expectations. If you look for these correlations, you'll see a lot of things that you might otherwise miss.
