From the course: DJ Patil: Ask Me Anything

What is a scientific process for data science?

(upbeat music)

- From the perspective of a data scientist, what is the scientific process?

- So there's a funny thing that we did a few years ago where we gave this talk, and this was in a business setting, and literally what we did is we went to everybody and we said, "Okay, here's a new idea. It's called the data scientific method." And we walked through it, and it starts here: you've got to formulate a hypothesis, then you have to figure out how to test it. You've got to gather observations, you've got to take all this stuff and turn it and then evaluate the hypothesis. And people thought, oh my gosh, that's really remarkable. Wow, why is it that we can call it the data scientific method and it takes off, but if we call it just the scientific method it somehow has to adhere to science?

The thing that we almost always fail to do, and how data science really helps in this, is how do you make it testable? Very often it's already testable because there's enough data out there; it's a question of what the question is. If you've got all this data, you've already got a set of experiences and other things that are happening, and now you've just got to figure out, well, what was the actual test that was being conducted?

Then there's this fundamental tension, which is: do you start with data, do you start with expertise, how do you approach this? And what I like to do is think about it as a very blended process. When you're first trying to get an understanding of something, sure, you might be able to look around, ask questions, do user research, do all those kinds of things. But you can also do the equivalent of user research just by looking through the data and finding interesting correlations. It's not that you're looking at the correlations to find some specific answer. It's a process you use to expand your thinking. Like, why is that correlated? Why might that happen? You know, in the scientific sense, we can come up with these kinds of correlations all the time.

- Observation is the third step in that process, when in reality, the way you're describing it, it's the first step of the process. You make the observation, and from the observation you're able to draw a question that makes sense.

- Well, I think of it as that process is constantly going, and what we're doing is inserting ourselves in that feedback cycle. And so with the observation, sometimes you can start and it's like, hey, there's actually a bunch of data in here to help you start thinking about an interesting problem. You start trying to go, well, what are the real problems out here? And you can talk to people, but then you can also look at data and go, hey, there's this weird stuff. Like, why are there these interesting outliers, and should we find something that's in there? Is there an interesting question in there? If we pull on that line of argument, that line of scientific process, do we get to something more interesting overall?

(upbeat music)
