Join Doug Rose for an in-depth discussion in this video, "Define a multidisciplinary practice with multiple meanings," part of Learning Data Science: Understanding the Basics.
- So what is a data scientist? That's actually not an easy question to answer. Defining a data scientist is not as clear-cut as it is for other types of scientists. If you're a political scientist or a climate scientist, then it means you have a degree from an established program. The term data scientist became widely used before data science was an established discipline. Even now, people who call themselves data scientists come from a mishmash of different fields. As a discipline, data science is still sorting itself out. It's a bit like early archaeology: anyone could call themselves an archaeologist, as long as they picked up a shovel and started digging for artifacts.
Now, to be an archaeologist, you have to go through an established university and spend years doing research. Like early archaeology, data science is still more of a practice than a discipline. You're a data scientist if you work with your data in a scientific way. That means whether or not you choose to call yourself a data scientist is still pretty much up to you. Still, there are certain groups of people who are a better fit than others. If you're a statistician, a data analyst, or work in one of the biological sciences, then you can probably argue that you've always been a data scientist. You may just have been more focused on the data or on the science.
Some of the very first people to call themselves data scientists were actually mathematicians. Others came from systems and information engineering. There are even some who came from business and finance. If you worked with numbers and knew a little bit about data, then you could easily call yourself a data scientist. Because of the increasing need for data scientists, there'll be a greater movement to create a standardized skill set. That way, companies know who they're hiring. For now, that's not the case. In fact, there's still some danger that a data scientist will be seen as anyone who works with data and has two thumbs.
The best way to think about data science is to focus on the science, and not the data. In this context, we use empiricism. Empiricism is reacting to the data through experiments and questions. You probably use an empirical approach all the time, but you just might not think of it that way. A data scientist should use this approach every day. An empirical approach uses a combination of knowledge and practice. Let me give you an example of how I use the empirical approach. As a coach and trainer, I have to do a fair amount of traveling. That usually means I find myself in different hotels, with different styles of bathrooms.
I'm always amazed by how many different types of faucets and fixtures there are in the world. One thing I always struggle with is how to turn on the different hotel showers. First, I start by asking an empirical question: "How do I turn on the shower?" Then I try an experiment. I press one button, and the water fills the tub. If I press another, the dangling shower head springs to life. Once the water is running, I need to control the temperature, so I use the different knobs to get it just right. If I twist one knob too far, the water gets too hot. If I twist another, it gets too cold. So I ask questions and reevaluate until I make the water comfortable.
On the other hand, I could theorize about how to make the water comfortable. I could jump in, flip the dial, and hope for the best. The problem is that I'd be just as likely to be frozen as scalded. Data scientists use this empirical approach all the time. They ask questions of the data and make small adjustments to see if they can gain insights. They turn the knobs and try to ask more interesting questions. Remember that data science is a discipline in flux. There are many people with different backgrounds who call themselves data scientists. For now, focus on the scientific method. Use an empirical approach to gain insights from your data. Try to focus on the science, and not the data.
Until things become more formalized, that will make you a better data scientist.
- What is data science?
- Making connections with relational databases
- Importing data into warehouses
- Recognizing different data types
- Applying statistical analysis
- Focusing on knowledge