Data analysis is the first step in any data science project that is designed to guide human decision-makers. In this video, learn how common data applications such as spreadsheets, Tableau, and SPSS can help you get the insight needed to guide decisions.
- [Narrator] When people think about data science, machine learning, and artificial intelligence, the talk turns almost immediately to tools: things like programming languages and sophisticated computer setups. But remember, the tools are simply a means to an end, and even then only a part of it. The most important part of any data science project by far is the question itself, and the creativity that comes in exploring that question, and working to find possible answers using the tools that best match your questions. And sometimes, those tools are simple ones. It's good to remember, even in data science, that we should start with the simple, and not move on to the complicated until it's necessary. And for that reason, I suggest we start with data science applications. And so, you may wonder, "Why apps?" Well, number one, they're more common. They're generally more accessible, so more people are able to use them. They're often very good for exploring the data, browsing the data. And they can be very good for sharing, again, because so many people have them and know how to use them. By far the most common application for data work is going to be the humble spreadsheet, and there are a few reasons why this should be the case. Number one, I consider spreadsheets the universal data tool. It's my untested theory that there are more datasets in spreadsheets than in any other format in the world. The rows and columns are very familiar to a very large number of people, and they know how to explore the data and access it using those tools. The most common by far is Microsoft Excel and its many versions. Google Sheets is also extremely common, and there are others. The great thing about spreadsheets is they're good for browsing. You sort through the data, you filter the data. It makes it really easy to get a hands-on look at what's going on in there. They're also great for exporting and sharing the data.
Practically any data program can read a .csv file, or "comma-separated values" file, which is the generic version of a spreadsheet. Your client will probably give you the data in a spreadsheet, and they'll probably want the results back in a spreadsheet. You can do what you want in between, but that spreadsheet is going to serve as the common ground. Another very common data tool, even though it's not really an application but a language, is S-Q-L or SQL, which stands for "Structured Query Language." This is a way of accessing data stored in databases, usually relational databases, where you select the data, you specify the criteria you want, and you can combine it and reformat it in ways that work best. You only need maybe a dozen or so commands in SQL to accomplish the majority of tasks that you need. So a little bit of familiarity with SQL is going to go a very long way. And then there are the dedicated apps for visualization. That includes things like Tableau, in its desktop, public, and server versions, and Qlik. What these do is facilitate data integration; that's one of their great strengths. They bring in data from lots of different sources and formats, and put it together in a pretty seamless way. Their purpose is interactive data exploration: to click on subgroups, to drill down, to expand what you have, and they're very, very good at that. And then there are apps for data analysis. These are applications that are specifically designed for point-and-click data analysis. And I know a lot of data scientists think that coding is always better at everything, but the point-and-click graphical user interface makes things accessible to a very large number of people. And so this includes common programs like SPSS, or JASP, or my personal favorite, jamovi. JASP and jamovi are both free and open source. And what they do is make the analysis friendly. Again, the more people you can get working with data, the better, and these applications are very good at democratizing data.
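To make the spreadsheet-to-SQL-and-back idea concrete, here is a minimal sketch in Python, using only the standard library's csv and sqlite3 modules. It is not from the video: the sales table, its columns, the sample rows, and the function names are all hypothetical, made up for illustration. It reads CSV text, runs a query built from a handful of common SQL commands (SELECT, WHERE, GROUP BY, ORDER BY), and writes the result back out as CSV.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export standing in for a client's spreadsheet.
CSV_TEXT = """region,product,units
East,Widget,10
West,Widget,4
East,Gadget,7
West,Gadget,12
"""


def load_csv_into_sqlite(csv_text):
    """Read CSV text and load it into an in-memory SQLite table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
    # Named placeholders are filled from each row's dictionary keys.
    conn.executemany("INSERT INTO sales VALUES (:region, :product, :units)", rows)
    return conn


def units_by_region(conn, min_units):
    """Select rows meeting a criterion, then combine them per region."""
    query = """
        SELECT region, SUM(units) AS total_units
        FROM sales
        WHERE units >= ?
        GROUP BY region
        ORDER BY total_units DESC
    """
    return conn.execute(query, (min_units,)).fetchall()


def results_to_csv(rows, header):
    """Write query results back out as CSV text for sharing."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


conn = load_csv_into_sqlite(CSV_TEXT)
result = units_by_region(conn, 5)
csv_out = results_to_csv(result, ["region", "total_units"])
print(csv_out)
```

With a real file, the same pattern would apply by opening the file and passing it to csv.DictReader instead of wrapping a string in io.StringIO.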
But whatever you do, just remember to stay focused on your question, and let the tools and the techniques follow your question. Start simple, with the basic applications, and move on only as the question requires it. That way, you can be sure to find the meaning and the value as you uncover it in your data.
- Assess the skills required for a career in data science.
- Evaluate different sources of data, including metrics and APIs.
- Explore data through graphs and statistics.
- Discover how data scientists use programming languages such as R, Python, and SQL.
- Assess the role of mathematics, such as algebra, in data science.
- Assess the role of applied statistics, such as confidence intervals, in data science.
- Assess the role of machine learning, such as artificial neural networks, in data science.
- Define the components of effective data visualization.