From the course: SQL for Exploratory Data Analysis Essential Training

Why explore data? - PostgreSQL Tutorial



Why explore data?

- [Instructor] Exploratory data analysis is the process of using queries, statistics, and visualizations to help us understand important properties of a dataset. Queries help us understand what subsets of a dataset look like. Statistics help us understand global properties of a dataset by using descriptive numeric measures, like the mean of an attribute or the maximum and minimum values in a column. Visualizations also help us with the global properties of datasets. But instead of reducing a property to a single number, like the mean, visualizations let us actually see it.

The reason we want to understand our datasets is that it helps us avoid problems with data analysis. Problems can occur if we make assumptions about properties of the data that are not correct. For example, here is a dataset with a single attribute, and the average of that attribute is 55. Now, the average does not, by itself, give us much information about the shape of the dataset. This dataset also has an average of 55, but it looks significantly different from the previous example.

Another benefit of exploratory data analysis is that it can help us identify problems with datasets, including missing values and values that are outside of an expected range.
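The point about two datasets sharing an average of 55 can be sketched in a few lines. This is only an illustration, not the course's own example: it assumes Python's built-in sqlite3 module as a stand-in for a PostgreSQL database, and the table name `readings`, the values, and the expected range of 0 to 100 are all made up. One query returns the mean, minimum, and maximum for each dataset, along with counts of missing and out-of-range values.

```python
import sqlite3

# Hypothetical data: two single-attribute datasets that share a mean of 55.
clustered = [53, 54, 55, 56, 57]
spread = [5, 30, 55, 80, 105]

# In-memory SQLite database, used here as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (dataset TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("clustered", v) for v in clustered] + [("spread", v) for v in spread],
)
# One missing value, so the NULL count has something to find.
conn.execute("INSERT INTO readings VALUES ('spread', NULL)")

# Descriptive statistics plus two simple data-quality checks per dataset.
# The 0 to 100 expected range is an assumption made for this sketch.
summary = conn.execute(
    """
    SELECT dataset,
           AVG(value) AS mean_value,
           MIN(value) AS min_value,
           MAX(value) AS max_value,
           SUM(CASE WHEN value IS NULL THEN 1 ELSE 0 END) AS missing_count,
           SUM(CASE WHEN value < 0 OR value > 100 THEN 1 ELSE 0 END) AS out_of_range
    FROM readings
    GROUP BY dataset
    ORDER BY dataset
    """
).fetchall()

for row in summary:
    print(row)
```

Both rows should report the same mean, while the minimum and maximum show how differently the values are spread, and the last two columns flag the missing and out-of-range values in the second dataset. The aggregate functions and `CASE` expressions used here are standard SQL, so the same query should carry over to PostgreSQL with little or no change.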
