- [Instructor] In this course I assume you are familiar with relational databases and SQL queries. You don't need to know advanced SQL and you don't need to know how to program in other languages. If you're not familiar with SQL or you'd like a little refresher, I suggest viewing either the SQL Essential Training course or Learning SQL Programming.
Learn how to use SQL to understand the characteristics of data sets destined for data science and machine learning. The course begins with an introduction to exploratory data analysis and how it differs from hypothesis-driven statistical analysis. Instructor Dan Sullivan explains how SQL queries, statistical calculations, and visualization tools like Excel and R can help you verify data quality and avoid incorrect assumptions. Next, find out how to perform data-quality checks, reveal and recover missing values, and check business logic. Discover how to use box plots to understand non-normal distributions of data and how to use histograms to understand the frequency of data values in particular attributes. Dan also explains how to use the chi square test to understand dependencies and measure correlations between attributes. The course concludes with a collection of tips and best practices for exploratory data analysis.
Exploratory data analysis vs. hypothesis-driven statistical analysis
Performing data quality checks
Using box plots to understand the distribution of values
Using histograms to understand the frequency of values
Using chi square to understand the correlation between values
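As a rough illustration of two of the topics listed above (data quality checks and histograms), here is a minimal sketch that runs plain SQL through Python's built-in sqlite3 module. The orders table, its columns, its rows, and the bin width of 20 are all hypothetical, invented only for this example; they are not taken from the course materials, and the course itself may use a different database and different queries.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "east", 10.0),
        (2, "east", None),
        (3, "west", 25.0),
        (4, None, 40.0),
        (5, "west", 55.0),
    ],
)

# Data quality check: COUNT(column) skips NULLs, so subtracting it
# from COUNT(*) gives the number of missing values in that column.
missing = conn.execute(
    "SELECT COUNT(*) - COUNT(region), COUNT(*) - COUNT(amount) FROM orders"
).fetchone()

# Histogram: assign each amount to a bin of width 20 (an arbitrary
# choice for this sketch) and count the rows that fall in each bin.
histogram = conn.execute(
    """
    SELECT CAST(amount / 20 AS INTEGER) * 20 AS bin_start, COUNT(*) AS n
    FROM orders
    WHERE amount IS NOT NULL
    GROUP BY bin_start
    ORDER BY bin_start
    """
).fetchall()

print("missing (region, amount):", missing)
print("histogram (bin_start, count):", histogram)
```

The SQL strings are the part that matters here; the Python wrapper only makes the sketch self-contained, and the same queries should work with little or no change in most relational databases.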