Data science expert Ben Sullins explains the basics of big data and demonstrates how to perform core data engineering tasks including staging, profiling, cleansing, and migrating data.
- Data science is the process of making data useful. It's not something that you can do with just one skillset or another. You need a whole host of skillsets to actually put data to work. And data engineering is one of the most essential skills that you need to really get value from your vast amounts of data. Hi, I'm Ben Sullins and I've been a data geek since the late 90s, focused on helping organizations get the most out of their data. In this course, we'll look at all the components of a modern data science ecosystem. I'll start by showing you how all the pieces fit together, then we'll take a look at staging the data in Hadoop, profiling it, and cleansing it so it's ready for our analysts to use.
We'll finish by migrating that data from the back office system into the front office system where our analysts and data scientists will use it for their work. We'll be covering all of these topics to get you up to speed with data engineering and delivering high quality data to your users. So let's dive in.
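The workflow just described, staging raw data, profiling it, cleansing it, and migrating it to a front-office system, can be sketched in miniature. This is a hedged illustration only: the record fields (`customer_id`, `region`) and the `"Unknown"` inferred-member default are invented for the example, not taken from the course.

```python
# A minimal sketch of the pipeline described above -- staging, profiling,
# cleansing, and migrating -- using plain Python dicts as stand-in records.

# Stage: raw records as they might arrive from a back-office system.
staged = [
    {"customer_id": "1", "region": "West"},
    {"customer_id": "2", "region": None},   # missing value to be handled
    {"customer_id": "3", "region": "East"},
]

def profile(records):
    """Profile: count missing values per field."""
    missing = {}
    for rec in records:
        for field, value in rec.items():
            if value is None:
                missing[field] = missing.get(field, 0) + 1
    return missing

def cleanse(records, default="Unknown"):
    """Cleanse: replace missing values with an inferred-member default."""
    return [
        {field: (value if value is not None else default)
         for field, value in rec.items()}
        for rec in records
    ]

# Migrate: in a real system this step would load a front-office table;
# here we simply produce the cleansed result for downstream use.
cleansed = cleanse(staged)
print(profile(staged))        # which fields had missing values, and how many
print(cleansed[1]["region"])  # the filled-in default
```

In practice each step would be a distinct stage in a pipeline (for example, staging files in Hadoop and loading cleansed output into a warehouse), but the profile-then-cleanse ordering shown here mirrors the flow covered in the course.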
- Working with systems and schemas
- Managing a good data pipeline
- Setting up an environment
- Loading and profiling data
- Testing quality
- Adding data types
- Handling missing values and inferred members
- Performing master data lookups
- Loading schemas and tables
- Creating views