From the course: Data Ingestion with Python

The data pipeline (ETL)

- [Instructor] Most companies have some kind of data pipeline. This pipeline takes the raw data, most often from server log files, runs transformations on it, and adds it to one or more databases. This process is also known as ETL, which stands for extract, transform, and load. During the pipeline, we handle tasks such as the following. Conversion: if you have a text file with the string 2020-01-01, we'd like to convert it to a timestamp or a datetime in Python. Validation: check the data for errors. For example, if you have the string 2020-02-30, it's not a valid date. Sometimes validation can be more complex. In weather data, we can't have snow on a day when the temperature was above 30 degrees Celsius, or 86 Fahrenheit. Enrichment: we'd like to add location information to the user IP. My IP is 216.52.21.11, and it's in California, United States. Missing data: what happens if you don't have the customer IP? How can we handle it? You should know how your data pipeline works, and what is the source for…
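
As a rough illustration of the conversion, validation, and missing-data steps described above, here is a minimal Python sketch. The function names (parse_date, is_plausible_weather, clean_record) and the record layout (a dict with "date" and "ip" keys) are assumptions made for this example and are not taken from the course. Enrichment is left out because it depends on an external IP-to-location data source.

```python
from datetime import datetime


def parse_date(text):
    """Convert a YYYY-MM-DD string to a datetime, or None if it is not a valid date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # strptime rejects impossible dates such as 2020-02-30
        return None


def is_plausible_weather(temp_celsius, snow):
    """Reject records that report snow on a day above 30 degrees Celsius."""
    return not (snow and temp_celsius > 30)


def clean_record(record):
    """Apply conversion, validation, and missing-data handling to one raw record.

    The record layout is hypothetical: a dict with optional "date" and "ip" keys.
    """
    cleaned = dict(record)
    cleaned["date"] = parse_date(record.get("date") or "")
    # Assumption: a missing IP is kept as None instead of dropping the record.
    cleaned["ip"] = record.get("ip") or None
    cleaned["valid"] = cleaned["date"] is not None
    return cleaned
```

Calling clean_record({"date": "2020-02-30"}) under these assumptions yields a record flagged as invalid with its IP set to None. Whether to drop, flag, or fill in such records is a decision that depends on how the data is used downstream.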
