From the course: Data Ingestion with Python

Where does data come from?

- [Instructor] As a consultant, companies bring me in to help data scientists do their work. I ask the data scientists what data they require and then go around the organization and figure out how to get it. Organizations usually start small with one or two sources of data, say logs and a database. However, as they grow, they'll have more and more sources of data. Here are some of the sources I've seen throughout the years. Databases are for, well, storing data. There are many kinds of databases, from good old relational ones like PostgreSQL, to key-value ones like Redis, to document ones like Elasticsearch, and more. As data grows, sometimes it's not even clear where we can find the data we want inside the database. API servers, web servers, batch processes, and more often write data to log files. These log files can be in many formats, and sadly, you'll probably see several log formats within the same organization. You'll find data in formats like Parquet or ORC. These files are…
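
To make the point about mixed log formats concrete, here is a minimal sketch of reading two formats into one common record. Both formats are assumptions made up for illustration (a plain-text line and a JSON line with "time", "level" and "message" keys), and `parse_line` is a hypothetical helper, not something from the course. Real logs in your organization will likely need different patterns.

```python
import json
import re
from datetime import datetime

# Hypothetical plain-text format: "2021-03-04 12:00:01 INFO service started"
TEXT_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict with time, level and message, or None if unrecognized."""
    line = line.strip()
    if line.startswith("{"):
        # Hypothetical JSON format with "time", "level" and "message" keys
        try:
            record = json.loads(line)
            return {
                "time": datetime.fromisoformat(record["time"]),
                "level": record["level"],
                "message": record["message"],
            }
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or missing fields: treat the line as unrecognized
            return None
    match = TEXT_LINE.match(line)
    if match is None:
        return None
    return {
        "time": datetime.strptime(match["time"], "%Y-%m-%d %H:%M:%S"),
        "level": match["level"],
        "message": match["message"],
    }


lines = [
    "2021-03-04 12:00:01 INFO service started",
    '{"time": "2021-03-04T12:00:02", "level": "ERROR", "message": "disk full"}',
    "not a log line",
]
for line in lines:
    print(parse_line(line))
```

Returning `None` for lines that match neither format lets the caller decide whether to skip, count, or report them, which tends to matter once several sources feed the same pipeline.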
