From the course: Data Ingestion with Python
Where does data come from? - Python Tutorial
Where does data come from?
- [Instructor] As a consultant, companies bring me in to help data scientists do their work. I ask the data scientists what data they require and then go around the organization and figure out how to get it. Organizations usually start small with one or two sources of data, say logs and a database. However, as they grow, they'll have more and more sources of data. Here are some of the sources I've seen throughout the years. Databases are for, well, storing data. There are many kinds of databases, from good old relational ones like PostgreSQL, to key-value ones like Redis, to document ones like Elasticsearch, and more. As data grows, sometimes it's not even clear where we can find the data we want inside the database. API servers, web servers, batch processes, and more often write data to log files. These log files can be in many formats, and sadly, you'll probably see several log formats within the same organization. You'll find data in formats like Parquet or ORC. These files are…
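To make the point about several log formats inside one organization concrete, here is a minimal Python sketch that normalizes two formats into one record shape. It is an illustration under stated assumptions, not something shown in the course: the Apache-style access log pattern, the JSON field names (`host`, `time`, `path`, `status`), and the helper `parse_line` are all hypothetical.

```python
import json
import re
from datetime import datetime

# Assumed format 1: Apache-style access log line (hypothetical example).
ACCESS_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return a dict with host, time, path, and status, or None if the
    line matches neither of the two assumed formats."""
    line = line.strip()

    # Assumed format 2: one JSON object per line with ISO 8601 timestamps.
    if line.startswith("{"):
        try:
            rec = json.loads(line)
            return {
                "host": rec["host"],
                "time": datetime.fromisoformat(rec["time"]),
                "path": rec["path"],
                "status": int(rec["status"]),
            }
        except (ValueError, KeyError, TypeError):
            return None

    match = ACCESS_RE.match(line)
    if match is None:
        return None
    return {
        "host": match["host"],
        # Timestamp layout such as 10/Oct/2000:13:55:36 -0700
        "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": match["path"],
        "status": int(match["status"]),
    }
```

Returning `None` for unrecognized lines, rather than raising, lets a caller count and inspect the lines that fit neither format, which is often the first sign that yet another log format exists somewhere in the organization.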