From the course: DevOps for Data Scientists

Collecting and munging data

- [Instructor] Data science projects begin with collecting and munging data sets. The first stage of data science is identifying data sources and building scripts to collect the data. Data comes from many sources, which typically include databases, either relational or NoSQL. Relational databases are commonly used for transaction processing and data warehousing. NoSQL databases support web applications and some types of analytic applications. Spreadsheets are often semi-informal sources of data. They are sometimes used to combine small data sets and make specialized calculations. It can be difficult to work with these spreadsheets because their structures can change frequently. Log files are generated by applications and devices. They tend to be semi-structured, but tools like Microsoft's Log Parser are useful for mapping them to more structured formats. External sources can range from third-party data files to APIs that are queried programmatically. With data in…
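
As a rough illustration of the collection scripts described above, here is a minimal Python sketch that reads from three of the source types mentioned: a relational database, a spreadsheet exported as CSV, and a semi-structured log file. The function names, the `orders` table, and the log line layout are all hypothetical and chosen only for the example. An in-memory SQLite database stands in for whatever relational database a real project would use.

```python
import csv
import io
import re
import sqlite3

# Hypothetical log layout, assumed for this sketch: "<date> <time> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)


def rows_from_database(conn, query):
    """Run a query and return each row as a dict keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def rows_from_csv(text):
    """Parse CSV text (for example, a spreadsheet export) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_log(lines):
    """Map semi-structured log lines to dicts, skipping lines that do not match."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


if __name__ == "__main__":
    # In-memory SQLite database used as a stand-in relational source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 9.5)")

    print(rows_from_database(conn, "SELECT id, amount FROM orders"))
    print(rows_from_csv("region,total\nwest,10\n"))
    print(rows_from_log(["2024-01-01 12:00:00 ERROR disk full", "garbage"]))
```

Returning every source as a list of dicts gives later munging steps one common shape to work with. Note that CSV values arrive as strings, so type conversion would still be needed, and a spreadsheet whose columns change often would need extra checks on the header row.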
