Overview of data engineering, why it matters, and how it is different from data analytics and data science.
- [Instructor] Data engineering: what is it? Data engineering is a term that has become quite popular recently, especially in the field of big data. What exactly does it mean? How is it different from data analytics or data science? Let's find out. Data is the cornerstone of almost any business. Today, it is produced by a number of sources, such as applications, websites, mobile devices, and social media.
These sources in general produce raw data. For business decision-makers to use this data, it needs to be transformed, cleansed, and aggregated. Let's call this cleaner form of data knowledge data. Data is converted from its raw form to its knowledge form through data processing systems and applications. The practice of architecting, designing, and implementing these data processing systems is called data engineering.
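To make the raw-to-knowledge idea concrete, here is a minimal sketch in plain Python. The event records, field names, and function names are all hypothetical, made up for illustration; a real data processing system would use tools suited to its scale. It only shows the three steps mentioned above: cleanse, transform, and aggregate.

```python
from collections import defaultdict

# Hypothetical raw events, as they might arrive from different sources.
RAW_EVENTS = [
    {"source": "website", "user": "  Alice ", "amount": "19.99"},
    {"source": "mobile", "user": "bob", "amount": "5.00"},
    {"source": "website", "user": "alice", "amount": "not-a-number"},
    {"source": "mobile", "user": "BOB", "amount": "7.50"},
    {"source": "social", "user": "", "amount": "3.00"},
]


def cleanse(events):
    """Drop records with a missing user or an unparseable amount."""
    for event in events:
        user = event["user"].strip()
        if not user:
            continue
        try:
            amount = float(event["amount"])
        except ValueError:
            continue
        yield {"source": event["source"], "user": user, "amount": amount}


def transform(events):
    """Normalize user names so the same user always has the same key."""
    for event in events:
        yield {**event, "user": event["user"].lower()}


def aggregate(events):
    """Sum the amount per user: the 'knowledge' form of the data."""
    totals = defaultdict(float)
    for event in events:
        totals[event["user"]] += event["amount"]
    return dict(totals)


def run_pipeline(raw_events):
    """Chain the steps: raw data in, knowledge data out."""
    return aggregate(transform(cleanse(raw_events)))


knowledge = run_pipeline(RAW_EVENTS)
print(knowledge)
```

Each step is a small function, so steps can be tested or swapped independently, which mirrors how pipelines are usually composed from separate stages.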
Data engineering focuses on data: its capture, movement, storage, security, and processing. Data engineering is performed by data engineers. They build pipelines, applications, APIs, and systems that produce, process, and consume data to meet business needs. To compare, data engineering converts raw data into knowledge data. Data analytics uses the knowledge data produced by data engineering to generate insights.
Stepping back for the bigger picture, data science uses data analysis and data engineering to predict the future using data from the past.