From the course: Azure Spark Databricks Essential Training

Understand Spark job execution overhead

- [Tutor] As we take a look at the overhead related to jobs, we need to understand how jobs are defined in Azure Databricks. Spark itself defines a job as one or more tasks that run on the nodes of a distributed compute cluster; this is called a Spark job, and it's the overhead related to a Spark job that we'll examine in this section. Databricks, on the other hand, defines a job as a scheduled Spark job execution, and we're going to look at scheduling Spark jobs as Databricks jobs later in this course. The term also relates to a type of cluster called a job cluster, which spins up and down for each run of a Databricks job, and to the API for scripting Databricks jobs. We're going to start by looking at Spark jobs with a review of Spark job execution. Let's look at the major steps. The first step is to load the data into RDDs or higher-level objects so that each of the nodes can perform operations on them. The operations can be typical data operations such as join, groupBy, or…
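
A minimal PySpark sketch of those execution steps may help make them concrete. The file paths, column names, and app name below are hypothetical assumptions, and on Databricks a SparkSession is already provided as `spark`:

```python
from pyspark.sql import SparkSession

# On Databricks a SparkSession already exists as `spark`; building one
# here keeps the sketch self-contained for local testing.
spark = SparkSession.builder.appName("job-overhead-demo").getOrCreate()

# Step 1: load the data into DataFrames (higher-level objects backed by
# RDDs) so each node in the cluster can operate on its partitions.
# These paths and column names are hypothetical.
orders = spark.read.parquet("/mnt/data/orders")
customers = spark.read.parquet("/mnt/data/customers")

# Step 2: typical data operations such as join and groupBy. These are
# transformations, so Spark only records them in the logical plan.
order_counts = (
    orders.join(customers, "customer_id")
          .groupBy("region")
          .count()
)

# Step 3: an action triggers the actual Spark job; tasks are scheduled
# onto the cluster's nodes and results are returned to the driver.
order_counts.show()
```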
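
As for the Databricks job and job cluster mentioned above, here is a hedged sketch of scripting one through the Databricks Jobs REST API (version 2.1). The workspace URL, token, job name, notebook path, and cluster settings are placeholder assumptions, not values from the course:

```python
import requests

# Hypothetical workspace URL and personal access token; substitute your own.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXX"

# A job definition that runs a notebook on a job cluster. The
# "new_cluster" block is what makes this a job cluster: Databricks spins
# it up when the job run starts and tears it down when the run finishes.
job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Shared/etl"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}

# Register the job; the response carries the new job_id.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"job_id": 123}
```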
