From the course: Azure Spark Databricks Essential Training

Run a production-size job

- [Instructor] In this next section, we are scaling our workloads even bigger. I've started two copies of the notebook with even larger files, so you can see that our cluster is now in the resizing state. If I go into the two workloads, I have both the big workload and the compressed big workload, and you'll remember the uncompressed file is around 800 MB and the compressed one is about one-hundredth of that. If we go in and take a look at the first notebook, we can see that the file is loaded. It's the hipster_genomewide_001_1000.vcf. Here we can see the size, which is significantly bigger than the other files we've been working with. This is uncompressed. And here you can see that the Spark jobs are running. If we go back to the cluster, we can see that we have a compressed version of this as well. This is the hipster_genomewide_001_1000.vcf.bz2, and you can see how much smaller it is in size, and this is running as well. So this starts to get into, in kind of a small way, the complexity of…
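
The size gap between an uncompressed file and its .bz2 counterpart can be illustrated with a minimal sketch in plain Python. It assumes synthetic, highly repetitive VCF-style rows and hypothetical file names (`sample.vcf`, `sample.vcf.bz2`), not the course's hipster_genomewide files, so the ratio it prints will not necessarily match the one-hundredth figure mentioned in the transcript.

```python
import bz2
import os
import tempfile


def make_vcf_like_text(n_rows):
    """Build synthetic VCF-style rows (illustrative only, not real genomic data)."""
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    rows = [f"1\t{1000 + i}\t.\tA\tG\t50\tPASS\tDP=100\n" for i in range(n_rows)]
    return header + "".join(rows)


def write_both(text, directory):
    """Write the same bytes uncompressed and bz2-compressed; return both paths."""
    plain_path = os.path.join(directory, "sample.vcf")      # hypothetical name
    bz2_path = os.path.join(directory, "sample.vcf.bz2")    # hypothetical name
    data = text.encode("utf-8")
    with open(plain_path, "wb") as f:
        f.write(data)
    with bz2.open(bz2_path, "wb") as f:
        f.write(data)
    return plain_path, bz2_path


def count_lines(path):
    """Count lines, decompressing on the fly when the name ends in .bz2."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    plain_path, bz2_path = write_both(make_vcf_like_text(20000), tmp)
    plain_size = os.path.getsize(plain_path)
    bz2_size = os.path.getsize(bz2_path)
    plain_count = count_lines(plain_path)
    bz2_count = count_lines(bz2_path)
    print(f"uncompressed: {plain_size} bytes, {plain_count} lines")
    print(f"bz2:          {bz2_size} bytes, {bz2_count} lines")
```

Both files hold the same rows; only the bytes on disk differ. Spark's text-based readers generally pick the decompression codec from the file extension, so a notebook can usually point the same read call at either file, with the compressed copy trading less storage and I/O for extra CPU time spent decompressing.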
