From the course: Azure Spark Databricks Essential Training
Run a production-size job
- [Instructor] In this next section, we are scaling our workloads up even further. I've started two copies of the notebook with even larger files, and you can see that our cluster is now in the resizing state. If I go into the two workloads, I have the big workload and the compressed big workload, and you'll remember this one is around 800 megs and this one is one-hundredth of that. If we go in and take a look at the first one, we can see that the file has loaded. It's hipster_genomewide_001_1000.vcf. Here we can see the size, and it's significantly bigger than the other files we've been working with. This one is uncompressed. And here you can see that the Spark jobs are running. If we go back to the cluster, we can see that we have a compressed version of this file as well. This is hipster_genomewide_001_1000.vcf.bz2, you can see how much smaller it is in size, and this job is running as well. So this starts to get into, in kind of a small way, the complexity of…
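To make the size difference between a plain .vcf file and its .bz2 counterpart concrete, here is a minimal sketch using only the Python standard library. It does not use the course files: the helper `make_vcf_like_lines` and the rows it generates are hypothetical, synthetic stand-ins for tab-separated variant records, so the ratio it reports will not match the figures mentioned in the video.

```python
import bz2
import random


def make_vcf_like_lines(n_rows, seed=0):
    """Build synthetic, VCF-like tab-separated text (hypothetical data, not the course file)."""
    rng = random.Random(seed)
    lines = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
    for i in range(n_rows):
        # Pick two distinct bases for the reference and alternate alleles.
        ref, alt = rng.sample("ACGT", 2)
        lines.append(f"1\t{1000 + i}\t.\t{ref}\t{alt}\t50\tPASS\t.")
    return "\n".join(lines) + "\n"


# Uncompressed bytes, standing in for a .vcf file.
raw = make_vcf_like_lines(20000).encode("utf-8")

# bzip2-compressed bytes, standing in for the .vcf.bz2 file.
compressed = bz2.compress(raw)

# Fraction of the original size that remains after compression.
ratio = len(compressed) / len(raw)

# Decompressing gives back exactly the same bytes.
restored = bz2.decompress(compressed)

print(f"uncompressed: {len(raw)} bytes")
print(f"compressed:   {len(compressed)} bytes")
print(f"compressed size is {ratio:.1%} of the original")
```

Repetitive, line-oriented text such as variant records tends to compress well, which is why the .bz2 copy takes up so much less storage. The trade-off is that the cluster has to decompress the data while reading it.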