In this video, learn about Azure Batch, including its use cases, why to use it, how it works, and its components. In addition, get an overview of the service's workflow.
- [Instructor] Azure Batch and HPC go hand in hand. The Batch service oversees the HPC processes in Azure. Azure Batch is used to run large-scale, high-performance computing workloads on virtual machines in Azure. We can schedule the jobs and run them in parallel if we wish. Azure Batch is a platform as a service offering. The service itself is free, but keep in mind you will pay for the compute resources it uses.
Let's take a look at where Azure Batch sits in our Azure stack. First, we have our hardware layer, so our Service Fabric. And on top of that, we could have Cloud Services or platform as a service offerings, or we could be running virtual machines in IaaS, infrastructure as a service. And then we have our service or solution that we're trying to push our data to. And previously, what we'd have to do is create VMs, scripts, schedules, et cetera in order to process all of this data and then push it up to that service or solution.
Azure Batch replaces this middle layer, and it takes care of all that processing for you. And the best part is, there's no infrastructure on your part to manage. There are several use cases for Batch processing. And again, because Batch works hand in hand with HPC, the use cases are going to be similar, including image rendering, analysis, and processing, media encoding and transcoding, engineering stress analysis, genetic sequence analysis, and financial risk modeling, just to give you a few examples.
Azure Batch uses intrinsically parallel processing. This works well on workloads that can be broken down into tasks that can then be processed at the same time on multiple instances, which in turn will then produce our output. Now that we have a basic understanding of how Batch works, let's go ahead and start taking a look at the components that make the Batch service function. The first component we have is a Batch pool, which is a collection of nodes or virtual machines.
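As a rough illustration of the intrinsically parallel pattern just described, here is a minimal local sketch. It does not call Azure Batch at all: worker threads stand in for nodes, and the names `split_into_tasks`, `run_task`, and `run_in_parallel` are hypothetical helpers invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(items, task_size):
    """Break a workload into independent chunks, one chunk per task."""
    return [items[i:i + task_size] for i in range(0, len(items), task_size)]


def run_task(chunk):
    """Placeholder for real work (rendering, encoding, ...): sum of squares."""
    return sum(x * x for x in chunk)


def run_in_parallel(items, task_size=3, workers=4):
    tasks = split_into_tasks(items, task_size)
    # Each worker thread plays the role of one node; this is an assumption
    # made for illustration, not how the Batch service is implemented.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        partial_results = list(executor.map(run_task, tasks))
    # Combine the per-task outputs into the final output.
    return sum(partial_results)


if __name__ == "__main__":
    print(run_in_parallel(list(range(10))))
```

The key property is that no task depends on another task's result, which is what lets the chunks run at the same time on separate instances.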
And this pool can have hundreds or thousands or tens of thousands of virtual machines. And you can specify and configure these virtual machines to meet your needs. These pools can then be further configured depending on what your specific requirements are. There are two different pool types that we can use. The first is the Batch Service pool. And this is an Azure-managed pool. You don't have to do anything with it. And it will support both Cloud Services and virtual machine pools.
It also supports shared key authentication, as well as Azure Active Directory authentication. But there is a limitation. Only virtual machine images from the Marketplace are allowed within this pool. The more recently added pool type is the User Subscription pool. And when we select the User Subscription model, the pools are assigned to our Azure subscription itself. And it will only support virtual machine pools, so no Cloud Services here.
But you can use custom images. Only Azure Active Directory authentication is supported, so you must also configure an Azure Key Vault. And low-priority VMs are not supported. And you may be wondering what a low-priority VM is. Well, as of this recording, in September of 2017, low-priority VMs are in preview. And these VMs will leverage the excess capacity in the Azure datacenters for your virtual machines.
These are not dedicated virtual machines. And because they're not dedicated virtual machines, your virtual machine may be preempted. This model is best suited for flexible or non-production workloads. Next, we have Batch jobs. A job is a collection of tasks, and it assigns the work to a specific pool, as well as assigning a job priority. We also use Batch jobs for scheduling. I like to think of Batch jobs as the manager.
They coordinate all the bits and pieces. But it's the Batch task that is the worker that gets the job done, because a task is a command line that executes a process on the node. And a task can include copying resource files or deploying application packages to those nodes. The Batch service will also automatically detect failed tasks and retry a failed or frozen task as well. Let's go ahead and take a look at a sample workflow.
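Before the workflow, the job and task relationship described above can be sketched locally. This is a simplified model, not the Azure Batch API: the `Task` and `Job` classes, `run_task`, `order_by_priority`, the retry count, and the timeout used to spot a "frozen" task are all hypothetical, and treating a higher number as higher priority is an assumption made for the example.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    command_line: List[str]          # the command the node executes
    max_retries: int = 2             # hypothetical retry limit
    attempts: int = 0
    exit_code: Optional[int] = None


@dataclass
class Job:
    job_id: str
    pool_id: str                     # the pool this job's work is assigned to
    priority: int = 0
    tasks: List[Task] = field(default_factory=list)


def order_by_priority(jobs):
    """Assumption for this sketch: a higher number means higher priority."""
    return sorted(jobs, key=lambda job: job.priority, reverse=True)


def run_task(task, timeout_seconds=10):
    """Run the command line; retry on a non-zero exit code or a timeout."""
    while task.attempts <= task.max_retries:
        task.attempts += 1
        try:
            done = subprocess.run(task.command_line, timeout=timeout_seconds)
            task.exit_code = done.returncode
        except subprocess.TimeoutExpired:
            task.exit_code = None    # treated as a frozen task
            continue
        if task.exit_code == 0:
            return True
    return False


if __name__ == "__main__":
    job = Job("sample-job", "sample-pool", priority=10)
    job.tasks.append(Task("t1", [sys.executable, "-c", "print('working')"]))
    for task in job.tasks:
        print(task.task_id, run_task(task), task.attempts)
```

The job only records where the work goes and how urgent it is; the task carries the command line that does the work, which mirrors the manager and worker split.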
When we need to run a Batch workload, we start by uploading the input files and any applications that might be required to Azure Storage. Our next step is to create a pool that contains the configured nodes that are going to execute the tasks. Next, we create the Batch job that will run on the nodes in the pool. We can now add the tasks to the job. If any input files or applications are required, these are then downloaded from Azure Storage prior to each task being executed on those nodes.
And finally, the output files are uploaded to Azure Storage. If the service determines that a task has failed or is frozen, the task can be retried. To quickly recap, the Azure Batch service replaces all the infrastructure and management that we previously had to use to process our data on our virtual machines.
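The sample workflow walked through above can be sketched end to end as a local simulation. Nothing here talks to Azure: a dictionary stands in for Azure Storage, worker threads stand in for pool nodes, and `run_workflow`, the job ID, and the word-count "processing" are hypothetical placeholders chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(input_files, node_count=2):
    # Step 1: upload the input files to storage (a dict stands in for it).
    storage = {"input": dict(input_files), "output": {}}

    # Step 2: create a pool of nodes (worker threads stand in for VMs).
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        # Steps 3 and 4: create a job and add one task per input file.
        job = {"id": "sample-job", "tasks": sorted(storage["input"])}

        def execute(file_name):
            # Step 5: the node downloads its input before the task runs.
            data = storage["input"][file_name]
            # Placeholder processing: count the words in the file.
            return file_name, str(len(data.split()))

        # Step 6: upload each task's output file back to storage.
        for file_name, result in pool.map(execute, job["tasks"]):
            storage["output"][file_name + ".out"] = result

    return storage["output"]


if __name__ == "__main__":
    print(run_workflow({"a.txt": "one two three", "b.txt": "hello"}))
```

In the real service, each of these steps would go through the Azure Batch and Azure Storage APIs rather than in-process objects; the sketch only shows the order of operations.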