From the course: Azure Serverless Computing

What is stream analytics?

- [Instructor] So, now that we've got our data coming into the IoT hub, we need to put it somewhere for long-term storage, and one good place is something like table storage, because it's cheap and it's long term. If we have a way to get the data from the IoT hub directly into table storage, that would be great, and we're going to show an Azure Stream Analytics job as one option for doing that. So what are Stream Analytics jobs? Azure Stream Analytics is a way of analyzing data in motion. It takes data that is streaming through from event hubs, IoT hubs, and blob storage, runs queries against it to pull out what we care about analytically, and then sends the results to outputs like event hubs, blob storage, other data services, and other output sinks. It scales using something called a streaming unit, which is a measure of how much CPU, memory, and disk reads and writes are required to run the job.

Typical uses of Stream Analytics include persisting data into cold storage and looking for anomalies, something that is not normal, maybe an alert condition like the temperature being too high. If we detect one, we can stream it into a separate output from the rest of the data. Stream Analytics can also act as the intermediary connecting pipelines across different services, pulling the data from point A and putting it into point B. Another common use is as an input for Power BI.

Stream Analytics jobs are hosted either in Azure or on an IoT Edge device. You can provision a job in the portal or from an ARM template, and once it's been provisioned you configure it in the portal: you set up its inputs and outputs, write a query, and then run it.

There are options for the start time, so when I start the job I can either start right now or go back and begin processing at a particular point in time. Because the Stream Analytics job reads the input through a consumer group, it can move to whatever point in time it needs to start and then run the data forward from there. There are also options for how you handle the ordering of events, because sometimes events arrive out of order; if the events carry a timestamp, the job can detect that and make sure they come out in the right order. And there's an error policy, so if I have issues with the processing of the data, I can specify how the job handles them.

To understand streaming units a little better, it helps to think of them as a measure of the capacity needed to process a query. If you think about it in terms of set calculus, a simple SELECT won't use a lot of memory, but a query with complex WHERE clauses, joins, and windowing takes more memory, disk, and CPU to produce its analysis, so it will consume more streaming units than a simple query. When you're looking at how to scale, you specify the number of streaming units, and the portal gives you measurements that tell you how many streaming units you're actually using.

Let's take a look at how we can provision one.
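To make the input-query-output flow concrete, here is a minimal sketch of the kind of Stream Analytics query that persists IoT hub data into table storage, as described above. The aliases IoTHubInput and TableStorageOutput are hypothetical names you would assign when configuring the job's inputs and outputs; they are not defaults.

```sql
-- Minimal pass-through query: persist every event arriving from the
-- IoT hub input into the table storage output for cheap cold storage.
-- 'IoTHubInput' and 'TableStorageOutput' are hypothetical aliases
-- defined on the job's inputs and outputs.
SELECT
    *
INTO
    TableStorageOutput
FROM
    IoTHubInput
```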
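The anomaly-detection use mentioned above (temperature too high) maps naturally onto a windowed query that routes alerts to a separate output. This is a hedged sketch: the field names deviceId, temperature, and eventTime, the 75-degree threshold, and the 30-second window are assumptions for illustration; TIMESTAMP BY and TumblingWindow are standard Stream Analytics Query Language constructs.

```sql
-- Hypothetical alert query: average the temperature per device over a
-- 30-second tumbling window and send only windows over the threshold
-- to a separate 'AlertOutput'. TIMESTAMP BY tells the job to order
-- events by the application timestamp rather than by arrival time.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd
INTO
    AlertOutput
FROM
    IoTHubInput TIMESTAMP BY eventTime
GROUP BY
    deviceId,
    TumblingWindow(second, 30)
HAVING
    AVG(temperature) > 75
```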
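And to illustrate the streaming-unit point, here is a sketch of the heavier kind of query that consumes more capacity: joining two streams forces the job to buffer events in memory for the duration of the join window, on top of the windowed aggregation itself. The input aliases and field names here are hypothetical; the DATEDIFF condition in the ON clause is how Stream Analytics bounds a stream-to-stream join in time.

```sql
-- Sketch of a more expensive query: count telemetry events arriving
-- within 60 seconds of a command, per device, in 5-minute windows.
-- 'TelemetryInput' and 'CommandInput' are hypothetical input aliases.
SELECT
    T.deviceId,
    COUNT(*) AS eventsAfterCommand
INTO
    AnalysisOutput
FROM
    TelemetryInput T TIMESTAMP BY eventTime
JOIN
    CommandInput C TIMESTAMP BY sentTime
    ON T.deviceId = C.deviceId
    AND DATEDIFF(second, C, T) BETWEEN 0 AND 60
GROUP BY
    T.deviceId,
    TumblingWindow(minute, 5)
```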
