From the course: Kubernetes: Monitoring with Prometheus

Creating alerts in Prometheus - Kubernetes Tutorial


Creating alerts in Prometheus

- [Instructor] In addition to collecting and manipulating metrics, one of the other really powerful parts of the Prometheus environment is that we can actually use those metrics for alerts. In this environment, especially with the Prometheus Operator running as part of the Kubernetes environment, it's very easy to create, add, and manipulate alerts, because we can create a config map, which is effectively a file model.

If we go ahead and look at this file here, so more 06_01_test_alert.yml, we can see that we have a config map. The important part here is that, of course, within Kubernetes it's always all about the labels, and specifically this label: we are going to give it a role label of prometheus-rulefiles. The Prometheus Operator watches config maps as they get created, and it's looking for that role label. When the Prometheus Operator sees this role, it says, ah, this is another rule that I can apply to my Prometheus environment.

In Prometheus there are a couple of different things we can do with rules. We can create aggregate metrics and give them different names, but in our case we really want to create an alert, and we're going to create a really simple one. This is effectively an always-on alert: it just gets created, produces a little sample output, and lets us see what the system is doing.

All we have to do to enable this is create the config map, which is as easy as kubectl create -f and then our file name: we type 06_01, hit Tab, and that completes it to 06_01_test_alert.yml. And so we go ahead and create this. Now, it does take the system a little while. First it has to discover the rules file, then it has to insert it into the Prometheus environment and restart it.
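The contents of 06_01_test_alert.yml are not shown in this transcript, so the following is only a rough sketch of what a config map like the one described might look like. The role: prometheus-rulefiles label comes from the walkthrough. Everything else is an assumption made for illustration: the config map name, the data key, the alert name, the severity label, the annotation text, the use of vector(1) as an always-true expression, and the Prometheus 2.x YAML rule format. A given Operator setup may also require additional labels that are not mentioned here.

```yaml
# Sketch only: the real 06_01_test_alert.yml is not reproduced in the transcript.
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-alert-rules          # hypothetical name
  labels:
    role: prometheus-rulefiles    # the label the Prometheus Operator looks for (from the walkthrough)
data:
  test-alert.rules.yaml: |        # hypothetical key; assumes Prometheus 2.x rule syntax
    groups:
    - name: test
      rules:
      - alert: TestAlert          # hypothetical alert name
        expr: vector(1)           # always returns a sample, so the alert is always on
        labels:
          severity: info          # assumed label, for illustration
        annotations:
          summary: Always-on test alert
```

A file along these lines would be applied the same way as in the walkthrough, with kubectl create -f followed by the file name. More recent Prometheus Operator releases load rules from a PrometheusRule custom resource rather than a labeled config map, so this shape may only apply to the older setup used in the course.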
So what we're going to do is pop on over to our Prometheus environment and go to the alerts page. Click on the Alerts tab here at the top. Then we're going to refresh until we finally find our alert, which might actually take a moment or two.

It turns out that in the Alertmanager, the alert has actually shown up before it shows up in the Prometheus UI. This is just a difference in when each user interface picks it up. The Alertmanager only gets its information from the Prometheus server, so Prometheus has already seen it, and sure enough, if we pop back onto the Prometheus page and refresh, our alert now shows up here as well. So this was just a timing issue.

We can see within the Prometheus view that our alert is active and running. Here it's in the state of firing, so it's effectively always on. Our Alertmanager sees it as well and gives us some information about when it first saw it.

And that's exactly how the alert infrastructure works. On top of that, we can do things like say, let's not listen to this one anymore. We can give a name, and we can say "seen it," and then we go ahead and create that silence. This will then say, hey, don't tell me about this particular alert anymore, at least not until I unsilence it, because in our case we don't really need to hear about it anymore. And that gives us a quick view of the system and the alert environment.

Sure enough, the alert is still showing in most of these places, because silencing in the Alertmanager really just means that we're not going to trigger a new event. So if we were going to send an email, for example, or something of that nature, the silence will stop that operation, even though the alert is still saying, yes, I am here, I am alive, and I keep triggering, because that's what we told it to do.
