From the course: Kubernetes: Monitoring with Prometheus

Logging vs. monitoring - Kubernetes Tutorial

Logging vs. monitoring

- [Lecturer] So before we dive deep into the land of Prometheus monitoring, we really want to at least understand the difference between logging and metering of data. Logging really looks at applications and data about their current state: how are they behaving, what issues are they seeing, are they throwing errors or warnings? Usually that information also gives a sort of long-form version of what happened. We're trying to provide enough information in a log message so that somebody who comes along later can understand what's going on within that particular application.

Metering, on the other hand, is really more about the rate of change of data or the current state, for example, the CPU usage of a system or the number of transactions that a database is seeing. Those events may also create log messages. For example, a database might actually log how many times it received a message or a particular type of request, if that's a parameter of our database that we want to understand. But the metering data is really just about those numerically quantitative elements of a system. Also, metering data tends to be time-series based, so we're looking at something that gets measured consistently over time. For example, every second we might look at something, or every number of seconds, versus logging, which is potentially also time-based but is really much more about the qualitative information: what's going on, what is the system seeing or not seeing.

One of the best ways to differentiate between these two, though, is to look at them on a live system, so we'll pop over to a Unix terminal here. Here we're on a Unix machine and we can look at log information, for example the /var/log/syslog file. We can cat that, so cat /var/log/syslog, and this'll give us a bunch of information about what the system is seeing.
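One rough way to picture the distinction drawn so far is to write down what a single log message and a single metering data point might carry. The sketch below is only an illustration: the class names, field names, and sample values are made up for this example and do not come from syslog, Prometheus, or any other tool.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    """A qualitative, long-form description of something that happened."""
    timestamp: str  # when the message was written
    source: str     # which program wrote it
    message: str    # free-form text meant for a human reader


@dataclass
class MetricSample:
    """One numeric measurement taken at one point in time."""
    timestamp: float  # seconds since the epoch
    name: str         # what is being measured
    value: float      # the single number that is actually collected
    labels: dict = field(default_factory=dict)  # small amount of metadata


# Hypothetical values, chosen only to show the difference in shape.
log = LogRecord("14:28:41", "kubelet", "illustrative message text, not a real log line")
sample = MetricSample(1000.0, "cpu_percent", 1.7, {"process": "kubelet"})

print(log)
print(sample)
```

The log record is mostly text that somebody has to read, while the sample boils down to one number plus a little metadata describing what that number refers to.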
And here we see a couple of different applications that are writing their log messages into a single location. This is useful, obviously, for looking at the system as a whole, and we can filter through it if we want to look at individual components. We see things like, for example, the systemd application here in the middle of the screen noting that the time has been changed. It's just noting that that's happened a couple of times. Probably a time synchronization process has started to happen. We also see an application called kubelet, which is one of our Kubernetes components, saying that it can't actually get a hold of the cni config. Well, if we're looking at the network we might understand what that is. If we're not, that information may not be meaningful to us.

Now, I mentioned that metrics data is time-series based, and if we look through this file we notice that there are time stamps everywhere. It says 14:28:41, for example, and that may lead us to think that these log files are metrics data. We could actually use them as metrics data, because we could, for example, parse through and look for how many times kubelet has a W0604 message. The time stamp would be useful in understanding how often that happens from a time perspective, so that's another way of looking at log data: turning it into time-series data and looking at the metrics. But if we do that, we're really stripping away a lot of the qualitative information (oh, this is about networks, and this was about a specific parameter) and turning it into quantitative information: how many times do we see a specific thing happen?

Another tool that we can use within the general Unix space to look at metrics is a tool like top. So we can run the top command here, and now what we see is a bunch of information that's updated on a second-by-second basis.
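The idea of parsing a log file and counting how often one program writes a particular kind of message can be sketched as follows. This is a minimal sketch under stated assumptions: the sample lines, the host name, and the helper name count_per_minute are hypothetical, and the regular expression assumes a traditional syslog layout (month, day, time, host, program), which may not match every system.

```python
import re
from collections import Counter

# Hypothetical lines in a traditional syslog layout; the message text is made up.
SAMPLE_LINES = [
    "Jun  4 14:28:41 host1 kubelet[901]: W0604 illustrative warning text",
    "Jun  4 14:28:41 host1 systemd[1]: illustrative note about the time changing",
    "Jun  4 14:28:52 host1 kubelet[901]: W0604 illustrative warning text",
    "Jun  4 14:29:03 host1 kubelet[901]: W0604 illustrative warning text",
]

# Captures: HH:MM of the time stamp, the program name, and the message body.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}:\d{2}):\d{2}\s+\S+\s+([\w.-]+)(?:\[\d+\])?:\s+(.*)$"
)


def count_per_minute(lines, program, marker):
    """Count messages from `program` that start with `marker`, grouped by minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        minute, prog, message = match.groups()
        if prog == program and message.startswith(marker):
            counts[minute] += 1
    return dict(counts)


per_minute = count_per_minute(SAMPLE_LINES, "kubelet", "W0604")
print(per_minute)
```

Note what gets lost along the way: the result keeps a time stamp and a count, while the text explaining what each message was about is thrown away. That is the trade between qualitative and quantitative information described above.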
So already we're seeing a time-series set of data points in each one of these lines. Take, for example, the top line, the 90100 process ID. The kubelet process that is associated with that line has a number of parameters, for example, what percentage of CPU it is using. Every second that gets updated, and this is what's generating metrics data for us. So if we were to collect this and store it somewhere, rather than just showing it on the screen here, we could put together a graph to see how kubelet behaves over time. Is it always using 1.7%, or does it occasionally use more? Like we see just right here, it uses 2.7% all of a sudden, so this is changing over time.

In reality, the metric that we would be looking at is CPU percentage utilization for the kubelet process. So even though there's information, sort of metadata, around that metric, the actual data that we're collecting is just one data point, in this case the percentage of CPU. And that's what differentiates metrics from logs. Logs are qualitative, maybe time-series based, but really are looking at more information. Metrics really are just point-in-time data points that may have some metadata around them but are often even more compact than what we would see in a log environment.
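Collecting one number at a regular interval and storing it, instead of only showing it on the screen, might look something like the sketch below. The collect helper and its parameters are hypothetical, and the readings are canned values echoing the 1.7% and 2.7% figures mentioned above, since actually reading a process's CPU usage depends on the platform and tooling.

```python
import time


def collect(read_value, samples, interval_s):
    """Call read_value() `samples` times, `interval_s` seconds apart.

    Returns a list of (timestamp, value) pairs, which is a tiny time series.
    """
    series = []
    for i in range(samples):
        series.append((time.time(), read_value()))
        if i < samples - 1:
            time.sleep(interval_s)  # wait before taking the next sample
    return series


# Canned readings stand in for a real CPU percentage source (an assumption).
readings = iter([1.7, 1.7, 2.7, 1.7])

# An interval of 0.0 keeps the demo instant; a real collector would wait longer.
series = collect(lambda: next(readings), samples=4, interval_s=0.0)

values = [value for _, value in series]
peak = max(values)
print(values, "peak:", peak)
```

Each stored entry is just a time stamp and one number, which is why this kind of data is compact and straightforward to graph over time.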
