Join Scott Simpson for an in-depth discussion in this video, Explore load and uptime, part of Linux: System Maintenance.
- [Instructor] Understanding how long a computer has been running and how hard it's working are important aspects of system maintenance. The uptime command tells us both of these things. This shows me the current time and, measured back from the current time, how long the system has been powered on and booted. In my case, it's been up for 18, well, just about 19 hours. Uptime is a sensitive topic for some people. It can be considered a point of pride to have a system that hasn't been restarted in a long time.
But sometimes it can be an indication of likely problems to have a system with hundreds or thousands of days of uptime. On the one hand, a system with a long uptime is probably pretty stable. That's a good thing. But if a system hasn't been restarted in a very long time, it could also indicate that some of the software on the system is out of date. On modern systems, restarting for software updates is pretty much a thing of the past. But some systems, and especially older ones, do need restarts to install system-level software like kernel updates now and again.
If you're inheriting a system to manage, a very long uptime could also be a red flag for problems when it does restart. Maybe that system has been online and untouched because there was a problem getting a service running, and nobody wants to go through that process again if they can help it. Of course, that's not to say you should restart your systems all the time. It's just something to keep in mind if you see a system with a surprisingly long uptime. It's also important to keep an eye out for very short uptimes. If you're looking at a system that has a short uptime, it could be that it was just restarted on purpose.
Or it could have been restarted accidentally. Perhaps the machine crashed and rebooted. Or maybe it's a container that just popped into existence. As you explore your systems and environment, you'll get a sense of what kind of uptimes are suitable for your machines. Back in the terminal, I can see that there are two users connected here, and we'll talk about that later. And over here is the load average. The load average is a quick snapshot of how hard the system is working and how hard it has been working in the recent past.
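To make the uptime figure a little more concrete, here is a minimal Python sketch of one way to read it on Linux. It assumes the `/proc/uptime` file is available, whose first field is the number of seconds since boot. The function names `parse_uptime_seconds` and `format_uptime` are hypothetical helpers made up for this illustration, and the output format only loosely imitates what the uptime command prints.

```python
from pathlib import Path


def parse_uptime_seconds(proc_uptime_text: str) -> float:
    """Return seconds since boot from the contents of /proc/uptime.

    The file holds two numbers: seconds since boot, then cumulative
    idle time. Only the first one is used here.
    """
    return float(proc_uptime_text.split()[0])


def format_uptime(seconds: float) -> str:
    """Format a duration in seconds as 'D days, HH:MM' (illustrative format)."""
    total_minutes = int(seconds // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days} days, {hours:02d}:{minutes:02d}"


if __name__ == "__main__":
    uptime_file = Path("/proc/uptime")  # Linux-specific; assumed to exist
    if uptime_file.exists():
        seconds = parse_uptime_seconds(uptime_file.read_text())
        print("up", format_uptime(seconds))
```

A system that has been up for just under 19 hours, like the one in the demonstration, would come out of `format_uptime` as something like `0 days, 18:57`.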
The three numbers, left to right, indicate the load on the system over the past one minute, five minutes, and 15 minutes. Load means how much work the processor is able to get done versus how many processes are competing for its resources. 1.00 means that the CPU is working at a break-even rate. It's at 100% utilization, but nothing additional is waiting. It's being used to its capacity but no more. Anything over 1.00 means one or more processes are having to wait to be worked on.
And anything under 1.00 means that this CPU has some time to idle. It's not working at full capacity all the time. If you see a number significantly higher than 1.00, it doesn't necessarily mean your system is having problems. It just means that some process or other will need to wait a little bit to have its work processed. Systems will often spike up over 1.00 when they're doing their normal tasks. What you want to keep an eye out for is systems with a consistently high load average. If you have more than one processor or more than one processor core, higher numbers may be normal.
1.00 is one core at 100% capacity. So a dual-core machine would be at full capacity at 2.00, and an eight-core machine would be at full capacity at 8.00, and so on. So if you connect to your 12-core powerhouse machine and see it at a 9.00 load average, it's only really at about three quarters of its full processing capacity. If your system is running close to its full capacity all the time, that can be okay as long as it's expected. Maybe it's a processing node, or a database server with many clients.
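The per-core arithmetic described here can be sketched in a few lines of Python. This is only an illustration: `parse_loadavg` and `load_per_core` are hypothetical helper names, and the sketch assumes a Linux-style `/proc/loadavg` line whose first three fields are the one, five, and 15 minute load averages. The standard library calls `os.getloadavg()` and `os.cpu_count()` are used for the live reading.

```python
import os


def parse_loadavg(proc_loadavg_text: str) -> tuple[float, float, float]:
    """Return the 1, 5, and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = proc_loadavg_text.split()[:3]
    return float(one), float(five), float(fifteen)


def load_per_core(load: float, cores: int) -> float:
    """Scale a load average by core count, so 1.0 means full capacity."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    if hasattr(os, "getloadavg"):  # not available on every platform
        for label, load in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
            print(f"{label}: {load:.2f} ({load_per_core(load, cores):.0%} of capacity)")
```

With the numbers from the example, `load_per_core(9.0, 12)` gives 0.75, which is the three quarters of capacity mentioned for the 12-core machine.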
But if your system is consistently over the 100% line, and especially if it's over by a lot, it may be worth investigating why. Maybe it's time to add more processors to the virtual machine, or move some tasks to other machines. Or maybe it's time to upgrade to a more powerful system.
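One way to tell a brief spike from a consistently high load is to compare all three load averages against the core count. The Python sketch below is just one possible rule of thumb, not an established standard: the function name `assess_load` and its four labels are assumptions made up for this illustration.

```python
def assess_load(one: float, five: float, fifteen: float, cores: int) -> str:
    """Label a load reading using a hypothetical rule of thumb.

    Each average is divided by the core count, so a ratio above 1.0
    means more work was waiting than the processors could handle.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    ratios = [value / cores for value in (one, five, fifteen)]
    if all(ratio > 1.0 for ratio in ratios):
        # Over capacity across all three windows: worth investigating.
        return "sustained overload"
    if ratios[0] > 1.0:
        # Only the most recent window is over: likely a normal spike.
        return "recent spike"
    if any(ratio > 1.0 for ratio in ratios[1:]):
        # Older windows were over capacity but the latest one is not.
        return "easing off"
    return "within capacity"


if __name__ == "__main__":
    print(assess_load(9.0, 9.0, 9.0, cores=12))  # within capacity
    print(assess_load(3.5, 3.2, 2.9, cores=2))   # sustained overload
```

A reading labeled sustained overload is the case where adding processors or moving tasks elsewhere might be worth considering, while a recent spike is often just the system doing its normal work.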
- Exploring a system
- Exploring load and uptime
- Auditing security access, groups, and users
- Checking memory and process status
- Checking free disk space and disk status
- Interrupting and exploring the GRUB boot loader
- Gaining root access
- Exploring recovery options
- Upgrading software
- Freeing disk space
- Adding a disk
- Setting up a logging server
- Building a summary script