When using a tool like top or checking your app’s CPU usage in AppSignal’s host metrics, you’ll notice that CPU usage is split into several categories. You might have seen it divided into “system” and “user”, for example.
Besides those two main categories, there are a few more categories and subcategories. In this article, we’ll look at what each of them means, so you can use the reported information to find CPU bottlenecks.
On a Linux machine, running top will print a CPU line that looks like this:
%Cpu(s): 13.2 us, 1.3 sy, 0.0 ni, 85.3 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
These eight states are “user” (us), “system” (sy), “nice” (ni), “idle” (id), “iowait” (wa), “hardware interrupt” (hi), “software interrupt” (si), and “steal” (st).
Of these eight, “system”, “user” and “idle” are the main states the CPU can be in.
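To connect the abbreviations with the names, here’s a minimal Python sketch that turns the top summary line shown earlier into named percentages. It assumes the comma-separated “number abbreviation” layout from that example; other top versions or configurations may format the line differently, and the parse_cpu_line helper is hypothetical, not part of any tool.

```python
import re

# Abbreviations used in top's "%Cpu(s)" summary line, mapped to the
# state names used in this article.
STATE_NAMES = {
    "us": "user",
    "sy": "system",
    "ni": "nice",
    "id": "idle",
    "wa": "iowait",
    "hi": "hardware interrupt",
    "si": "software interrupt",
    "st": "steal",
}

# Matches pairs such as "13.2 us" or "0.0 st".
FIELD = re.compile(r"(\d+(?:\.\d+)?)\s+([a-z]{2})\b")


def parse_cpu_line(line: str) -> dict[str, float]:
    """Return {state name: percentage} for a top "%Cpu(s):" line."""
    _, _, values = line.partition(":")  # drop the "%Cpu(s)" label
    return {
        STATE_NAMES[abbr]: float(number)
        for number, abbr in FIELD.findall(values)
        if abbr in STATE_NAMES
    }


line = "%Cpu(s): 13.2 us, 1.3 sy, 0.0 ni, 85.3 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st"
for name, percent in parse_cpu_line(line).items():
    print(f"{name:>18}: {percent:5.1f}%")
```

Since every moment of CPU time falls into exactly one of these states, the eight values add up to (roughly) 100%.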
The “system” CPU state shows the amount of CPU time used by the kernel. The kernel is responsible for low-level tasks like interacting with the hardware, allocating memory, handling communication between OS processes, running device drivers and managing the file system. Even the CPU scheduler, which determines which process gets access to the CPU, is run by the kernel.
While usually low, the “system” category can spike when a lot of data is being read from or written to disk, for example. If it stays high for long periods, you might have a problem with a device driver.
One level up, the “user” CPU state shows CPU time used by user-space processes. These are higher-level processes, like your application or the database server running on your machine. In short, all CPU time used by anything other than the kernel is marked “user”, even if the process wasn’t started from a user account. If a user-space process needs access to the hardware, it has to ask the kernel, and the time the kernel spends handling that request counts towards the “system” state.
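You can observe this split from inside a single process. Python’s os.times() reports how many user and system CPU seconds the current process has consumed. The minimal sketch below assumes a Unix-like system with /dev/zero and uses hypothetical helper names. It compares a pure-arithmetic loop with a loop of large os.read() calls. Exact numbers vary by machine, but the first loop should land mostly in “user” and the second mostly in “system”.

```python
import os


def cpu_split(work) -> tuple[float, float]:
    """Run work() and return the (user, system) CPU seconds it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def compute_only():
    # Pure arithmetic in the interpreter: no kernel help is needed,
    # so this time is accounted as "user".
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total


def many_syscalls():
    # Each os.read() asks the kernel to fill 1 MiB from /dev/zero, so the
    # bulk of this time is usually accounted as "system".
    fd = os.open("/dev/zero", os.O_RDONLY)
    try:
        for _ in range(2_000):
            os.read(fd, 1 << 20)
    finally:
        os.close(fd)


for work in (compute_only, many_syscalls):
    user, system = cpu_split(work)
    print(f"{work.__name__:>14}: user {user:.2f}s, system {system:.2f}s")
```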
Usually, the “user” category accounts for most of your CPU time. If it stays close to the maximum for a long time, leaving little idle time, you might have a problem with your application or with another utility running on the machine.
The “nice” CPU state is a subset of the “user” state and shows the CPU time used by processes that have a positive niceness, meaning a lower priority than other tasks. The nice utility is used to start a program with a particular priority. The default niceness is 0, but it can be set anywhere from -20 (highest priority) to 19 (lowest priority). CPU time in the “nice” category marks lower-priority tasks that run when the CPU has time left over for extra work.
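Besides the nice utility, a program can lower a child process’s priority itself. As a minimal sketch, assuming a Unix-like system where Python’s os.nice is available, the hypothetical run_niced helper below starts a command with its niceness raised by 10, roughly what nice -n 10 does. If the parent runs at the default niceness of 0, any CPU time the child uses should show up in the “nice” category.

```python
import os
import subprocess
import sys


def run_niced(cmd: list[str], increment: int = 10) -> subprocess.CompletedProcess:
    """Run cmd with its niceness raised by `increment` (like `nice -n 10 cmd`).

    Hypothetical helper for illustration; Unix-only, because os.nice is.
    """
    # preexec_fn runs in the child after fork() and before exec(), so only
    # the child's priority is lowered; the parent keeps its own niceness.
    # The kernel caps the resulting niceness at 19.
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


# os.nice(0) changes nothing and simply returns the current niceness.
report = [sys.executable, "-c", "import os; print(os.nice(0))"]
print("parent niceness:", os.nice(0))
print("child niceness: ", run_niced(report).stdout.strip())
```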
The “idle” CPU state shows the CPU time that’s not actively being used. Internally, idle time is usually accounted to a special idle task with the lowest possible priority, which the scheduler only runs when no other task is ready to run.
“iowait” is a subcategory of the “idle” state. It marks time spent waiting for input or output operations, like reading from or writing to disk. When the processor waits for a file to be opened, for example, the time spent will be marked as “iowait”.
Elevated CPU time in the “iowait” category can reveal problems outside of the processor, for example when an in-memory database needs to flush a lot of data to disk, or when memory is being swapped to disk.
Besides “nice” in “user” and “iowait” in “idle”, the main CPU states can be divided into more subcategories. The “hardware interrupt” (irq, shown as hi in top) and “software interrupt” (softirq, shown as si) categories show time spent servicing interrupts, and the “steal” (st) subcategory marks time a virtual CPU in a virtual machine spent waiting for the hypervisor to give it a physical CPU.
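The percentages top prints are derived from cumulative counters that the Linux kernel exposes in /proc/stat, which is where the irq and softirq names come from. The sketch below assumes Linux and the field order documented in proc(5). It samples the aggregate counters twice and converts the difference into percentages, much like top does between refreshes. The helper names are hypothetical.

```python
import os
import time

# Order of the first eight counters on the "cpu" line of /proc/stat
# (see proc(5)). The later guest and guest_nice fields are skipped because
# the kernel already includes them in user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_counters(path: str = "/proc/stat") -> list[int]:
    """Return the aggregate CPU counters (in clock ticks) from /proc/stat."""
    with open(path) as f:
        for line in f:
            parts = line.split()
            if parts and parts[0] == "cpu":
                return [int(v) for v in parts[1:1 + len(FIELDS)]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before: list[int], after: list[int]) -> dict[str, float]:
    """Share of elapsed CPU ticks spent in each state between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas) or 1  # avoid dividing by zero on identical samples
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


# /proc/stat only exists on Linux, so the live demo is skipped elsewhere.
if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_cpu_counters()
    time.sleep(1)
    second = read_cpu_counters()
    for name, pct in cpu_percentages(first, second).items():
        print(f"{name:>8}: {pct:5.1f}%")
```

Because the counters only ever grow, a single reading says little on its own. It’s the change between two samples that tells you where the CPU spent the interval.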
This concludes our overview of CPU statistics. If you’d like to know more about a specific metric, or one we haven’t discussed yet, please feel free to let us know at @AppSignal. Of course, we’d also love to hear how you liked this article, or which other subjects you’d like to know more about.