Linux Server Resource Monitoring Made Easy
Learn essential commands and techniques to monitor your Linux server's resource usage and fix performance bottlenecks.
Resource monitoring is a critical aspect of maintaining the health and performance of your Linux servers.
Whether you're managing a personal project or a large-scale production environment, monitoring ensures that your server runs as expected, with minimal downtime and maximum efficiency.
In this guide, we’ll explore essential commands and techniques to help you monitor your server's resource usage and fix performance bottlenecks effectively.
Core Concepts to Understand
Before diving into specific commands and techniques, it’s important to understand the key areas of Linux server resource monitoring.
Memory usage, for instance, should be closely observed to ensure there is enough available RAM. When the server runs out of RAM, it starts using swap space, which is much slower and can significantly degrade performance. While some swap usage is normal, excessive swapping indicates a need for more memory or better optimization.
Storage is another key resource to monitor. Running out of disk space can bring a server to a halt, preventing new data from being written. Common causes include growing log files or unmonitored backups, which need to be managed proactively.
CPU performance also requires attention, as an overloaded CPU can’t keep up with processing demands, leading to high load averages and overall performance issues.
Input/output (I/O) monitoring is equally vital, as excessive read/write operations or heavy network traffic can create bottlenecks, affecting how efficiently the server handles operations.
By monitoring these core areas, you can identify and fix bottlenecks, enabling you to focus on efficiently restoring and optimizing server performance.
top Command
top is one of the most widely used and long-standing commands for monitoring server resource usage in real time. It provides a dynamic, continuously updated view of system processes and their resource usage.
Unlike static commands like ps, top is interactive, allowing users to scroll through the list of processes, filter results, and even terminate processes directly from the interface.
Simply type top in the terminal, and it will display server statistics in a structured format. You can use the arrow keys to navigate through the list. To exit, press q on your keyboard.
The upper half of the output presents an overview of resource usage, including server uptime, the number of active tasks, CPU usage breakdown, and memory statistics.
The lower half of the output displays a detailed list of processes in different states, sorted by default in descending order of CPU usage.
Time, Uptime, User Sessions, and Load Average
The first line in top shows the current time, followed by the server's uptime, indicating how long the server has been running since its last reboot.
top - 08:59:53 up 34 min, 2 users, load average: 0.61, 0.16, 0.05
For example, the time is 08:59:53, and the server has been up for 34 minutes. It also displays the number of active user sessions (in my case, two), which you can also check with the who command.
The load average at the end shows the server's workload over the last 1, 5, and 15 minutes. It's essentially a measure of how many processes are actively using the CPU or waiting for it to become available.
While a load of 1.0 would represent 100% CPU usage on a single-core server, most servers today are multi-core.
For example:
- On a dual-core server, a load of 1.0 means that one core is fully utilized, leaving the other core idle, which equals around 50% CPU usage.
- On a quad-core server, a load of 1.0 represents about 25% CPU usage, since only one of the four cores is in use.
It’s important to note that the load average in Linux accounts for both running and waiting processes, not just those currently executing. It represents an average value, not an instantaneous measurement.
To get a rough idea of CPU utilization, divide the load average by the number of CPU cores. While this isn’t an exact measure, it can still be quite useful.
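If you want a quick, rough number, here is a one-liner sketch that reads the 1-minute load average from /proc/loadavg and divides it by the core count reported by nproc (assuming a standard GNU/Linux userland):
awk -v cores=$(nproc) '{printf "%.1f%%\n", $1 / cores * 100}' /proc/loadavg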
Task Summary
The second line provides information on total tasks, as well as how many are running, sleeping, stopped, or in a zombie state.
Tasks: 105 total, 2 running, 103 sleeping, 0 stopped, 0 zombie
In my case, I have 105 total processes, with 2 running and 103 sleeping.
The server tracks the state of each process, which can be one of the following:
- Runnable (R): The process is executing on the CPU or ready to run.
- Interruptible sleep (S): The process is waiting for an event to complete.
- Uninterruptible sleep (D): The process is waiting for an I/O operation to finish.
- Stopped (T): The process is stopped by a signal or is being traced.
- Zombie (Z): A terminated process whose data structures are still in memory because the parent process hasn't collected its status yet.
Processes in the R state are shown as running, those in the D and S states are shown as sleeping, and processes in the T state are shown as stopped. The zombie value shows the number of processes in the Z state.
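If you want to list any zombie processes yourself, one simple approach (a hedged example using standard ps output fields) is to filter the process table by state:
ps -eo pid,stat,comm | awk '$2 ~ /Z/'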
CPU Usage
The third line shows the breakdown of CPU usage, indicating the percentage of CPU used for different tasks.
%Cpu(s): 13.6 us, 0.3 sy, 0.0 ni, 86.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
The us value represents the CPU usage for running user processes (outside the kernel), while the sy value represents the CPU usage for running kernel processes.
Linux uses a nice value to set the priority of a process. A higher nice value means lower priority, and a lower nice value means higher priority. The ni value shows the CPU usage for processes with a manually set nice value.
Next is id, which represents the percentage of CPU that is idle. Then comes wa, which shows the percentage of CPU time spent waiting for I/O to complete (more on this later).
hi (hardware interrupts) and si (software interrupts) refer to interrupts, which are signals that request the CPU's immediate attention.
In a virtualized environment, the st (steal time) value shows the percentage of CPU that is unavailable because it is being used by another virtual machine (VM).
Memory Usage
The fourth and fifth lines display the server's memory and swap usage, showing how RAM and swap space are allocated.
MiB Mem : 7941.3 total, 7485.9 free, 383.6 used, 301.6 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 7557.8 avail Mem
The total value represents the total amount of memory on the server, while free indicates the amount of unused memory. The used value shows the memory currently in use by active processes.
The buff/cache value refers to memory used by the server for buffers and caching.
Buffers temporarily store data being read from or written to disk, improving disk I/O performance by reducing direct disk access. Cache holds frequently accessed files and directory data in RAM, allowing for faster retrieval without needing to read from disk repeatedly. The server dynamically manages this memory, freeing it when needed for active processes.
The fifth line provides swap space details, showing the total, free, and used swap memory. Swap is disk space used as virtual memory when RAM is full. Ideally, swap usage should be low, as frequent swapping can slow down performance.
The avail Mem value represents the amount of memory that can be allocated to processes without causing more swapping. It's the most important value to check when determining how much RAM is available for new processes.
Unlike the free value, which only shows completely unused memory, avail Mem also considers memory that can be quickly reclaimed from buff/cache. Since Linux dynamically manages buffers and cache, this memory is available for new processes when needed.
In short, while free memory might appear low, a large buff/cache means that memory can still be used efficiently. That's why avail Mem is the best indicator of how much RAM is actually available.
For example, this one-liner prints available memory as a percentage of total memory using the output of free:
free | grep Mem | awk '{print $7/$2 * 100 }'
Process List
Now, let's dive into the lower half of the top output, which displays a list of processes in different states, such as running or stopped, along with important details about each one.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
The PID is the process ID, a unique number that identifies each process. The USER field displays the username of the user who started the process.
The PR and NI fields represent the priority of the process. NI shows the nice value, which influences the process's priority, while PR indicates the process's priority from the kernel's perspective.
VIRT represents the total virtual memory used by a process, including code, data, and swapped-out memory. RES shows the amount of physical RAM used by the process, giving a more accurate picture of actual memory consumption, excluding swapped-out memory but including shared libraries if they are loaded. SHR indicates the amount of shared memory used by the process, which can be accessed by other processes (shareable), and is a part of RES. %MEM shows the process's memory usage as a percentage of the total available RAM.
In terms of comparison:
- VIRT is always greater than or equal to RES, since VIRT includes potential memory usage, while RES shows actual physical usage.
- RES includes SHR, so to calculate the exclusive memory usage of a process, you subtract SHR from RES.
- %MEM is simply RES expressed as a percentage of total available RAM.
By understanding these metrics, you can better analyze a process's memory usage. For example, high VIRT with low RES might suggest a process with a large address space but efficient memory use. High RES with high SHR indicates heavy reliance on shared libraries, and a high %MEM highlights processes consuming significant memory.
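To check these values for a single process outside of top, you can ask ps for the equivalent fields; the PID 1234 below is just a placeholder, and vsz and rss correspond roughly to VIRT and RES (in kilobytes):
ps -o pid,user,vsz,rss,%mem,comm -p 1234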
The S field shows the process's status, indicating whether it's running (R), sleeping (S), stopped (T), or a zombie (Z). Additionally, there is a less common state, the idle kernel thread (I), which represents a kernel thread that is not currently doing any work.
%CPU shows the percentage of CPU usage consumed by the process, while TIME+ indicates the total CPU time the process has used since it started.
Finally, the COMMAND field displays the name or command of the process, allowing you to easily identify it.
Usage Examples
So far, I have only talked about the top
output and explained what it provides. However, there's much more to do.
You can manage processes and make changes directly while top
is running, or you can use it with various options to customize the output and make it more suited to your needs.
For example, while top
is running, you can kill a process simply by pressing k, typing the process ID, and pressing the ENTER key. You will then be asked to specify the signal with which to kill the process. If you leave this blank, top
uses the default SIGTERM signal, which allows processes to terminate gracefully. If you want to kill a process forcefully, you can type SIGKILL or use its number (9). The number for SIGTERM is 15.
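The same signals can be sent from the shell with the kill command; the PID below is just a placeholder:
kill -15 1234   # SIGTERM: ask the process to terminate gracefully
kill -9 1234    # SIGKILL: force the process to stop immediately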
If you want to filter processes by user, you can use the -u option followed by a specific username, like so:
top -u <username>
This will show you only the processes started by that user.
You can press the following keys to sort the list of processes top displays:
- M to sort by memory usage.
- P to sort by CPU usage.
- N to sort by process ID.
- T to sort by the running time.
By default, top displays all results in descending order. However, you can switch to ascending order by pressing the R key.
You can also sort the list using the -o option from the command line before running top, like this:
top -o %CPU
Change %CPU to %MEM, TIME+, or any other attribute you want to sort by.
By default, top does not show the full path of the running command or distinguish between kernelspace and userspace processes. If you need this information, press the c key. Press c again to go back to the default view. Kernelspace processes are shown wrapped in square brackets.
You can also display threads, not just processes, by pressing the H key. Notice how the second line changes from Tasks to Threads, and it will now show the number of threads instead of processes.
Lastly, pressing e switches the process memory columns between kilobytes (the default), megabytes, gigabytes, terabytes, and petabytes. After selecting a size, press W to save your preference, so top starts with your chosen size next time.
Context Switching and I/O Wait
It's important to understand how a Linux server manages processes and what I/O wait is – specifically, what the wa
value in top
exactly represents.
Processes on a Linux server perform either I/O-bound work or CPU-bound work (like executing arithmetic operations).
I/O-bound work refers to operations where data is read from or written to storage devices. On a server running a WordPress website, common I/O operations include:
- Reading files such as PHP scripts, images, or HTML pages.
- Writing logs like access and error logs.
- Database operations, including reading from or writing to the site's database, especially for online stores.
When a process is performing I/O tasks, it doesn't need the CPU for that part of the task. The CPU is idle while the process waits for the I/O operation to finish.
To make efficient use of the CPU, the server switches to another process that is ready to run, allowing the CPU to be used while the first process waits. This creates the illusion of multitasking, even though the CPU is only running one process at a time. The server switches between processes so quickly that it appears as though multiple processes are running simultaneously. This process is often referred to as context switching.
I/O wait is a measure of the time the CPU spends waiting for I/O operations to complete. While the CPU is capable of processing requests quickly, it may need to retrieve data from the disk. If the disk is slow or busy, the CPU remains idle until the I/O operation finishes. This waiting time is known as I/O wait, which is shown by the wa value in top.
In a well-optimized server, I/O wait is typically low – around 0-1% – which indicates that the disk is keeping up with the CPU. However, if you see a wa value of 25%, that's a red flag. It means a quarter of the CPU's time is wasted waiting for I/O operations, making the disk a performance bottleneck.
For example, on a server running WordPress, a high wa value can lead to slower page loads or delays in handling requests because the disk can't retrieve data fast enough. If I/O wait reaches 75%, the impact is even more severe – three-quarters of the CPU's time is spent waiting on disk operations, which can cripple server performance.
Now, back to the fact that the server switches between processes that are ready to run while others are waiting for I/O operations to finish. Even though the server can switch to another process when one is waiting on I/O, there are scenarios where you still end up with high I/O wait. This is important to clarify, as many people mistakenly think that context switching can eliminate I/O wait time, but that's not entirely true.
If many processes are I/O-bound (such as on a busy web server), there might not be enough CPU-bound processes ready to run. In this case, even though the CPU is free, all available processes are blocked, waiting for I/O to finish. The CPU ends up idle during these periods, causing the wa value to increase.
Another point is that the actual time required for an I/O operation to finish doesn’t change because of context switching. While the CPU might switch to another process, the I/O operation itself still takes a certain amount of time to complete. If this latency is high (because your disk is slow), more processes will be waiting for the I/O operation to finish, and the I/O wait will increase.
If you notice a high wa value, you should immediately investigate what's causing the issue and fix it. Identifying the bottleneck (whether it's a slow disk, high disk usage, or inefficient processes) will help restore server performance.
iostat Command
iostat is a powerful command that provides detailed insights into your server's I/O performance. It offers a broader, server-wide view of I/O statistics and helps you understand the overall I/O behavior of your server.
The iostat command is part of the sysstat package, which is available on most Linux distributions but may not be installed by default.
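If it isn't present, you can install it through your package manager; on Ubuntu or Debian, for example:
sudo apt install sysstat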
Running the command without any options also shows CPU information, such as the number of cores, average CPU usage, and average I/O wait. Using the -d option will display only the I/O statistics for the available disks.
I always use the command with the -d and -x options, which show extended I/O statistics for more detailed information.
You can also specify a number after the options to set the interval in seconds at which the command will be rerun. For example, running iostat -d -x 1 will execute the command every second and show updated I/O statistics.
When investigating an I/O problem, such as high I/O wait, I always pay attention to a few key metrics in the command's output.
The first important metric I look at is %util, which shows how busy the disk is. If %util is close to 100%, the disk is nearly saturated, handling as many I/O requests as it can, which can cause performance issues. If you see %util consistently near 100%, it's a clear sign that your disk is fully utilized, and you may need to investigate the cause of the high I/O usage or consider performance upgrades.
Next, I look at these metrics, each of which provides a different view of your disk's performance:
- r/s and w/s: These show the number of read and write requests per second. They help you understand the intensity of I/O operations (IOPS).
- rkB/s and wkB/s: These measure the amount of data in kilobytes being read from or written to the disk per second. They give you an idea of throughput.
- r_await and w_await: These represent the average time (usually in milliseconds) that read and write requests take to complete.
Now, let's discuss what you can learn from these metrics:
- If %util is high but the await values are low, your disk might be very efficient even under heavy load.
- Conversely, if %util is moderate but r_await or w_await are high, it could be that the disk is having trouble completing each operation quickly, suggesting latency problems.
- If your disk shows high rkB/s or wkB/s values, it means large volumes of data are being transferred, which could be normal for a heavy database server but might be a performance concern in other scenarios.
Lastly, to view the metrics in megabytes instead of kilobytes, use the -m option.
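Putting these options together, a combination I often reach for looks like this (the 2-second refresh interval is just an example):
iostat -d -x -m 2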
iotop Command
iotop is a powerful command for monitoring disk I/O usage in real time, helping you investigate the causes of high I/O wait, such as inefficient processes that are consuming excessive I/O resources.
Similar to how top displays a list of processes along with their CPU and memory usage, iotop provides detailed insights into the threads consuming the most disk operations.
Sometimes, iotop is not installed by default. If that's the case, you can install it (on Debian-based systems) using the following command:
sudo apt install iotop
Once installed, running iotop without any options displays a real-time list of threads along with their I/O usage.
To filter and display only active threads performing disk operations, use the -o option, like this:
ivan@vm1:~$ sudo iotop -o
Total DISK READ: 1530.64 M/s | Total DISK WRITE: 11.92 K/s
Current DISK READ: 1530.64 M/s | Current DISK WRITE: 154.98 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
9655 be/4 root 1530.64 M/s 0.00 B/s 0.00 % 0.05 % hdparm -Tt /dev/sda
...
Total DISK READ and Total DISK WRITE show how much data processes are asking the server to read or write. Current DISK READ and Current DISK WRITE reflect the real-time data flow to and from the disk.
These values may differ for two reasons:
- The Linux kernel uses caching to speed up I/O, so data might be temporarily stored in memory instead of being immediately written to or read from the disk.
- The kernel may reorder I/O operations for better efficiency.
Because of this, even if processes request a lot of I/O (shown in the Total values), the actual I/O happening at any moment (shown in the Current values) might be different.
The TID (Thread ID) is a unique identifier for the specific thread within a process that is handling I/O operations. The PRIO (priority) column indicates how much priority the server assigns to that thread for execution. The USER column shows the user who owns the process.
DISK READ shows the amount of data a process is reading from the disk in real time. Similarly, DISK WRITE displays the amount of data being written to the disk by the process at that moment. These values help you identify which processes are performing the most disk operations.
The IO> column shows the percentage of time the process spent waiting on I/O, covering both disk reads/writes and swap activity. The SWAPIN column shows the percentage of time the process spent swapping memory in, which can indicate memory pressure. Monitoring both columns helps you understand whether performance issues are caused by heavy disk usage, swapping, or both.
Finally, two options I always use with the command are the -P option, which lists processes instead of threads along with their PIDs, and the -a option, which shows accumulated I/O.
The accumulated value reflects the total amount of data a process has read or written since monitoring started, instead of just the current moment-by-moment I/O rate.
For example, rather than displaying a spike of 10 MB/s for a few seconds followed by a drop to 2 MB/s, the accumulated view shows how much data each process has transferred in total, making it easier to see which processes do the most I/O overall.
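A typical invocation combining these options (shown here as a sketch) filters active processes and accumulates their I/O:
sudo iotop -o -P -a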
Slow Disk Causing High I/O Wait
I want to share an experience I had with a server running a WordPress website that had a very slow disk. This was when I self-hosted WordPress on a server running the LEMP stack. The server was performing very slowly, and I needed to figure out the cause.
The first thing I did was run the top command. I noticed high load average values, even though the id value was around 80%, indicating that most of the CPU was idle. This left me confused. Why was there high load when the CPU wasn't being heavily used?
It was a challenge to pinpoint the issue, but then I noticed that the wa (I/O wait) value was high. I can't remember the exact value, but it was around 8%. Typically, the wa value should be close to 0%, and a consistent value above 1% often indicates an I/O problem.
The server had many cores, so the top command only showed average values in the %Cpu(s) line. I pressed 1 to expand the view and see the wa value for each individual core. Some of them had I/O wait values around 30% and 50%, which clearly pointed to an I/O issue.
To investigate further, I used the iostat command to check the I/O statistics and noticed that the %util value was at 100%, meaning my disk was fully utilized.
Next, I ran the iotop command to see if any processes were causing the problem, but there was no indication of a specific process to blame. I ran iotop with the -a option to get accumulated values over time.
After a while, I observed that the Total DISK WRITE value was quite low, even though the disk was fully utilized. This told me that the disk itself was slow and needed to be replaced.
From this experience, I also learned an important lesson: never save website cache to disk. Instead, cache in RAM, which is faster and reduces I/O operations, especially if you have enough RAM. In my case, the FastCGI cache stored on the disk was the major contributor to the issue, as my disk was too slow. This was a key factor in causing the performance problem.
df Command
The df command is used to display the available and used disk space on filesystems. It provides an overview of how storage is allocated across the different mounted partitions of your server.
Disk space is shown in 1K blocks by default. If you want to run the df command in its human-readable format, use the -h option. This will display the sizes in a more understandable format (KB, MB, GB), making it easier to interpret.
If you don't include a specific mount point, the command shows information on all currently mounted filesystems, with output like this when using the -h option:
Filesystem Size Used Avail Use% Mounted on
tmpfs 795M 984K 794M 1% /run
/dev/sda 157G 3.0G 146G 3% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 795M 12K 795M 1% /run/user/0
In the output of the df -h command, each column represents important information about the server's mounted filesystems.
The Filesystem column lists the mounted devices or partitions, like /dev/sda for physical disks and tmpfs for temporary memory-based storage.
The Size column shows the total size of each filesystem, while the Used column indicates how much space is currently used. The Avail column shows the remaining available space, and the Use% column displays the percentage of space used. The Mounted on column tells you where each filesystem is mounted, like /dev/sda at the root (/) or tmpfs at /run.
We can use the -i option to check inode (index node) usage. This will display the number of inodes used and available on each mounted filesystem. You can combine it with the -h option to make the output more human-readable (for example, df -ih), like this:
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 993K 646 993K 1% /run
/dev/sda 9.9M 133K 9.8M 2% /
tmpfs 993K 1 993K 1% /dev/shm
tmpfs 993K 3 993K 1% /run/lock
tmpfs 199K 33 199K 1% /run/user/0
The Inodes column shows the total number of inodes available on each filesystem. IUsed indicates how many inodes are currently in use, while IFree displays the remaining available inodes. IUse% represents the percentage of inodes used.
For example, /dev/sda has 9.9 million inodes, with 133K used (2%), leaving 9.8 million free. The tmpfs filesystems have significantly fewer inodes in use, as they primarily store temporary files.
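If inode usage ever climbs unexpectedly, counting the files under a suspect directory can help you find the culprit; the path below is only an example:
find /var/log -xdev -type f | wc -l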
tmpfs Filesystems
The tmpfs filesystems you see in the df -h output are created automatically by the Linux server during boot. These are virtual filesystems that reside in RAM and are used for various purposes.
- /run is used for storing runtime data for processes, such as PID files and socket files. It's similar to /tmp, but the data in /run doesn't survive a reboot.
- /dev/shm is used for Inter-Process Communication (IPC), allowing multiple processes to share data in memory. This is faster than writing to disk.
- /run/lock is used for lock files, ensuring that certain resources are accessed by only one process at a time.
The size of /dev/shm is usually set to half of the server's available RAM. You might think this is a waste, especially if it's not used. However, the memory for /dev/shm isn't pre-allocated, and processes can use it if needed. The size of /dev/shm represents the maximum memory it can use, not what's reserved. The server gives priority to active processes, so if other processes use 80% of the RAM, /dev/shm will only use the remaining 20%.
Sometimes, even though the server gives priority to active processes, a poorly behaving or unexpectedly resource-consuming process might end up filling up /dev/shm, leaving other processes waiting for memory to become available. In such situations, the server will start using swap memory to compensate for the lack of available RAM.
Once the swap is fully used, the Out-of-Memory (OOM) Killer is invoked to free up memory by terminating one or more processes. It doesn't prioritize cleaning out /dev/shm, so important processes may be killed, disrupting services.
We want to avoid triggering the OOM Killer whenever possible. It doesn't happen often, but it's good to keep it in mind.
For this reason, it's a good idea to keep an eye on how much space (RAM) /dev/shm is using.
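One quick way to do that is to ask df about that mount point alone:
df -h /dev/shm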
du Command
The du command provides information about how much space individual files and directories consume. Unlike the df command, which summarizes entire filesystems, du is useful for drilling down to find large files or directories consuming excessive space.
Running the command alone will produce this output:
4 ./.config/procps
8 ./.config
4 ./.cache
36 .
The du command displays the size of each directory and subdirectory, with the numbers representing disk usage in kilobytes. The final line provides the total disk usage for the directory where the command was executed, which in this case is 36 KB.
However, the standard du command does not specify the unit of measurement, which can make the output more difficult to interpret. Additionally, it does not list the sizes of individual files inside the directory by default, and only includes their total in the summary on the final line.
We can use the -h and -a options to make the output human-readable and list the sizes of all files, like this:
4.0K ./.config/procps
8.0K ./.config
4.0K ./.bashrc
4.0K ./.profile
4.0K ./.bash_logout
4.0K ./.bash_history
4.0K ./.lesshst
0 ./.cache/motd.legal-displayed
4.0K ./.cache
36K .
In this output, the -h option makes the sizes readable (KB, MB, GB), and the -a option lists the size of every file, not just the directories.
If you only want to see the total disk usage of a directory, use the -s option along with the -h option, like this:
36K .
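You can also point it at any path. For example, to check the total size of a specific directory (the path here is just an example):
du -sh /var/www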
A particularly useful flag that I especially love is the --time flag. It shows the time of the last modification for any file within the directory or subdirectory. I often use this flag in conjunction with the -a and -h options.
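Combined, that might look like this (again, the path is only an example):
du -ah --time /var/log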
The last option I want to talk about is the -d option. This option allows you to specify the depth of directories to display, making it useful when you only want to see the sizes of top-level directories without drilling down into every subdirectory.
For example, using du -d 1 -h will show the total size of each directory at the first level, providing a clearer overview of disk usage without excessive detail.
free Command
The free command is one of the most widely used commands for displaying server memory statistics. The information it presents is similar to what the top command shows about memory usage, but in a more concise format.
Running the free command with the -h option provides the output in a human-readable format, making it easier to understand by displaying memory sizes in KB, MB, or GB, like this:
total used free shared buff/cache available
Mem: 7.8Gi 384Mi 7.3Gi 988Ki 301Mi 7.4Gi
Swap: 511Mi 0B 511Mi
The server has a total of 7.8 GB of RAM. Out of that, 384 MB is in use, and 7.3 GB is free. It has 301 MB in cache or buffers that can be freed if needed, and 7.4 GB of memory is still available for use by new processes. The Swap line shows 511 MB of swap space, which is completely free since no swap has been used.
One important clarification: the shared value in the free command output is not the same as the SHR value in top. In free, shared refers to the total memory used by all tmpfs filesystems (such as /dev/shm). In contrast, SHR in top represents the amount of potentially shareable memory a process is using, such as shared libraries or memory-mapped files. This does not mean the memory is actively shared – only that it could be if other processes use the same resources.
Remember when I mentioned keeping an eye on /dev/shm to monitor how much space (RAM) it's using to prevent the OOM Killer from being invoked? You can use the free command and check the shared value instead of using the df command for this.
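Here is a minimal sketch of that check, assuming the default free output layout where shared is the fifth field on the Mem line:
free -h | awk '/^Mem/ {print $5}'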
uptime Command
The uptime command displays how long the server has been running since its last reboot. It also shows the current time, the number of logged-in users, and the load average over the past 1, 5, and 15 minutes.
I previously talked about load average, which indicates how much work the CPU is handling and whether the server is under heavy load.
Running the uptime command alone will produce output similar to this:
12:34:56 up 10 days, 4:22, 2 users, load average: 0.45, 0.30, 0.25
Here, 12:34:56 represents the current time; up 10 days, 4:22 means the server has been running for 10 days, 4 hours, and 22 minutes since its last reboot; 2 users shows the number of logged-in users; and the load average values represent the server load.
The -p option displays the uptime in a simpler, more human-readable format:
up 10 days, 4 hours, 22 minutes
The -s option shows the exact date and time when the server was last booted:
2025-01-20 08:12:34
These options provide more flexibility when checking server uptime.
Conclusion and Final Thoughts
In this guide, we've explored various commands to monitor and troubleshoot resource usage on Linux servers, with a focus on CPU, memory, storage, and I/O.
Understanding how your server manages resources and identifying potential bottlenecks is crucial for maintaining good performance.
If you found value in this guide or have any questions or feedback, please don't hesitate to share your thoughts in the discussion section.
Your input is greatly appreciated, and you can also contact me directly if you prefer.