The get better series focuses on utilities and knowledge areas where people like me tend to learn just enough to get going, then forget to come back later, learn more, and get the most out of what we’re given.
When it comes to process management, my usual routine is to check top for what’s consuming CPU and memory, then glance at the load. If I’m looking to see whether something is running or not, a quick ps | grep does the trick. This only scratches the surface, as these two utilities provide a huge amount of insight into processes: how they’re running, what section of the code is running, what the program stack is doing, and so on.
I’d like to delve into various aspects of processes, from some simple adjustments in using ps to some of the less well documented files under /proc/pid.
more on ps
nb: I will be skipping over some parameters that most people will already be familiar with, like custom output formats (o), thread display (H), sorting (k), and filtering.
First, I want to cover something that a lot of people might have skimmed past in the man page. %CPU is not the percentage of the machine’s CPU that the process is consuming right now. According to man ps, it is the CPU time the process has used divided by the time the process has been running (the cputime/realtime ratio), expressed as a percentage. That makes it an average over the process’s whole lifetime: 100% means the process has kept one CPU busy for its entire life, while a long-running daemon that has only just started spinning will still show a low figure. This is why adding up %CPU for all processes is very unlikely to give you 100%. For a view of what is using the CPU right now, top is the better tool.
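To make that ratio concrete, here is a rough sketch of the arithmetic in Python. It is not ps’s actual code. It assumes the inputs come from /proc/pid/stat (the utime, stime and starttime fields, all in clock ticks) and from /proc/uptime, and that ticks_per_second matches os.sysconf("SC_CLK_TCK") on your machine (commonly 100). The function name is my own.

```python
def ps_cpu_percent(utime_ticks, stime_ticks, starttime_ticks,
                   uptime_seconds, ticks_per_second=100):
    """Approximate ps's %CPU: CPU time used / wall-clock time since start.

    utime_ticks, stime_ticks: user and kernel CPU time, in clock ticks.
    starttime_ticks: when the process started, in ticks after boot.
    uptime_seconds: how long the machine has been up.
    ticks_per_second: assumed clock tick rate (check SC_CLK_TCK).
    """
    cpu_seconds = (utime_ticks + stime_ticks) / ticks_per_second
    elapsed_seconds = uptime_seconds - starttime_ticks / ticks_per_second
    if elapsed_seconds <= 0:
        # Process started "now"; avoid dividing by zero.
        return 0.0
    return 100.0 * cpu_seconds / elapsed_seconds


# A process that started 10s after boot, on a box that has been up 110s,
# and has used 20s of CPU in total, has averaged 20% over its lifetime.
print(ps_cpu_percent(1500, 500, 1000, 110.0))
```

Note that nothing in the calculation looks at the last few seconds, which is why a process that has just gone wild can still show a modest %CPU.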
Let’s start with a quick win that some might not have come across yet: the awesome forest mode (f). This output format draws the process hierarchy as an ASCII-art tree, with each process listed beneath its parent. This is extremely useful for seeing which commands have been forked off from other commands, and where dependencies lie.
Below: very easy to see just what’s going on:
/usr/sbin/sshd
 \_ sshd: root@pts/0
     \_ -bash
         \_ ps axwwf -o command
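The tree is nothing more than each process’s parent PID followed recursively. As a small sketch of the idea (my own illustration, not how ps implements it), the Python below takes (pid, ppid, command) tuples, which you could get from something like ps -eo pid,ppid,args, and renders them in the same style. The function name and the PIDs in the sample are made up.

```python
from collections import defaultdict


def render_forest(processes):
    """Render (pid, ppid, command) tuples as an indented tree of lines."""
    processes = list(processes)
    known_pids = {pid for pid, _, _ in processes}
    commands = {pid: command for pid, _, command in processes}

    # Group children under their parent PID.
    children = defaultdict(list)
    for pid, ppid, _ in processes:
        children[ppid].append(pid)

    lines = []

    def walk(pid, depth):
        # Top-level entries get no prefix; each deeper level adds 4 spaces.
        prefix = "" if depth == 0 else "    " * (depth - 1) + " \\_ "
        lines.append(prefix + commands[pid])
        for child in sorted(children[pid]):
            walk(child, depth + 1)

    # A root is any process whose parent is not in the list we were given.
    for root in sorted(pid for pid, ppid, _ in processes
                       if ppid not in known_pids):
        walk(root, 0)
    return lines


sample = [
    (100, 1, "/usr/sbin/sshd"),
    (200, 100, "sshd: root@pts/0"),
    (201, 200, "-bash"),
    (300, 201, "ps axwwf -o command"),
]
print("\n".join(render_forest(sample)))
```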
Once we can see dependencies, we’d like to look at things that help us debug what might be causing something to eat CPU, hang, and so on. For that, we’re going to look at the status (STAT) and wait channel (WCHAN) columns. Status appears in most views, but the wait channel is more limited. It’s available in the long format (l), or you can ask for it in a custom output format.
Status is pretty simple. A process can be in one of about seven basic states, all documented in the man page, and most are easy to guess. In practice, most processes will be in D, R or S. Running (R) means the process is on a CPU or sitting in the run queue waiting for one. Interruptible sleep (S) means the process is waiting for an event to complete (input, a timer, a lock) and can be woken early by a signal, such as the one kill sends. Uninterruptible sleep (D) means the process is blocked inside the kernel, which usually amounts to waiting on I/O. While in this state the process won’t respond to signals; they are left pending and are handled once the process wakes up and the code returns from kernel mode.
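The same state letter is the third field of /proc/pid/stat, so it is easy to pull out yourself. Below is a minimal sketch in Python, assuming the line layout described in man proc; the only catch is that the command name sits in parentheses and may itself contain spaces or brackets, so the split has to happen at the last closing bracket. The helper name and the descriptions in the table are my own wording.

```python
# Short descriptions of the state letters (paraphrased from man ps).
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during tracing",
    "Z": "zombie",
    "X": "dead",
    "I": "idle kernel thread",
}


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from one line of /proc/<pid>/stat."""
    open_paren = stat_line.index("(")
    # comm may contain ')' itself, so look for the LAST one.
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


pid, comm, state = parse_stat_state("1234 (my (odd) name) S 1 1234 1234 0 -1")
print(pid, comm, state, STATE_NAMES.get(state, "unknown"))
```

On a live system you would feed it open("/proc/1234/stat").read() for whichever PID you care about.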
The wait channel is less well known, but good to know about. It gives insight into what a sleeping process is waiting for, in the form of the name of the kernel function in which the process is sleeping. This may or may not help, depending on what you’re trying to find out. ps shortens this field to 6 characters, so if you don’t quite get what you want, you’re going to need to dig a bit more. ps gets this name from /proc/pid/wchan, so you can just cat that for the full name. Taking this further involves delving into kernel code to trace exactly what might be causing a stall. Things get serious here, so I’m going to leave it at that for now. Note that for kernels <2.6, System.map must be installed. I didn’t find a lot of info out there on WCHAN, so if you find anything, drop me a message.
A quick note on the TIME column. At first I was quick to assume that this was how long the process had been running. Then I noticed that some processes have 0:00 in their TIME field. Running in forest mode, I could see that all of these processes either have child processes or are in a sleep state. Reading up, this field is the cumulative CPU time the process has actually used, not the wall-clock time since it started, so a process that spends its life asleep barely moves the counter.
TTY is also an interesting field. Simply put, it tells us which terminal the command is attached to. Using this info we can see what other processes are running on that terminal (using ps -t with the terminal name, for example ps -t pts/0), try to redirect the output, or play with the terminal settings using stty. You can read up more on terminals here.
The scheduling class (CLS) is available by adding the format specifier class (or cls) to a custom output format. It tells us which scheduling policy the process is using. In almost all cases the class is TS, which man ps lists as SCHED_OTHER, the default time-sharing policy.
Pending signals (the PENDING column of the signal format, s) are interesting for seeing whether a process has one or more signals waiting in its queue. When the process returns from kernel space, it will handle these signals unless they appear in the ignored signals field (IGNORED). I won’t go further into this as it also leads into kernel code, which is not my strong point :)
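One thing that is worth knowing without going anywhere near kernel code: these fields are hexadecimal bitmasks, and the same masks appear as SigPnd, ShdPnd and SigIgn in /proc/pid/status. Bit n-1 being set means signal number n. Here is a small Python sketch for turning a mask into names. The function name is my own, and the name lookup relies on Python’s signal module, so signals it has no name for (most real-time signals) are shown by number.

```python
import signal


def decode_signal_mask(hex_mask):
    """Turn a hex signal mask (as shown by ps or /proc/<pid>/status)
    into a list of signal names, lowest signal number first."""
    mask = int(hex_mask, 16)
    names = []
    number = 1  # bit 0 corresponds to signal 1
    while mask:
        if mask & 1:
            try:
                names.append(signal.Signals(number).name)
            except ValueError:
                # No symbolic name known to Python for this number.
                names.append(f"signal {number}")
        mask >>= 1
        number += 1
    return names


# 0x4002 has bit 1 and bit 14 set, i.e. signals 2 and 15.
print(decode_signal_mask("0000000000004002"))
```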
more from /proc
/proc is a virtual filesystem exposed by the kernel, and on the side it gives us a look into what’s going on in our system. Part of it is dedicated to processes and information about them. This information can be found in /proc/pid, where pid is the actual process ID.
/proc/pid/environ is a list of the environment variables the process was started with, separated by NUL bytes rather than newlines.
/proc/pid/cwd is a symbolic link to the process’s current working directory.
/proc/pid/root tells us what the process thinks its root directory is (different for cases of chroot’d apps)
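Going back to /proc/pid/environ for a moment: because the entries are NUL-separated, a plain cat runs them all together. A short Python sketch for splitting the raw bytes into a dictionary follows. The function name is mine, and undecodable bytes are replaced rather than raising, which is a choice of convenience, not a requirement.

```python
def parse_environ(raw):
    """Split the raw bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # trailing NUL leaves an empty final chunk
        name, _, value = entry.partition(b"=")
        env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


sample = b"HOME=/root\0PATH=/usr/bin:/bin\0EMPTY=\0"
print(parse_environ(sample))
```

On a live system the input would be open("/proc/1234/environ", "rb").read(), and you will generally need to own the process or be root to read it.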
/proc/pid/fd is a very interesting and useful directory. It holds one symbolic link per open file descriptor, letting us know which files the process has open and is accessing. The lsof utility gives us insight into the same information. The usual suspects are regular files, sockets and pipes. From kernel 2.6.22 onwards there is also a directory called /proc/pid/fdinfo, which gives you pos and flags for each descriptor. pos is the current file offset, and flags is the octal value of the flags with which the file was opened (read, write, append and so on). The primary benefit of knowing the open files is simply to know what the application is doing: where it is writing logs, what it is communicating with, and so on.
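As a sketch of how to read one of those fdinfo files, the Python below parses the pos and flags lines and works out the access mode. It assumes the "key:<tab>value" layout described in man proc and uses the O_* constants from Python’s os module, so it should be run on the same machine the file came from. The function name and the returned dictionary shape are my own.

```python
import os


def parse_fdinfo(text):
    """Parse the contents of /proc/<pid>/fdinfo/<fd>."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    pos = int(fields["pos"])
    flags = int(fields["flags"], 8)  # the kernel prints flags in octal

    # The low bits of flags hold the access mode.
    access_mask = os.O_RDONLY | os.O_WRONLY | os.O_RDWR
    access = {
        os.O_RDONLY: "read",
        os.O_WRONLY: "write",
        os.O_RDWR: "read/write",
    }.get(flags & access_mask, "unknown")

    return {
        "pos": pos,
        "flags": flags,
        "access": access,
        "append": bool(flags & os.O_APPEND),
    }


# Build a sample the way the kernel would print a write+append descriptor.
sample = "pos:\t1024\nflags:\t0" + oct(os.O_WRONLY | os.O_APPEND)[2:] + "\nmnt_id:\t25\n"
print(parse_fdinfo(sample))
```

Pair this with os.readlink("/proc/1234/fd/3") to see which file the descriptor actually points at.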
That’s all I’ve got on processes at the moment. Hope this helps some people.