pidstat Command: Tutorial & Examples

Like top, but for one process and recordable — watch exactly what a program does to CPU, memory, and disk, second by second.

What It Is

pidstat reports resource usage per process, sampled at an interval you choose. If top is the live dashboard of the whole machine and ps is a single still photograph, pidstat is the video of one suspect — point it at a process (or all of them) and it prints a fresh line every second showing CPU, memory faults, disk I/O, or context switches, with timestamps you can paste into a ticket. It's part of the sysstat family alongside mpstat, iostat, and sar.

The reason it earns its own page: top is wonderful for looking but terrible for recording and awkward for one process. When you need to prove "this service used 2 cores steadily for 40 seconds, then its disk writes spiked," pidstat is the tool that produces evidence instead of a flickering screen. It only reads process statistics and never changes anything. (It comes from the sysstat package: apt install sysstat on Debian or Ubuntu if it's missing.)

Your First Look

Run it with an interval to watch every process's CPU use, once a second:

pidstat 1

Linux 6.12.86-amd64 (web01)   06/03/2026   _x86_64_   (8 CPU)

05:03:29 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:03:29 PM     0         1    0.00    0.00    0.00    0.00    0.00     3  systemd
05:03:29 PM  1000    251473   12.00    3.50    0.00    1.20   15.50     3  java
05:03:29 PM   999     18820    4.10    0.90    0.00    0.30    5.00     5  postgres

One line per process that used the CPU during that second. The header echoes the kernel, host, and (8 CPU) core count (same as nproc). Most of the time, though, you don't want every process — you want one. Point it at a PID with -p:

pidstat -p 251473 1

…and it prints just that process, every second, until you press Ctrl-C. That focused, timestamped stream is the whole reason pidstat exists.
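Because every line carries a timestamp and fixed columns, the stream is easy to post-process. The following is a minimal Python sketch, not part of pidstat itself, that turns CPU-mode report text into dictionaries keyed by the header names. The function name parse_pidstat and the SAMPLE text (copied from the output above) are illustrative, and the sketch assumes the header and data rows carry the same number of timestamp tokens, as they do in the sample.

```python
SAMPLE = """\
Linux 6.12.86-amd64 (web01)   06/03/2026   _x86_64_   (8 CPU)

05:03:29 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
05:03:29 PM     0         1    0.00    0.00    0.00    0.00    0.00     3  systemd
05:03:29 PM  1000    251473   12.00    3.50    0.00    1.20   15.50     3  java
05:03:29 PM   999     18820    4.10    0.90    0.00    0.30    5.00     5  postgres
"""


def parse_pidstat(text):
    """Turn pidstat CPU-mode report text into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, the banner line, and the end-of-run averages.
        if not tokens or tokens[0] in ("Linux", "Average:"):
            continue
        if "UID" in tokens and "PID" in tokens:
            header = tokens  # timestamp tokens first, then the column names
            continue
        if header is None:
            continue
        # Split into as many fields as the header has, so that the last
        # field (Command) keeps any spaces it contains, e.g. with -l.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue
        start = header.index("UID")
        row = dict(zip(header[start:], fields[start:]))
        row["time"] = " ".join(fields[:start])
        rows.append(row)
    return rows


rows = parse_pidstat(SAMPLE)
print(rows[1]["Command"], rows[1]["%CPU"])  # java 15.50
```

Values are kept as strings here; convert with float() where you need arithmetic.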

How I Read It

The trick with pidstat is knowing which mode answers your question, because it has several, and each swaps in a different set of columns. By default you get CPU. The four I actually use:

For CPU (default, or -u): I watch the %CPU column to see how many cores' worth a process is eating — remember it's per-core-summed, so 350 means 3.5 cores — and I split it with %usr (the program's own work) versus %system (kernel work on its behalf). A process burning most of its time in %system is doing something syscall-heavy, often I/O or locking.

The column people overlook is %wait — time the process was runnable but couldn't get a core, i.e. stuck in the scheduler's run queue behind everyone else. High %wait with low %CPU is the fingerprint of CPU contention: your program isn't slow, it's starved — the box is oversubscribed and it's waiting its turn. That single insight resolves a huge fraction of "the app is slow but the CPU looks fine" mysteries.

For memory (-r) I watch RSS (real RAM) climbing and majflt/s (major faults, meaning page-ins from disk); sustained major faults usually mean the process is short of memory or swapping, and about to crawl. For disk (-d) I read kB_wr/s and the iodelay column to catch an I/O-heavy process red-handed.

The Columns, Explained

pidstat shows different columns per mode. The ones you'll meet most:

Common to every mode:

  • UID — the user ID owning the process. Add -U to print names instead of numbers.
  • PID — the process ID — the number you'd feed kill or -p.
  • Command — the program name. Add -l to see the full command line with arguments.
  • CPU — which core the process was last running on (handy for spotting a process bouncing across cores).

CPU mode (-u, the default):

  • %usr — time running the program's own code (user space).
  • %system — time the kernel spent on its behalf (syscalls, I/O setup).
  • %guest — time running a virtual CPU (only for hypervisor processes).
  • %wait — runnable but waiting for a free core — the contention signal.
  • %CPU — total CPU share, summed across cores (can exceed 100%).

Memory mode (-r):

  • minflt/s — minor faults per second: pages mapped without touching disk (cheap, normal).
  • majflt/s — major faults per second: page-ins from disk, which are expensive and a sign of memory pressure or swapping.
  • VSZ — virtual memory size (mapped, mostly not real — like VIRT in top).
  • RSS — resident set size: the actual RAM in use. The number that matters.
  • %MEM — RSS as a share of physical RAM.

Disk mode (-d):

  • kB_rd/s / kB_wr/s — kilobytes read / written per second by this process.
  • kB_ccwr/s — writes the process cancelled (dirtied then freed before flush).
  • iodelay — clock ticks the process was blocked waiting on block I/O. A great single-number "is this process disk-bound" gauge.

Reading It by Example

%CPU 95, %usr 90, %wait ~0. The process is genuinely CPU-bound on its own work and getting all the core time it asks for. If it's too slow, it needs faster code or more parallelism — the machine is cooperating.

%CPU 30, %wait 60. The tell-tale of contention. The process wants to run but spends most of its life queued behind other work — the box is oversubscribed. Adding threads won't help; reducing competing load (or moving to a bigger box) will. People misread this as "the app is fine, CPU isn't even maxed" and chase the wrong thing for hours.
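The two CPU readings above can be told apart mechanically. This is a rough Python sketch; the function name classify_cpu and every threshold in it are assumptions chosen for illustration, not values pidstat defines.

```python
def classify_cpu(cpu_pct, wait_pct, busy=80.0, starved=30.0):
    """Label one pidstat CPU sample. Thresholds are illustrative only."""
    if wait_pct >= starved and wait_pct > cpu_pct:
        return "starved"    # runnable but queued: contention on the box
    if cpu_pct >= busy and wait_pct < 5.0:
        return "cpu-bound"  # getting the core time it asks for
    return "unclear"


print(classify_cpu(95, 0))   # cpu-bound
print(classify_cpu(30, 60))  # starved
```

Tune the thresholds to your own workload; a multithreaded process can legitimately show %CPU well above 100.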

-r mode: RSS climbing steadily, never falling, majflt/s rising. A memory leak heading toward the OOM killer. Catch it here, before dmesg records the kill.
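"Climbing steadily, never falling" can be checked over a series of RSS readings. The Python sketch below is a heuristic under stated assumptions: the name rss_looks_leaky and the 10% growth threshold are hypothetical, and a process that is simply warming up a cache would also pass this test.

```python
def rss_looks_leaky(rss_kb, min_growth=0.10):
    """True if RSS never falls across the samples and grows by at least
    min_growth (as a fraction of the first sample) overall."""
    if len(rss_kb) < 2 or rss_kb[0] <= 0:
        return False
    never_falls = all(b >= a for a, b in zip(rss_kb, rss_kb[1:]))
    growth = (rss_kb[-1] - rss_kb[0]) / rss_kb[0]
    return never_falls and growth >= min_growth


print(rss_looks_leaky([100000, 104000, 109000, 115000]))  # True
print(rss_looks_leaky([100000, 120000, 90000, 110000]))   # False
```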

-d mode: one process with high kB_wr/s and a big iodelay. Found your disk hog — the reason iostat showed the array busy. Now you know which process to throttle, reschedule, or fix.

-t mode: one thread at %CPU 99, siblings idle. The single-threaded bottleneck inside a multithreaded program — invisible to top until you break it out per thread.

Cheat Sheet

  • pidstat 1 — all active processes, CPU, every second
  • pidstat -p PID 1 — watch one process over time (the core use case)
  • pidstat -r -p PID 1 — memory: RSS and page faults
  • pidstat -d -p PID 1 — disk read/write and iodelay
  • pidstat -t -p PID 1 — break a process down per thread
  • pidstat -w -p PID 1 — context switches per second (voluntary vs involuntary)
  • pidstat -u -r -d -p PID 1 — combine CPU + memory + disk in one view
  • pidstat -l — show the full command line; -h — one tidy line per sample for scripts
  • pidstat -C nginx 1 — filter by command name matching a regex

How You'll Actually Use It

You reach for pidstat the moment a single process becomes the suspect. top told you which program is hot; pidstat -p <pid> 1 tells you the story over time: is it a steady burn or a spike, CPU or disk, its own work or kernel work, running freely or starved by %wait. The killer move is running it with a count (pidstat -u -r -d -p <pid> 1 60) to capture a full minute of CPU, memory, and disk for that one process, then pasting the timestamped output straight into a bug report. For history over hours rather than a live capture, its sibling sar records to disk, but at the system level rather than per process; to keep a long per-process record, redirect pidstat's own output to a file.
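If you want that bounded capture from a script, a thin wrapper is enough. The Python sketch below assumes pidstat is installed and on the PATH; the function names capture_cmd and capture and the out_path parameter are hypothetical, while the pidstat flags are the ones used in the paragraph above.

```python
import subprocess


def capture_cmd(pid, interval=1, count=60):
    """Build the argv for a bounded pidstat capture of CPU, memory, and disk."""
    return ["pidstat", "-u", "-r", "-d", "-p", str(pid), str(interval), str(count)]


def capture(pid, out_path, interval=1, count=60):
    """Run the capture and write pidstat's report to out_path.

    Blocks for roughly interval * count seconds and raises
    CalledProcessError if pidstat exits with a non-zero status.
    """
    result = subprocess.run(
        capture_cmd(pid, interval, count),
        capture_output=True, text=True, check=True,
    )
    with open(out_path, "w") as f:
        f.write(result.stdout)
    return out_path


print(" ".join(capture_cmd(251473)))  # pidstat -u -r -d -p 251473 1 60
```

Calling capture(251473, "java-capture.txt") would then leave the full minute of output in a file ready to attach to a ticket.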

Gotchas

  • No interval, no insight. Bare pidstat shows averages since boot. Always add an interval (pidstat 1) for live, meaningful numbers.
  • A process must use the resource to appear. In CPU mode, processes that stayed idle that second are omitted — add -p ALL or look again, don't assume it died.
  • %CPU is summed across cores. A reading of 400 is four full cores, not a bug. Divide by nproc to think in "fraction of the machine."
  • Short-lived processes slip through. Per-second sampling misses a program that lived 50ms — for those, strace or process accounting fit better.
  • It's per-process, not whole-machine. When you want the system totals, use mpstat (CPU) or iostat (disk); pidstat is for attributing that load to a culprit.
  • Not installed by default — part of sysstat.
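The "divide by nproc" arithmetic from the %CPU gotcha can be sketched in Python as follows. The function name fraction_of_machine is illustrative; note that os.cpu_count() reports the logical CPUs in the system and can differ from nproc when CPU affinity restricts the calling process.

```python
import os


def fraction_of_machine(cpu_pct, ncpu=None):
    """Convert pidstat's core-summed %CPU into a 0..1 share of the whole machine."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    return cpu_pct / (100.0 * ncpu)


# 400% on an 8-core box is four of eight cores, i.e. half the machine.
print(fraction_of_machine(400, ncpu=8))  # 0.5
```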

History & Philosophy

pidstat joined Sebastien Godard's sysstat suite to fill an obvious gap: mpstat and iostat told you the machine was busy, but not who was responsible. By sampling each process's counters in /proc/<pid>/stat and /proc/<pid>/io twice and subtracting, it turns the kernel's per-process bookkeeping into a watchable, recordable stream — bringing the sysstat "sample, subtract, divide" philosophy down to the individual process.
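The "sample, subtract, divide" idea fits in a few lines of Python. This is a sketch of the principle, not pidstat's actual implementation: it reads the utime and stime tick counters (fields 14 and 15 of /proc/<pid>/stat, per proc(5)) twice and divides the difference by the ticks that elapsed. The function names are hypothetical, and watch() only works on Linux.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces or ')',
    # so split after the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, seconds, hz):
    """Sample, subtract, divide: tick delta over elapsed ticks, as a percentage."""
    return 100.0 * (ticks_after - ticks_before) / (hz * seconds)


def watch(pid, interval=1.0):
    """Print one CPU percentage for pid, measured over interval seconds."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    print(f"{cpu_percent(before, after, interval, hz):.2f}")


# 50 ticks consumed in one second at 100 ticks per second is half a core.
print(cpu_percent(1550, 1600, 1.0, 100))  # 50.0
```

Calling watch(os.getpid()) prints a single reading for the calling process; a loop around it gives the same kind of per-second stream pidstat produces.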

What makes it click is the realisation that a process leaves a measurable trail in /proc for every resource it touches — CPU, faults, bytes to disk, context switches — and pidstat simply reads that trail on a timer. Once you can watch one program's resource fingerprint evolve second by second, debugging stops being guesswork ("I think it's the database?") and becomes evidence ("postgres held 2 cores and wrote 40 MB/s from 5:03:29 to 5:04:11"). That shift, from impression to proof, is the whole point.

Column Reference

The key per-mode output columns, grouped by the flag that turns them on:

Mode  Column    Meaning
-u    %usr      Time running the program's own code (user space)
-u    %system   Time the kernel spent on its behalf
-u    %wait     Runnable but waiting for a free core (the contention signal)
-u    %CPU      Total CPU share, summed across cores (can exceed 100%)
-u    CPU       Which core the process was last running on
-r    minflt/s  Minor faults/s: pages mapped without touching disk
-r    majflt/s  Major faults/s: page-ins from disk (memory pressure)
-r    VSZ       Virtual memory size (mapped, mostly not real)
-r    RSS       Resident set size: the actual RAM in use
-r    %MEM      RSS as a share of physical RAM
-d    kB_rd/s   Kilobytes read per second by this process
-d    kB_wr/s   Kilobytes written per second by this process
-d    iodelay   Clock ticks the process was blocked on block I/O

See Also

  • top — find the hot process, then watch it here over time
  • ps — the one-shot snapshot; pidstat is the time series
  • mpstat — whole-machine per-core view, the system-level counterpart
  • iostat — system disk activity; pidstat -d says which process caused it
  • sar — records system-wide activity over hours; the long-running counterpart for the whole machine
  • strace — when you need the syscalls behind the %system time
  • process — what a PID, thread, and resident set actually are
