sysstat: Tutorial & Best Practices

The package that turns your server's vital signs into a recording you can rewind.

What It Is

sysstat is the quiet box of tools every serious Linux admin reaches for the moment "the server was slow an hour ago" needs an answer instead of a shrug. It's not one program — it's a small family, and you've almost certainly already met one of them without knowing they were siblings: iostat, mpstat, pidstat, and the patriarch of the clan, sar. Install one, you get them all, because they all ship inside the single Debian/RPM package called sysstat.

Here's the thing that makes this package special, and it's worth saying up front because it reframes everything below. Tools like top and free show you this instant — gorgeous, live, and gone the moment you look away. sysstat does something none of them do: it remembers. A tiny collector wakes up every few minutes, reads the same kernel counters top reads, and writes them to disk. Weeks later you can ask it, in plain words, "what was this machine doing at 3am last Tuesday?" — and it will tell you. That's the whole game, and we'll come back to it, because it's one of the most useful things a server can do for you.

If you've never run a server before, this is the page where monitoring stops being "stare at a screen and hope you catch it" and becomes "the box took notes while you slept." We'll meet every member of the family, learn to read their output column by column, set up the time machine properly, and — the part most tutorials skip entirely — understand the little daemon doing the recording underneath. By the end you'll reach for the right tool by reflex.

Your First Run

You don't need to configure anything to use these tools live — they'll sample the kernel on a timer and report back. The grammar is the same across the whole family: an interval and a count.

The classic first command is iostat with extended stats, sampling once a second, twice:

iostat -x 1 2

Linux 6.12.86+deb13-amd64 (xps) 	06/03/2026 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.95    0.36    1.54    0.16    0.00   94.00

Device    r/s   rkB/s  rrqm/s %rrqm r_await rareq-sz   w/s   wkB/s w_await wareq-sz  aqu-sz  %util
nvme0n1 17.80  721.74    2.25 11.23    0.19    40.54 201.29 1915.03    1.71     9.51    0.35   0.85

(I've trimmed a few columns sideways so it fits; the real output is wider.) One thing to notice above all: the first report is different from the rest. The very first block iostat prints is the average since boot, usually meaningless and often misleadingly calm; ignore it and read the second one, which covers your live one-second window. mpstat and pidstat hand you the same since-boot averages if you run them with no interval at all, so train the habit now: always pass an interval, and with iostat treat the first sample as a warm-up. That single fact saves more confused debugging sessions than any flag.

How The Family Divides The Work

The beauty of sysstat is that each tool answers one question cleanly, and once you know which is which, you never reach for the wrong one. Picture the machine's resources as four rooms, each with its own specialist:

mpstat — the CPU specialist. Not the whole CPU, but every core individually. Where top's top line averages all your cores into one blurry number, mpstat -P ALL shows you each one. This is how you catch the single most common scaling mistake there is, and we'll do exactly that below.
iostat — the storage specialist. Every block device: how busy each disk is, how long requests wait, whether the disk is the bottleneck. When top shows high %iowait and you need to know which disk and how bad, this is the tool.
pidstat — the per-process specialist. Like a top that you can sample over time and log — CPU, memory, and crucially disk I/O per process, which almost nothing else gives you cleanly.
sar — the historian. The one that reads the recordings. Everything the others show live, sar shows you for any moment in the past. This is the headline act.

There are a few rarer cousins too — tapestat (tape drives, yes, people still run them), cifsiostat (Windows/SMB network shares) — but the four above are the working set. Let's read each one properly.

mpstat: One Line Per Core, and the Trap It Reveals

Run it across all CPUs:

mpstat -P ALL 1 1

11:43:41 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %idle
11:43:42 PM  all    5.81    0.00    2.53    0.13    0.00    0.13    0.00   91.41
11:43:42 PM    0    3.03    0.00    6.06    0.00    0.00    0.00    0.00   90.91
11:43:42 PM    1    6.06    0.00    3.03    0.00    0.00    0.00    0.00   90.91
11:43:42 PM    2    5.88    0.00    1.96    0.00    0.00    0.00    0.00   92.16
11:43:42 PM    3    5.00    0.00    3.00    0.00    0.00    1.00    0.00   91.00
11:43:42 PM    4   10.31    0.00    2.06    1.03    0.00    0.00    0.00   86.60
11:43:42 PM    5    6.06    0.00    2.02    0.00    0.00    0.00    0.00   91.92
11:43:42 PM    6    5.15    0.00    1.03    0.00    0.00    0.00    0.00   93.81
11:43:42 PM    7    5.05    0.00    1.01    0.00    0.00    0.00    0.00   93.94

The columns are the same time-buckets you met on top's %Cpu(s) line — %usr is your code, %sys is the kernel working on your behalf, %iowait is the core stalled waiting on a disk, %steal is the hypervisor handing your virtual CPU to a noisy neighbour on a cloud box, and %idle is spare capacity. The all row is the average; the numbered rows are the magic.

Here's why this view earns its keep. Imagine mpstat showing CPU 3 pinned at %idle 0.00 while the other seven sit near 90% idle. The all average would read a sleepy ~21% busy ((100 + 7 × 10) / 8) and you'd conclude the box is bored. It isn't: one core is on fire and the rest are asleep. That's the unmistakable fingerprint of a single-threaded program, which can only ever use one core no matter how many you buy it. People respond to this by renting a bigger server and are baffled when nothing improves, because all they bought was more idle cores. mpstat -P ALL is the tool that shows you the trap in one screen, where the averaged view actively hides it.
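To see how an average buries a hot core, here is a minimal Python sketch. The per-core idle figures are made up for illustration, and the hot_cores helper with its 90% threshold is a hypothetical convenience, not something sysstat provides.

```python
# Hypothetical %idle per core, as mpstat -P ALL might report them.
idle_by_core = {0: 95.0, 1: 96.0, 2: 94.0, 3: 0.0, 4: 97.0, 5: 95.0, 6: 96.0, 7: 95.0}


def average_busy(idle):
    """Mean busy percentage across cores: what the 'all' row summarises."""
    return sum(100.0 - value for value in idle.values()) / len(idle)


def hot_cores(idle, threshold=90.0):
    """Cores whose busy percentage is at or above an arbitrary threshold."""
    return [core for core, value in sorted(idle.items()) if 100.0 - value >= threshold]


print(f"all: {average_busy(idle_by_core):.1f}% busy")
print(f"hot cores: {hot_cores(idle_by_core)}")
```

The averaged figure looks relaxed while one core has no headroom left, which is exactly the situation the per-core rows exist to expose.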

Pro Tip

mpstat -P ALL 2 (interval, no count) runs forever, refreshing every two seconds — the fastest way to watch whether load is spread evenly or trapped on one core while you reproduce a slow request.

iostat: Is the Disk the Bottleneck?

The summary block is CPU (same buckets as above). The interesting half is the per-device table, and the -x flag is the one you always want — it adds the columns that actually answer questions:

Device    r/s   rkB/s   r_await   w/s   wkB/s  w_await  aqu-sz  %util
nvme0n1 17.80  721.74      0.19 201.29 1915.03    1.71    0.35   0.85

Reading left to right, the ones that matter:

r/s / w/s — reads and writes completed per second (IOPS). Raw busyness.
rkB/s / wkB/s — throughput in kilobytes per second. A disk doing 200 small writes/sec and one doing 200 huge ones look identical in w/s but wildly different here.
r_await / w_await — the single most important numbers on the line. Average milliseconds a request waited, queue plus service time. This is latency, the thing your users actually feel. An NVMe SSD answering in 0.19 ms is healthy; the same number at 50 ms means the disk is gasping. A slow spinning disk under load can show hundreds.
aqu-sz — average queue depth. How many requests were stacked up waiting. Climbing queue depth + climbing await = the disk is the bottleneck, full stop.
%util — percentage of time the device had at least one request in flight. Here's the gotcha that catches everyone: on old single-platter spinning disks, %util near 100% genuinely meant "saturated." On modern SSDs and NVMe drives, which serve many requests in parallel, %util can read 100% while the disk is loafing — it only means "never idle," not "maxed out." So on anything modern, trust await and aqu-sz, not %util. (rrqm/s / wrqm/s and %rrqm / %wrqm, if you see them, count requests the kernel merged together before sending — high merge rates are the kernel being clever on your behalf.)

So the whole tool reduces to a reflex: high await and a growing aqu-sz mean storage is your bottleneck. That's the moment to stop blaming the CPU. (BTW, iostat -xz 1 adds -z to omit idle devices — on a box with a dozen LVM volumes and loopback mounts, that one letter is the difference between a readable screen and a wall of zeros.)
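One way to convince yourself that await and aqu-sz move together is Little's law: the average number of requests in flight is roughly the request rate times the average time each one spends waiting. The sketch below applies that to the nvme0n1 row shown earlier. It is a sanity check under simplifying assumptions (reads and writes only, discards and flushes ignored), not a description of how iostat computes the column; the function name is made up.

```python
def estimated_queue_depth(r_per_s, r_await_ms, w_per_s, w_await_ms):
    """Little's law: mean requests in flight = arrival rate x mean time in system."""
    # await is in milliseconds, rates are per second, hence the division by 1000.
    return (r_per_s * r_await_ms + w_per_s * w_await_ms) / 1000.0


# Figures taken from the nvme0n1 row above.
depth = estimated_queue_depth(17.80, 0.19, 201.29, 1.71)
print(f"estimated aqu-sz: {depth:.2f}")
```

The estimate lands on the 0.35 that iostat printed for aqu-sz, which is why a rising await at a steady request rate has to show up as a deeper queue.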

pidstat: top, But With a Memory and a Disk View

pidstat is the family member people discover last and then can't live without. It's a per-process sampler you can run over an interval and log, which makes it perfect for catching a misbehaving process in the act rather than hoping you're staring at top at the right second.

The headline trick is -d, disk I/O per process — a question top simply cannot answer:

pidstat -d 1 1

11:43:55 PM   UID    PID  kB_rd/s  kB_wr/s kB_ccwr/s iodelay  Command
11:43:56 PM  1000 1525176     0.00    62.75     0.00       0  syncthing
11:43:56 PM  1000 2015928     0.00   149.02   137.25       0  vivaldi-bin
11:43:56 PM  1000 1983004     0.00    15.69     0.00       0  claude

kB_rd/s and kB_wr/s are how much each process read and wrote to disk per second. The instant iostat tells you the disk is hammered, pidstat -d tells you who is doing the hammering — here a sync client writing 63 kB/s and a browser writing 149. That handoff — iostat for how bad, pidstat -d for who — is one of the cleanest one-two punches in Linux diagnostics.
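If you capture that output to a file or a pipe, ranking the writers takes only a few lines. This is a rough Python sketch that parses the sample shown above; top_writers is a hypothetical helper, and it assumes command names contain no spaces and that the header row is present.

```python
SAMPLE = """\
11:43:55 PM   UID    PID  kB_rd/s  kB_wr/s kB_ccwr/s iodelay  Command
11:43:56 PM  1000 1525176     0.00    62.75     0.00       0  syncthing
11:43:56 PM  1000 2015928     0.00   149.02   137.25       0  vivaldi-bin
11:43:56 PM  1000 1983004     0.00    15.69     0.00       0  claude
"""


def top_writers(text):
    """Return (command, kB_wr/s) pairs sorted by write rate, highest first."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    # Find columns by name; the header carries the same AM/PM token as the rows,
    # so the indices line up in both 12-hour and 24-hour output.
    write_col = header.index("kB_wr/s")
    command_col = header.index("Command")
    writers = [(row[command_col], float(row[write_col])) for row in rows]
    return sorted(writers, key=lambda pair: pair[1], reverse=True)


for command, rate in top_writers(SAMPLE):
    print(f"{command:12s} {rate:8.2f} kB/s")
```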

The columns worth knowing across pidstat's modes:

default / -u — %usr, %system, %CPU, and CPU (which core it last ran on).
%wait — time the process was runnable but waiting for a CPU — i.e. starved by other work, not by itself. A high %wait means the box is oversubscribed.
-r — memory: minflt/s (minor page faults, cheap), majflt/s (major faults — pages fetched from disk; a steady stream means you're swapping and that hurts), VSZ, RSS, %MEM.
-d — disk, above. iodelay counts clock ticks the process was blocked on I/O.
-w — context switches per second (a process switching thousands of times a second is often lock-contended).
-t — break a process down into its individual threads.
-p PID — watch one specific process by ID; -C name filters by command name.

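pidstat -druh -p ALL 1 is a power user's dashboard: CPU, disk and memory for all processes, once a second, with -h laying every metric for a process out on one horizontal line (for human-readable units the flag is --human). It's the closest thing the family has to a logging top.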


sar: The Time Machine

Now the headline act, and the reason sysstat is on this list at all.

Everything above is live — run it now, see now. But machines misbehave at the worst times: the 4am cron job that pins every core, the nightly backup that floods the disk, the slow memory leak that only OOMs after eleven days. You are asleep for all of these. sar is the tool that watched while you slept.

Here's the genuinely lovely part, the one that makes the whole package click. None of this data is special. It's the exact same kernel counters top and free read live, the ones living in /proc — cat /proc/stat for the CPU figures, /proc/diskstats for the disk. The kernel publishes the entire live state of the machine as plain numbers, constantly. The trouble is they're instantaneous — read them, they're gone. All sysstat does is set an alarm clock: every few minutes a small collector reads those same /proc files and appends the numbers to a file on disk. That's it. That's the whole magic. There's no kernel wizardry, no special instrumentation — just a tireless little robot taking a snapshot of /proc on a timer and never throwing the snapshots away. sar is what reads the album back.
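The "snapshot on a timer" idea is small enough to sketch. The Python below takes two readings of the aggregate cpu line from /proc/stat and turns the difference into percentages. It illustrates the principle only and is not sadc's actual code; the two snapshots are made-up strings so the example runs anywhere, and the function names are hypothetical.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Extract the aggregate 'cpu' counters (cumulative clock ticks) from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # Guest fields are skipped: the kernel already counts them inside user/nice.
            return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(before, after):
    """Turn two cumulative snapshots into percentages for the interval between them."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        raise ValueError("snapshots are identical")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Two made-up snapshots, as if read a moment apart on a hypothetical machine.
SNAP_1 = "cpu  1000 10 500 8000 100 0 20 0 0 0"
SNAP_2 = "cpu  1040 10 515 8740 102 0 23 0 0 0"

pct = cpu_percentages(parse_cpu_line(SNAP_1), parse_cpu_line(SNAP_2))
print(f"user {pct['user']:.2f}  system {pct['system']:.2f}  idle {pct['idle']:.2f}")
```

On a real box you would replace the two strings with the contents of /proc/stat read twice, a few seconds apart. The counters only ever grow, which is why a single reading tells you nothing and the difference between two tells you everything.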

You can use sar live, no history required — it'll sample like its siblings:

sar -u 1 2

11:44:10 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:44:11 PM     all      2.83      0.00      1.54      0.00      0.00     95.62
11:44:12 PM     all      5.53      0.00      1.16      0.26      0.00     93.06
Average:        all      4.18      0.00      1.35      0.13      0.00     94.34

But that's not why it exists. The real move is reading yesterday's recording. Once collection is on (next section), the data lands in /var/log/sysstat/, one file per day named saDD — sa16 is the 16th of the month. To pull, say, the disk history out of the 16th:

sar -d -f /var/log/sysstat/sa16

…and sar prints the disk stats for every ten-minute slot of that whole day. Want only the window the incident happened in? Add -s (start) and -e (end):

sar -u -s 02:00:00 -e 04:00:00 -f /var/log/sysstat/sa16

There it is — "what was the CPU doing between 2 and 4am last Tuesday?" — answered, in numbers, after the fact. The first time a customer says "it was unbearable around 3am" and you simply look it up and reply "yes — your nightly mysqldump pinned all eight cores from 02:50 to 03:20, here's the graph," you'll understand why old admins guard their sar data like treasure.
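Since the file name is just the day of the month, going from "last Tuesday" to the right sar command is mechanical. Here is a small Python sketch of that mapping. It assumes the saDD naming and the Debian/Ubuntu directory used on this page; sa_file and sar_command are hypothetical helpers, and the incident date is invented.

```python
from datetime import date

SA_DIR = "/var/log/sysstat"  # Debian/Ubuntu location used on this page


def sa_file(day, sa_dir=SA_DIR):
    """Path of the daily data file for a date, using the saDD naming scheme."""
    return f"{sa_dir}/sa{day.day:02d}"


def sar_command(day, start, end, flag="-u"):
    """Assemble the sar invocation for one subsystem over one time window."""
    return ["sar", flag, "-s", start, "-e", end, "-f", sa_file(day)]


incident = date(2026, 6, 16)  # hypothetical incident date
print(" ".join(sar_command(incident, "02:00:00", "04:00:00")))
```

Note that two dates a month apart map to the same path under this scheme, so the sketch is only meaningful for dates still inside your retention window.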

The collection covers far more than CPU. The keyword flags pick the subsystem:

-u CPU · -u ALL every CPU bucket · -P ALL per-core (works on history too!)
-r memory · -S swap usage · -W swapping rate · -B paging
-b I/O totals · -d per-disk · -F filesystems (mounted space)
-n DEV network interfaces · -n TCP,ETCP TCP stats · -n EDEV interface errors
-q load average and run-queue · -w task creation · -v kernel tables (inodes, file handles)
-A everything — the firehose, every metric the file holds

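One flag deserves a spotlight: -x (no relation to the device-level -x in iostat). In recent sysstat releases it reports minimum and maximum values alongside the average at the end of the report, so a day that averaged a calm 30% CPU but had one slot pinned at 100% no longer hides behind its own average. Each stored sample is still an average over its own collection interval, so a thirty-second spike inside a ten-minute slot stays blurred; shorten the interval if you need that resolution. Averages lie; sar -x makes them confess.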
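The reason a maximum is worth asking for fits in a few lines of Python. The readings below are hypothetical %busy values for six consecutive recorded samples, not output from any real machine.

```python
from statistics import mean

# Hypothetical %busy values for six consecutive recorded samples.
busy = [12.0, 15.0, 11.0, 100.0, 14.0, 10.0]

print(f"average: {mean(busy):.1f}%")
print(f"minimum: {min(busy):.1f}%")
print(f"maximum: {max(busy):.1f}%")
```

The average alone suggests a machine with plenty of headroom; the maximum shows that for one sample it had none.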

Turning On the Recording

Here's the trap that bites everyone, and we'll demonstrate it honestly because it's true on the very machine this page was written on. Installing sysstat does not start the recording. Ask sar for history before enabling collection and you get:

Cannot open /var/log/sysstat/sa03: No such file or directory
Please check if data collecting is enabled

That message means exactly what it says: the time machine is installed but switched off, and right now it's recording nothing. On Debian/Ubuntu the master switch is a file:

# /etc/default/sysstat
ENABLED="false"     # <-- change to "true", then restart the service

Flip it to "true" and sudo systemctl restart sysstat. (On RPM-based systems you instead systemctl enable --now sysstat, and the equivalent file is /etc/sysconfig/sysstat.) From that moment on, the recording runs.
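If you manage more than a couple of machines, it can be worth checking that switch from a script rather than by eye. A minimal Python sketch, assuming the Debian-style file shown above with a single ENABLED="..." assignment; collection_enabled is a hypothetical helper that inspects text you pass in and does not touch the service itself.

```python
import shlex


def collection_enabled(config_text):
    """Return True if a Debian-style defaults file sets ENABLED to "true"."""
    enabled = False
    for raw in config_text.splitlines():
        # shlex strips the quotes and drops anything after a # comment marker.
        tokens = shlex.split(raw, comments=True)
        if len(tokens) == 1 and tokens[0].startswith("ENABLED="):
            enabled = tokens[0].split("=", 1)[1] == "true"
    return enabled


SAMPLE = '# /etc/default/sysstat\nENABLED="false"     # <-- change to "true", then restart the service\n'
print(collection_enabled(SAMPLE))
```

In practice you would pass it the contents of /etc/default/sysstat and alert on any host where the answer is False.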

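What actually does the recording is a scheduled job dropped in at install time. The classic form is the cron file below (on systemd-based installs the same scripts may be driven by timer units such as sysstat-collect.timer instead). Worth reading, because it demystifies the whole thing: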

# Activity reports every 10 minutes everyday
5-55/10 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1

# Additional run at 23:59 to rotate the statistics file
59 23 * * * root command -v debian-sa1 > /dev/null && debian-sa1 60 2

Two roles hide behind two scripts. sa1 is the collector — every ten minutes it runs the real worker, sadc (the "system activity data collector," the only piece that actually touches /proc), and appends a fresh sample to today's saDD file. sa2 (run once daily) is the summariser — it writes a human-readable text report and prunes old data. So sadc writes the binary album, sa1 schedules the snapshots, sa2 tidies up, and sar/sadf read it all back. Four small pieces, one elegant machine.
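The minute field in that cron entry, 5-55/10, is what produces the "every ten minutes" rhythm. A tiny Python sketch of how such a field expands; expand_minutes is a hypothetical helper that handles only the range-with-step form used here, not the full cron syntax.

```python
def expand_minutes(field):
    """Expand a cron minute field of the form 'A-B/STEP' into concrete minutes."""
    span, _, step = field.partition("/")
    first, _, last = span.partition("-")
    return list(range(int(first), int(last) + 1, int(step or 1)))


print(expand_minutes("5-55/10"))
```

So samples land at five past, quarter past, twenty-five past and so on, which is why sar's history rows are stamped a few seconds after those minutes rather than on the hour.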

Note

How much history sysstat keeps is set by HISTORY in /etc/sysstat/sysstat. The shipped default depends on the distribution (7 and 28 days are both common), so check yours rather than assume. On a box where you want to compare this month's load to last quarter's, raise it; the binary files are small, a few MB a day. Cheapest insurance you'll ever buy.

The Hidden Power: sar Data Becomes Graphs and JSON

This is the trick almost nobody knows, and it's a delight. The companion tool sadf reads the same binary saDD files sar reads — but instead of a text table, it can transform them. Two outputs are genuinely magic:

A standalone SVG graph, from the command line, no Grafana, no database, no agent:

sadf -g -- -r -n DEV /var/log/sysstat/sa16 > memory-and-network.svg

-g tells sadf to render Scalable Vector Graphics; the part after -- is a normal sar query (-r memory, -n DEV network). Open that .svg in a browser and you have a real, zoomable, time-axis chart of yesterday's memory and network, generated by a tool that's been sitting on the server the whole time. The poor admin's monitoring dashboard, built into a package that has been around since 1999.

Clean JSON, for feeding anything:

sadf -j -- -u -r /var/log/sysstat/sa16

-j emits the recorded metrics as structured JSON — pipe it into jq, a script, or your own dashboard. (sadf -d gives CSV-ish output for a spreadsheet, -l exports to PCP.) The lesson worth carrying: sar is for your eyes, sadf is for everything else — graphs for a report, JSON for a pipeline, all from the identical recording.
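As a taste of the "everything else" side, here is a Python sketch that walks a JSON document and lists the samples where the machine was nearly out of CPU. The embedded sample is hand-written and trimmed; its layout (sysstat, hosts, statistics, cpu-load) is an assumption modelled on sadf's JSON output, so compare it against what your version emits before relying on the key names. busy_slots and its max_idle threshold are hypothetical.

```python
import json

# Hand-written sample; the key layout is an assumption, not captured output.
SAMPLE = """
{"sysstat": {"hosts": [{"nodename": "xps", "statistics": [
  {"timestamp": {"date": "2026-06-16", "time": "02:50:01"},
   "cpu-load": [{"cpu": "all", "user": 91.2, "system": 6.3, "idle": 2.5}]},
  {"timestamp": {"date": "2026-06-16", "time": "03:00:01"},
   "cpu-load": [{"cpu": "all", "user": 12.0, "system": 2.0, "idle": 86.0}]}
]}]}}
"""


def busy_slots(doc, max_idle=20.0):
    """Timestamps of samples whose all-CPU idle figure is below max_idle."""
    hits = []
    for host in doc["sysstat"]["hosts"]:
        for sample in host["statistics"]:
            for cpu in sample.get("cpu-load", []):
                if cpu["cpu"] == "all" and cpu["idle"] < max_idle:
                    hits.append(sample["timestamp"]["time"])
    return hits


print(busy_slots(json.loads(SAMPLE)))
```

Swap the sample string for the captured output of the sadf -j command above and the same loop becomes a one-file incident finder.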

How I'm Using It

The routine on a fresh server, in order:

1. Turn the recorder on, day one. The single highest-leverage minute: flip ENABLED="true", restart sysstat, and forget about it. History you didn't start collecting is history you can never get back — the one regret this tool guarantees if you skip it. I do this before the box ever sees real traffic, so when the first mystery slowdown hits weeks later, the answer is already on disk.

2. Live triage, the family in sequence. When something's wrong now: mpstat -P ALL 2 first — is the load spread, or trapped on one core? If the cores look fine but the box drags, iostat -xz 2 — is a disk's await blown out? If a disk is the problem, pidstat -d 1 — who's writing? Three commands, three rooms, and the culprit is cornered.

3. The after-the-fact autopsy. "It was slow at 3am." sar -u -s 02:30 -e 03:30 for the CPU, sar -d and sar -r for disk and memory across the same window, sar -q for the load average. Five minutes and the ghost has a name.

4. The monthly look-back. Once a month, sar -u -f across a few recent days to eyeball the trend. A machine that crept from 40% to 70% average CPU over a quarter is telling you to size up before it falls over — the kind of foresight you only get from a tool that remembers.

Cheat Sheet

The grammar everywhere: tool [flags] [interval] [count]. No count = run forever.

iostat -xz 1 — extended disk stats, idle devices hidden, live. The disk-bottleneck command.
mpstat -P ALL 1 — every core separately. The single-threaded-trap detector.
pidstat -d 1 — disk I/O per process. Who's hammering the disk.
pidstat -ru 1 — CPU + memory per process, sampled over time.
sar -u -f /var/log/sysstat/sa16 — replay a past day's CPU.
sar -u -s 02:00 -e 04:00 — narrow a query to an incident window.
sar -A — every recorded metric (the firehose).
sar -x ... — add min/max columns so spikes can't hide in an average.
sadf -g -- -r ... > out.svg — render history as an SVG chart.
sadf -j -- -u ... — export history as JSON.
Remember: iostat's first report is the since-boot average, so read from the second.

Gotchas

Installed ≠ recording. The sysstat package ships with collection off on Debian/Ubuntu (ENABLED="false"). No history accrues until you flip it and restart the service. Check the day you install it, not the day you need it.
The first report is a lie of omission. iostat's first block is the average since boot, usually irrelevant, and mpstat or pidstat run with no interval report since-boot figures too. The real numbers start at iostat's second sample. (iostat lets you skip it: iostat -y omits the boot summary entirely.)
%util is a trap on SSDs. It means "never idle," not "saturated." Modern drives parallelise; trust await and aqu-sz instead.
Reading the wrong day's file. saDD is the day-of-month, so sa16 is overwritten next month on the 16th — there's only ever ~28 days of saDD files unless you set -D (which switches to saYYYYMMDD and never collides). For long retention, bump HISTORY and consider SA_DIR.
The clock in old data is recorded in the timezone the box used then. Cross a daylight-saving boundary and sar -t shows the file's original local time; without it, your current zone. A subtle one when chasing a timestamp across a time change.

History & Philosophy

sysstat is the work of one person — Sébastien Godard — maintained, remarkably, since 1999, and his name is still printed in the version banner of every tool in the suite. That longevity is itself the lesson: monitoring is one of those problems where the boring, reliable answer beats the shiny one, and a tool that has quietly done the same honest job for a quarter-century earns a trust no dashboard-of-the-month can.

The deeper idea worth taking away is the one that connects this page to top, to free, to everything you'll read about Linux internals: the kernel was always keeping score. Every counter sysstat records already existed, ticking away in /proc, free for the reading — cat /proc/stat and watch the CPU numbers climb. The genius of sysstat wasn't inventing new measurements; it was the almost childishly simple insight that if you just write the numbers down on a timer, the present becomes the past you can revisit. Most of computing's best ideas are like that — not clever, just kept. Once you see that a running Linux box is constantly narrating itself in plain numbers, and that all sysstat does is take dictation, the machine stops being a black box and becomes something with a diary you're welcome to read. Pull that thread and you end up reading /proc directly for fun, which is exactly the rabbit hole worth falling into next.