du Command: Tutorial & Examples

Walk every file, add up the bytes — and finally answer the question df can't.

What It Is

du — short for disk usage — is the command you reach for the moment df tells you a filesystem is 95% full and you need to know which directory ate the space. Where df asks the filesystem one quick question ("how much of you have I handed out?"), du does the slow, honest work: it walks the directory tree, stats every file, and adds the bytes up itself. The two are a pair — df for the filesystem view, du for the files view — and when they disagree, the gap is its own diagnosis.

If you've never run a server, this is the third command to learn after top and df. Disks fill up, and once df has told you that /var is full, only du can tell you why: was it /var/log ballooning under a runaway process, a Docker overlay layer, or a Postfix mail queue with a million tiny messages? We'll explain every flag and teach you the universal "drill down" reflex that veterans use to corner the culprit in four steps. Along the way you'll pick up how Linux actually stores files (blocks, inodes, sparse files, hardlinks) and why du and df sometimes report wildly different numbers for the same disk.

Your First Look

Always pass -h (human-readable) and almost always -s (summarize) or -d1 (one level deep). Raw du on a big tree prints thousands of lines:

du -sh /var/log
2.4G	/var/log

That's the simplest invocation: one number, one path. To see the breakdown one level down — the move you'll actually use:

du -hd1 /var | sort -h
4.0K	/var/local
4.0K	/var/mail
44K	/var/tmp
48K	/var/spool
3.4M	/var/opt
6.3M	/var/backups
1.3G	/var/cache
2.4G	/var/log
6.6G	/var/lib
11G	/var

One row per immediate child of /var, smallest at the top, the grand total at the bottom. Your eye jumps to the bottom three — /var/cache, /var/log, /var/lib — and you've already found the suspects. That -hd1 | sort -h shape is the single most useful du invocation on earth.

Pro Tip

Memorize du -hd1 /path | sort -h: human-readable, one level deep, sorted small-to-large so the biggest entry is the last line your eye reads. It's the universal "where did the bytes go?" probe; every veteran reaches for it before anything else, and you'll type it a thousand times.

How I Use It

Two seconds at the terminal, one mental loop — the drill-down reflex every sysadmin ends up with.

First, I start at the suspect mount. df -h already told me which filesystem is full — say /var. I run du -hd1 /var | sort -h. The biggest entry is at the bottom: 6.6G /var/lib. cd into it, repeat. du -hd1 /var/lib | sort -h — bottom line, 5.1G /var/lib/docker. cd again, repeat. Four or five iterations and I'm standing on the single directory (or file) that's eating the disk — a 4 GB PostgreSQL log, a runaway core dump, a Docker overlay holding three old image layers no container needs. The reflex is so mechanical I do it without thinking; in a few seconds I know exactly where to point rm, logrotate, or docker system prune.
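If you want to see the loop written down, here is a minimal sketch of it as a shell function. The name drill and the default of five hops are my own placeholders, not anything du provides, and it assumes GNU du and paths without embedded newlines.

```shell
# drill DIR [HOPS]: follow the largest child directory downward.
# "drill" is a hypothetical helper name; HOPS defaults to 5 (an assumption).
drill() {
    dir=${1:-.}
    hops=${2:-5}
    while [ "$hops" -gt 0 ]; do
        # -x: stay on one filesystem; -k: plain KiB so sort -n is safe.
        # du prints the total for $dir itself last, so sed '$d' drops it
        # before sorting, leaving only the immediate children.
        biggest=$(du -xkd1 "$dir" 2>/dev/null | sed '$d' | sort -n | tail -n 1)
        [ -n "$biggest" ] || break      # no subdirectories left to enter
        printf '%s\n' "$biggest"
        dir=$(printf '%s\n' "$biggest" | cut -f2-)
        hops=$((hops - 1))
    done
}
```

Calling drill /var prints one line per hop (size in KiB, then the path), ending at the deepest directory it reached. It only follows directories, so the last hop still deserves a du -ahd1 to spot a single fat file.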

Second, and this is the move that took me years: when df and du disagree, the gap is the answer. df -h /var says 20G used. du -sh /var says 11G. Where are the other 9 GB? Almost always one of three things: (1) a process has a file open that I've already rm'd. The name is gone from the directory tree (so du can't see it), but the kernel won't free the inode or its blocks until the last file descriptor closes; lsof +L1 shows them. (2) Something is mounted on top of a directory that already had files in it, hiding them from du but not from the filesystem. (3) I ran du with --apparent-size or -l, so sparse files or hardlinks were counted differently from how the filesystem counts them. The discrepancy isn't a bug in either tool; it's the diagnostic.
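To put a number on that gap, a rough sketch like the following can help. The function name fs_gap_kib is hypothetical, and the result is only meaningful when the argument is a mount point and du can read the whole tree (so run it as root).

```shell
# fs_gap_kib MOUNTPOINT: KiB the filesystem reports as used (df) minus
# KiB du can reach by walking the tree. Hypothetical helper name.
fs_gap_kib() {
    mnt=${1:-/}
    # -P gives one predictable line per filesystem; column 3 is "used".
    df_used=$(df -kP "$mnt" | awk 'NR == 2 { print $3 }')
    # -x keeps du on the same filesystem df is describing.
    du_used=$(du -skx "$mnt" 2>/dev/null | cut -f1)
    echo $((df_used - du_used))
}
```

A small positive result is normal, since df also counts filesystem metadata. A gap of gigabytes is the cue to reach for lsof +L1 or to check what is mounted over what.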

Third reflex worth knowing: du reports allocated blocks, not file contents. A 100-byte file uses 4 KB of disk because that's the block granularity. du --apparent-size shows the bytes the file actually contains. A directory of a million 100-byte files? du -sh reports ~4 GB, du -sh --apparent-size reports 100 MB. Both are true. (And if that million-file directory is what's killing you, you may be out of inodes — see df -i.)
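You can watch the two numbers diverge on a scratch directory. This is a small demonstration, assuming GNU du and a filesystem with a block size larger than 100 bytes; the exact figures depend on your filesystem, so none are promised here.

```shell
# Create 1000 files of 100 bytes each in a throwaway directory.
demo=$(mktemp -d)
i=1
while [ "$i" -le 1000 ]; do
    head -c 100 /dev/zero > "$demo/f$i"
    i=$((i + 1))
done

# Same tree, two questions: blocks allocated vs bytes contained.
allocated_kib=$(du -sk "$demo" | cut -f1)
apparent_kib=$(du -sk --apparent-size "$demo" | cut -f1)
echo "allocated: ${allocated_kib} KiB   apparent: ${apparent_kib} KiB"

rm -rf "$demo"
```

On a typical ext4 volume the allocated figure comes out far larger than the apparent one, which is the block-rounding effect described above.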

Warning

Running du from / without -x crosses mount boundaries, so it walks every NFS share, every tmpfs, and every /proc pseudo-file, and on a wedged NFS mount it can hang indefinitely. Always run du -x (one filesystem only) when starting from /, and never include /proc, /sys or /dev in a scan.

The Flags Explained

The flags that actually matter, and what each one buys you:

  • -h: human-readable (K, M, G, T). The default starting point; nobody reads raw 1K blocks.
  • -s: summarize. One total per argument, no children. The shape for "how big is this one thing?": du -sh /var/log.
  • -d N / --max-depth=N — print totals down to N levels deep. -d1 is the workhorse; -d0 is the same as -s.
  • -a — include every file, not just directories. Combined with sort -h, lets you spot a single giant file inside a sea of small ones.
  • -c: grand total. Adds a total line at the bottom; great for du -shc /var/log/*.gz to sum a glob.
  • -x / --one-file-system: do not cross mount points. Essential when walking /; otherwise you wander into NFS, tmpfs, /proc and everything else.
  • --apparent-size — report file contents, not allocated blocks. The honest size of sparse files; much smaller than du default on a directory of tiny files (because of 4 KB block rounding).
  • -b — equivalent to --apparent-size --block-size=1. Byte-exact; useful when you actually need the number.
  • -B SIZE / -k / -m: fixed-unit output. -k (KiB), -m (MiB), -B G (GiB). Use these rather than -h when scripting: a plain sort puts "1.2G" before "900M", and only sort -h understands the suffixes.
  • -l: count hardlinks multiple times. By default du charges a hardlinked file once, no matter how many names point at it (matching what's actually on disk); -l charges it once per name. Useful when auditing backups made with cp -al or rsnapshot.
  • -L: follow symlinks. Off by default, because following symlinks risks double-counting and infinite loops.
  • --exclude=PATTERN — skip paths matching a glob. du -shx --exclude='*.log' /var is a common shape.
  • -t SIZE / --threshold=SIZE — only print entries above (positive) or below (negative) a size. du -hd1 -t 100M /var hides everything under 100 MB.
  • --time: append the most recent modification time of any file in each directory or its subdirectories. Handy for "what's been written recently into this fat directory?"
  • --inodes — count files, not bytes. The du counterpart to df -i; use this when df -i says you've run out of inodes and you need to find the directory hoarding millions of tiny files.
  • -S: separate directories. Each directory's total excludes its subdirectories, which is useful when you want to find the one directory containing a fat file directly, not just a fat subtree.
  • -0 — NUL-terminate output. For piping safely into xargs -0 when filenames contain spaces or newlines.
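As an illustration of the scripting advice in that list, here is a sketch that combines -a, -x and -k with a numeric sort. The function name top_entries is made up for this example; the flags are standard GNU du.

```shell
# top_entries DIR [N]: the N largest files and directories under DIR, in KiB.
# Hypothetical helper name. Uses -k rather than -h so that sort -rn
# compares real numbers instead of suffixed strings.
top_entries() {
    du -xak "${1:-.}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

The first line is normally DIR itself, since its total includes everything beneath it. Pipe the result through numfmt (from GNU coreutils) if you want the human-readable suffixes back for display only.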

Reading It by Example

Patterns to build instinct — read the output, know the verdict.

du -hd1 /var | sort -h shows 6.6G /var/lib, everything else tiny — drill in: du -hd1 /var/lib | sort -h. This is the loop. Four or five hops and you're at the file.

du -sh / reports 45G, but df -h / says 60G used, leaving 15 GB unaccounted for. Almost certainly a deleted-but-open file: a journald, mysqld, or nginx holding a log that's been rm'd. Run lsof +L1; the NLINK column will show 0 and the SIZE/OFF column will be huge. Restarting the offending service closes the descriptor and frees the blocks.
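The deleted-but-open case is easy to reproduce safely. The sketch below is Linux-specific (it peeks at /proc) and uses the current shell as the stand-in for the process holding the file; app.log is just an illustrative name.

```shell
demo=$(mktemp -d)
yes | head -c 1000000 > "$demo/app.log"

exec 3< "$demo/app.log"          # this shell now holds the file open on fd 3
before_kib=$(du -sk "$demo" | cut -f1)

rm "$demo/app.log"               # the name is gone, the blocks are not
after_kib=$(du -sk "$demo" | cut -f1)

# On Linux the fd symlink target carries a "(deleted)" suffix.
held=$(ls -l "/proc/$$/fd/3" 2>/dev/null)

echo "du before rm: ${before_kib} KiB, after rm: ${after_kib} KiB"
echo "still open: $held"

exec 3<&-                        # closing the descriptor is what frees the space
rmdir "$demo"
```

du stops seeing the file the moment rm runs, yet the space stays in use until the exec 3<&- line. That is the same effect a service restart has on a real daemon.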

du -sh /home/user/maildir is only about 2 GB, but ls -la lists half a million entries. You may not be out of bytes at all; you may be out of inodes. Check with df -i and du --inodes -d1 /home/user/maildir | sort -n.
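For the inode hunt, a short wrapper along these lines may be handy. busiest_dirs is a hypothetical name, and --inodes needs a reasonably recent GNU du.

```shell
# busiest_dirs DIR [N]: immediate children of DIR ranked by inode count.
# Hypothetical helper name. sed '$d' drops DIR's own total, which du
# always prints last.
busiest_dirs() {
    du --inodes -xd1 "${1:-.}" 2>/dev/null | sed '$d' | sort -rn | head -n "${2:-5}"
}
```

Run busiest_dirs /home/user/maildir, then repeat on the top result, exactly as you would with the byte-based drill-down.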

du -sh /var/lib/docker reports 40G but the containers only use 5G — old image layers. docker system df confirms; docker system prune -a is the cleanup. Docker's overlay filesystem layers reuse blocks via hardlinks and copy-on-write, so du numbers there are notoriously approximate.

du -sh disk.img says 4.0K, but ls -lh shows the file as 100G: a sparse file. A VM image or a dd-created blob that's mostly holes that were never written; the filesystem only allocated blocks for the parts actually written. du --apparent-size -h disk.img shows the 100G "logical" size. Both are honest.
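A sparse file takes a few seconds to make yourself. This sketch assumes GNU truncate and a filesystem that supports holes (most Linux filesystems do); the 100M size is arbitrary.

```shell
img=$(mktemp)
truncate -s 100M "$img"          # sets the length without writing any data

allocated_kib=$(du -k "$img" | cut -f1)
apparent_kib=$(du -k --apparent-size "$img" | cut -f1)
echo "allocated: ${allocated_kib} KiB   apparent: ${apparent_kib} KiB"

rm -f "$img"
```

The apparent size is the full 102400 KiB, while the allocated size stays near zero because no block was ever written.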

Running du -sh on each snapshot under /backups separately adds up to 200G, but the disk is only 80G in total: hardlink backups (made with rsnapshot or cp -al). Each snapshot looks independent but shares blocks with the others. A single du run over /backups charges each block once (matching reality); du -l charges it once per name and explodes the number.
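The hardlink accounting can be checked on a toy pair of "snapshots". The directory names snap1 and snap2 are placeholders for this demonstration.

```shell
demo=$(mktemp -d)
mkdir "$demo/snap1" "$demo/snap2"
yes | head -c 1000000 > "$demo/snap1/data"
ln "$demo/snap1/data" "$demo/snap2/data"     # second name, same blocks

once_kib=$(du -sk "$demo" | cut -f1)           # default: each inode charged once
per_name_kib=$(du -skl "$demo" | cut -f1)      # -l: charged once per name
snap2_alone_kib=$(du -sk "$demo/snap2" | cut -f1)

echo "one run: ${once_kib} KiB   with -l: ${per_name_kib} KiB   snap2 alone: ${snap2_alone_kib} KiB"
rm -rf "$demo"
```

The -l figure is roughly double the default one, and measuring snap2 on its own charges it the full file even though it added no new blocks to the disk.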

du on / hangs at /mnt/nfs-backup: a dead NFS mount. Same family as the df hang and the load-100 mystery under top. Re-run with -x to stay on one filesystem, or with --exclude='/mnt/*' (quoted, so the shell leaves the pattern alone) to skip mounted shares.

Cheat Sheet

The invocations worth memorizing:

  • du -sh PATH — single total for one path. The "how big is this?" answer.
  • du -hd1 PATH | sort -h — the workhorse. One level deep, sorted, biggest last.
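  • du -xhd1 / | sort -h: every top-level directory on the root filesystem only. Where you start when / is full. (du -shx /* looks similar, but -x applies per argument, so it still walks /proc and any mount the glob names.)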
  • du -ahd1 PATH | sort -h | tail — include files, show only the biggest few.
  • du -shx --exclude='*.gz' /var/log — exclude rotated logs.
  • du --apparent-size -sh FILE — true file content size, not allocated blocks.
  • du --inodes -d1 PATH | sort -n — find the directory holding millions of files.
  • du -sh /var/log/*.log | sort -h | tail — biggest log files in one shot.
  • du -shc /backups/* — per-directory totals plus a grand total at the bottom.
  • du -d1 -t 100M PATH — only entries above 100 MB.

How You'll Actually Use It

In real life, du lives in two moments. First, the 3am drill-down: the pager fires, df -h says /var is at 100%, you start du -hd1 /var | sort -h, cd to the fat directory, repeat. Four hops, found it, cleaned up, back to sleep. Second, the routine audit: once a week (or via cron) you run du -xhd1 / | sort -h on a server to spot what's growing. A /var that's gained 5 GB in seven days is going to bite in three months; better to chase it now.
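One way to make the weekly audit concrete is to save each run and compare it with the previous one. The two function names and the du-DATE.tsv file naming below are assumptions of this sketch, and it expects paths without tabs or newlines.

```shell
# audit_snapshot DIR LOGDIR: save today's per-directory sizes (KiB).
# Hypothetical helper; writes LOGDIR/du-YYYY-MM-DD.tsv.
audit_snapshot() {
    du -xkd1 "$1" 2>/dev/null > "$2/du-$(date +%F).tsv"
}

# growth_report OLD.tsv NEW.tsv: change in KiB per directory, smallest first.
# Hypothetical helper; directories missing from OLD are skipped.
growth_report() {
    awk -F'\t' 'NR == FNR { old[$2] = $1; next }
                ($2 in old) { printf "%d\t%s\n", $1 - old[$2], $2 }' "$1" "$2" |
        sort -n
}
```

Run audit_snapshot / /var/tmp/du-audit from a weekly cron entry, then growth_report on the last two files; the fastest-growing directory lands on the bottom line, the same reading order as sort -h.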

What du is not for: interactive exploration of huge trees. Once your tree is millions of files deep, du takes minutes per run and you're typing the same incantation over and over. That's when veterans reach for ncdu — the interactive du, a full-screen TUI that runs du once and then lets you arrow-key through the tree. One scan, infinite drill-downs. If you're cleaning up a hopelessly full server, install ncdu first, then thank me.

du also makes a poor monitoring tool — it's slow (it walks every file), it generates real I/O load, and it can hammer a filesystem you were trying to protect. For monitoring disk usage over time, stick to df (cheap, instant); reserve du for investigation after df raises the alarm.

Gotchas

  • du is slow on big trees. It stats every file; on a filesystem with millions of files it can run for minutes and saturate the disk. Don't run it in a tight monitoring loop — that's what df is for.
  • du and df disagree, and that's normal. Three usual causes: deleted-but-open files (lsof +L1), mounts hiding the files beneath them, and du options such as --apparent-size or -l that count sparse files and hardlinks differently from the filesystem. The gap is the diagnosis.
  • Block size vs apparent size. Default du shows allocated blocks (rounded up to 4 KB on most ext4); --apparent-size shows file contents. A million 100-byte files is 100 MB apparent / 4 GB on disk.
  • Hardlinks counted once by default. A hardlinked file shows up multiple times in the tree but is charged once — matching the actual on-disk cost. -l overrides that and explodes the number; usually wrong.
  • Crosses mount boundaries without -x. Walks into NFS, tmpfs, /proc, /sys. Always -x from /.
  • No progress bar. On a huge tree du appears to hang; it's not, it's just walking. If you need progress, that's another reason to use ncdu.
  • Different filesystems, different answers. ZFS and Btrfs deduplicate and snapshot; du numbers there are approximate — trust the filesystem-native tool (zfs list, btrfs fi du) for the truth.
  • "Permission denied" on subdirectories. du prints the error to stderr, leaves the unreadable subtree out of the total, and exits non-zero, so the number is an undercount. Run with sudo for an honest total; redirecting stderr to /dev/null hides the warnings but not the gap.
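For scripts, the permission gotcha is worth guarding against explicitly. A minimal sketch, assuming GNU du's behaviour of exiting non-zero when it cannot read something; du_checked is a hypothetical name.

```shell
# du_checked DIR: print the total in KiB and warn when part of the tree
# could not be read. Hypothetical helper name.
du_checked() {
    errs=$(mktemp)
    du -sxk "$1" 2>"$errs"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "warning: $(wc -l < "$errs") error(s) from du; total is an undercount" >&2
    fi
    rm -f "$errs"
    return "$status"
}
```

The total still goes to stdout so it can be piped onward, while the return status tells a calling script whether the number can be trusted.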

History & Philosophy

du shipped with the very first edition of UNIX in 1971, the same vintage as df and ls. Its job hasn't changed in fifty-five years: walk a directory, stat every entry, add up the blocks, print the totals. No magic, no special kernel help; just the same readdir and stat calls any program could make.

The split between df and du is one of the most elegant divisions of labor in the UNIX toolbox, and it tells you something deep about how filesystems work. df asks the filesystem ("how many blocks have I handed out from my superblock?"): instant, cheap, a single statvfs(3) call per mount. du walks the directory tree the user can see: slow, expensive, dependent on what's actually linked into the tree. When the two answers match, life is simple. When they don't, you're learning something true about your system: a file open without a name, or a mount over a populated directory. The discrepancy is the filesystem revealing its internals.

That's why du is on every UNIX, every Linux, every BSD, every macOS, identical enough that a script written in 1985 still runs today. The newer filesystems bend the rules in interesting ways — Btrfs copy-on-write snapshots, ZFS dedup, overlay layers — and du's answers there become approximate. But the abstraction is so clean it absorbs the weirdness, and du -hd1 | sort -h still gives you the right answer for the question you actually asked: where did all the space go?

See Also

  • df — the filesystem view; the paired command, always run first
  • lsof — find deleted-but-still-open files when du and df disagree
  • ncdu — the interactive du, a full-screen TUI for drilling down
  • ls — single-file sizes; ls -lhS to sort by size
  • find — hunt files by size, age, or name across a tree
  • sort — pair with du -h via sort -h (human-numeric)
  • mount — see what's mounted where; explains the -x flag
  • /var — the directory most likely to fill up
  • /tmp — the runner-up
  • /home — user files; du -shx /home/* is a weekly audit shape
  • filesystem — the abstraction du walks
  • inode — the per-file entry; du --inodes counts them
  • block device — the unit du actually counts
  • hard link — why du charges shared files once
  • disk full — the full diagnose-and-fix walkthrough
