Out of Memory: Symptoms, Diagnosis & Fixes

When the kernel runs out of memory it doesn't crash — it picks a process and shoots it. The trick is knowing why, and which.

What It Is

Out of memory — OOM — is the moment the kernel wants to hand out a page of RAM and discovers there is none left to give. Not "running low." None. It has already squeezed every reclaimable page it can find, pushed what it could to swap, thrown away cached file data, and it still can't satisfy the next allocation. At that point the kernel does the one thing it has left: it invokes the OOM killer, scores every process by how much memory it's hogging, picks the fattest plausible victim, and sends it a SIGKILL. No warning to the process, no chance to clean up, and no core dump either, because SIGKILL never produces one. One moment your database is serving queries; the next it's a single line in the kernel log and a connection refused on every client.

So let's get the most important reframe out of the way first, because it changes how you read this whole page: the OOM killer is not the disease. It's the immune response. A server that fires the OOM killer was already out of memory — the kill is the kernel triaging a hopeless situation to keep the rest of the box alive. Cursing the OOM killer for killing your process is like cursing the smoke alarm for the fire. The real question this page answers is never "how do I stop the OOM killer" (you mostly don't, and you mostly shouldn't want to) — it's "what ate all the memory, and how do I make sure the right thing — or nothing — dies next time?"

This is also one of the most misdiagnosed problems on a server, because the symptom and the cause are separated in time and place. The process that dies is frequently not the process that's guilty — the kernel kills the biggest current memory user, which on a box where a leaky worker slowly ate everything might be your perfectly-innocent database, sitting there fat and obvious while the actual culprit hides in the noise. By the end of this page you'll read the kernel's OOM report line by line, tell a global OOM (the whole machine ran dry) from a cgroup OOM (one service hit its own limit) at a glance, understand oom_score and overcommit well enough to predict who dies, and know how to protect the process that must never be the victim. We'll start where it hurts — spotting it, reading it, fixing it — and save the audacious why (how Linux hands out memory it doesn't have, and gets away with it) for the end.

How You Notice

An OOM kill is loud in the kernel log and silent everywhere a human is looking, which is exactly why it surprises people. Here's each place it surfaces, with the command to see it on your own box right now.

  • A process vanished and nobody knows why. The signature symptom. A service that was up is suddenly down, its own logs just stop mid-sentence (it never got to log its own death — SIGKILL doesn't knock), and systemctl status shows it exited with no clean shutdown. The first instinct is "it crashed," but a crash leaves a trace; an OOM kill leaves the trace in the kernel's log, not the app's. Always check there:

    journalctl -k -b | grep -iE "out of memory|oom-killer|killed process"
    dmesg -T | grep -iE "out of memory|oom"
    

    A line like Out of memory: Killed process 1234 (mysqld) is the smoking gun — the kernel naming the exact PID and program it shot. An empty result here means whatever happened, it wasn't an OOM kill, and you can stop looking down this road.

  • The OOM kill counter has moved. The cleanest, most honest symptom there is — and the one CleverUptime watches — because it survives even when the kernel log has rotated away. Since kernel 4.13 the kernel keeps a lifetime tally of every process the OOM killer has reaped, in /proc/vmstat:

    grep oom_kill /proc/vmstat
    
    oom_kill 4
    

    That 4 is not a rate or a guess — it's the count of processes this kernel has killed for memory since boot, full stop. Zero is the calm reading. A number that's higher than it was an hour ago means a kill just happened, even if the log that explained it is long gone. (This is the rawest, most reliable OOM signal on the whole box — a single integer that never lies.)

  • Everything ground to a crawl just before it. OOM is usually the finale of a slower tragedy: as memory runs out the kernel fights to stay alive by paging anonymous memory to swap and dropping the page cache, and the box spends more time shuffling pages than doing work — swap thrashing. In top you'll see free memory near zero, swap filling, and the si/so (swap-in/swap-out) columns of vmstat 1 hammering:

    vmstat 1
    

    When si/so are pinned high and free is near zero, you're watching the run-up to an OOM kill in real time. (Sustained, that whole picture is swap thrashing; when there's no swap left to thrash, OOM comes faster, not slower — see swap full.)

  • A container or service that keeps restarting. On a modern systemd or container host, the kill often isn't the whole machine running out — it's one cgroup hitting its own MemoryMax ceiling. The service gets OOM-killed, systemd (or the orchestrator) restarts it, it grows back into its limit, and gets killed again — a restart loop with a memory smell. The kernel log says Memory cgroup out of memory rather than the global form, and that one word — cgroup — completely changes the diagnosis. More on that split below.

Any one of these means the same first move: read the kernel's OOM report. It is one of the most detailed post-mortems Linux ever writes for you — it dumps the entire memory situation and a table of every process at the moment of death — and learning to read it is the whole skill.
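
The swap-thrashing symptom above can be checked mechanically rather than by eye. The following is a minimal sketch, not an established tool: the function name swap_pressure and the default threshold of 100 are assumptions made for illustration, and the right threshold depends on your disks and workload. It reads vmstat output on stdin, finds the si and so columns by name, and flags samples where both are above the threshold.

```shell
# swap_pressure: read `vmstat` output on stdin and print one line per sample
# with its swap-in/swap-out rates, flagging samples where both exceed a
# threshold. The threshold is a hypothetical starting point, not a kernel
# constant.
swap_pressure() {
    threshold="${1:-100}"
    awk -v t="$threshold" '
        # The header row names the columns; locate si and so by name.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows start with a number and only count once the header is seen.
        si && so && $1 ~ /^[0-9]+$/ {
            state = ($si > t && $so > t) ? "THRASHING" : "ok"
            printf "si=%d so=%d %s\n", $si, $so, state
        }
    '
}
```

A typical call would be vmstat 1 5 | swap_pressure 100. Keep in mind that the first sample vmstat prints is an average since boot, so judge by the later rows.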

How I Read It

When the OOM killer fires, the kernel doesn't just log a one-liner — it writes a small forensic report: why it ran out, what the memory looked like, every process that was running, and who it chose to kill. Most people see the wall of hex and scroll past to the last line. That's backwards. The last line tells you the victim; the report tells you the cause. Here's a real one, trimmed of a few of the more verbose blocks, from a box called db-prod whose MySQL got reaped:

mysqld invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
CPU: 2 PID: 1234 Comm: mysqld Not tainted 5.15.0-91-generic #101-Ubuntu
Mem-Info:
active_anon:1923004 inactive_anon:8231 isolated_anon:0
 active_file:142 inactive_file:88 isolated_file:0
 unevictable:0 dirty:5 writeback:0
 slab_reclaimable:18422 slab_unreclaimable:24109
 free:21344 free_pcp:118 free_cma:0
Node 0 active_anon:7692016kB inactive_anon:32924kB active_file:568kB inactive_file:352kB
Node 0 Normal free:85376kB min:84320kB low:105400kB high:126480kB
0 pages HighMem/MovableOnly
0 pages reserved
Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[    412]     0   412     5921      442    73728        0             0 systemd-journ
[    701]     0   701    20374      311   126976        0         -1000 sshd
[    988]   111   988   227793     1108   286720      512             0 mysqld_safe
[   1234]   111  1234  2095104  1894221 15904768     8044             0 mysqld
[   1567]    33  1567    61204      904   151552       12             0 nginx
Out of memory: Killed process 1234 (mysqld) total-vm:8380416kB, anon-rss:7576884kB, file-rss:0kB, shmem-rss:0kB, UID:111 pgtables:15904768B oom_score_adj:0

It looks like a hex dump had a fight with a spreadsheet, but it's really four blocks, and you read them in a deliberate order. Let's take them apart. The most useful habit is to not start at the screaming last line.

Block 1 — Who Pulled the Trigger (and Why)

mysqld invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

The first surprise: the process that invoked the OOM killer is usually not the cause — it's just whoever happened to ask for the page of memory that finally couldn't be served. Here mysqld asked for one more page, there were none, and the act of asking triggered the killer. (It's a coincidence of timing, not guilt; on a leaking box the trigger is often something blameless that allocates constantly, like the logger or the shell.) Two fields earn a glance: order=0 means it wanted a single page (4 KB) — the most basic possible request, and the fact that even that couldn't be met is how you know memory is genuinely, completely gone. (A nonzero order would mean it needed 2^order physically contiguous pages, which can fail through memory fragmentation: free memory exists, but not in a big enough unbroken block.) And oom_score_adj=0 is the trigger's own kill-priority, which we're about to meet properly.
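
To make the order field concrete, here is a small sketch that pulls it out of a trigger line and converts it to a request size. The function names trigger_order and order_to_kb are made up for this example, and the 4 KB page size is an assumption that holds on typical x86-64 systems (getconf PAGESIZE reports the real value).

```shell
# trigger_order: read kernel log text on stdin and print the order= value of
# any "invoked oom-killer" line.
trigger_order() {
    sed -n 's/.*invoked oom-killer:.* order=\([0-9][0-9]*\).*/\1/p'
}

# order_to_kb: size of an order-n request, which asks for 2^n contiguous pages.
# The page size defaults to 4 KB (an assumption, see above).
order_to_kb() {
    order="$1"
    page_kb="${2:-4}"
    echo $(( (1 << order) * page_kb ))
}
```

Fed the trigger line from the report above, trigger_order prints 0, and order_to_kb 0 prints 4: a single 4 KB page. An order of 3 would be a 32 KB contiguous request.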

Block 2 — The Memory State Dump

Mem-Info:
active_anon:1923004 inactive_anon:8231 ...
 free:21344 free_pcp:118 free_cma:0
Node 0 Normal free:85376kB min:84320kB low:105400kB high:126480kB

This is the kernel showing its work — proof that it really had no other option. The number that tells the whole story is active_anon: anonymous memory is the kind that isn't backed by a file on disk — program heaps, stacks, the actual working data of your processes — and the brutal fact about it is that it can't just be dropped. File-backed cache can be thrown away and re-read later; anonymous memory has nowhere to go except swap, and once swap is full it has nowhere to go at all. Here active_anon is ~1.9 million pages (≈7.3 GiB) while active_file is a rounding error — almost all the memory is anonymous and un-droppable. That's a box that filled up with live program data, not cache, which is exactly the situation the OOM killer exists for.

The free / min / low / high watermarks are the kernel's tripwires. As free memory falls past low it wakes kswapd to reclaim in the background; past min it reclaims synchronously, blocking allocations until it succeeds; and when even synchronous reclaim can't get above min, the OOM killer is the last resort. Seeing free parked right at min (here 85376kB against a min of 84320kB) is the kernel pinned against the floor with nothing left to reclaim — the literal definition of out of memory.
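
The watermark comparison can be sketched in a few lines of awk. This is an illustration under assumptions: zone_pressure and its band labels are names invented here, and the parser only expects a zone line carrying free:, min:, low: and high: values in kB, as in the report above.

```shell
# zone_pressure: read an OOM report on stdin and, for each zone line that has
# free/min/low/high watermarks in kB, print where free sits relative to them.
zone_pressure() {
    awk '
        / free:[0-9]+kB/ && / min:[0-9]+kB/ {
            for (i = 1; i <= NF; i++) {
                split($i, kv, ":")
                if (kv[1] == "free" || kv[1] == "min" || kv[1] == "low" || kv[1] == "high") {
                    sub(/kB$/, "", kv[2])
                    v[kv[1]] = kv[2] + 0
                }
            }
            if (v["free"] <= v["min"]) band = "below-min"
            else if (v["free"] <= v["low"]) band = "min-to-low"
            else if (v["free"] <= v["high"]) band = "low-to-high"
            else band = "above-high"
            printf "free=%dkB headroom_over_min=%dkB band=%s\n", v["free"], v["free"] - v["min"], band
        }
    '
}
```

On the example report this prints free=85376kB headroom_over_min=1056kB band=min-to-low, which is the "pinned against the floor" picture in numbers: about 1 MB above the minimum watermark.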

Block 3 — The Process Table (the part you actually use)

Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[   1234]   111  1234  2095104  1894221 15904768     8044             0 mysqld
[    701]     0   701    20374      311   126976        0         -1000 sshd

This is the gold, and almost nobody reads it. At the instant of the kill the kernel dumps every process on the system with its memory footprint, so you can see exactly who was eating what — even processes that weren't killed. The column that matters is rss (resident set size), in pages — multiply by 4096 for bytes. mysqld at 1894221 pages is ≈7.2 GiB resident, towering over everything else: the obvious biggest user, and the chosen victim. total_vm is its virtual size (what it has mapped, much of it never touched — virtual memory is cheap, which is the entire reason overcommit exists, see the deep dive). And look at the last column: sshd carries oom_score_adj of -1000 — the value that makes a process unkillable by the OOM killer, which is why a sane distro ships it that way: whatever else dies, you keep your way in to fix it.

Reading this table is how you catch the misdiagnosis trap. If the killed process is huge and the table shows it ballooning over time, it's the culprit. But if the victim is a steady-state service and some other row is suspiciously fat — a worker, a script, a runaway job — the kernel killed the biggest current user while the real leaker hid just under it. The table is the only place that distinction is visible after the fact.
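
Sorting the process table by rss is the quickest way to find the fat row. The sketch below assumes the nine-column table layout shown above, 4 KB pages, and command names without spaces; oom_table_top is a name chosen for this example. It reads the report on stdin and prints resident MiB, PID, name and oom_score_adj, largest first.

```shell
# oom_table_top: read an OOM report on stdin and list the process-table rows
# by resident memory, largest first.
# Output columns: rss_mib pid name oom_score_adj
oom_table_top() {
    page_kb="${1:-4}"    # page size in KB, assumed to be 4
    awk -v page_kb="$page_kb" '
        {
            # Turn "[   1234]" into a plain field; this re-splits the line.
            gsub(/[][]/, " ")
            # Count columns from the right so a leading timestamp does not matter.
            if (NF >= 9 && $(NF-4) ~ /^[0-9]+$/ && $(NF-1) ~ /^-?[0-9]+$/) {
                printf "%.0f %s %s %s\n", $(NF-4) * page_kb / 1024, $(NF-8), $NF, $(NF-1)
            }
        }
    ' | sort -rn
}
```

With the report saved to a file, oom_table_top < oom-report.txt | head (the file name is hypothetical) puts mysqld on the first line at 7399 MiB, and any unexpectedly large second row is where the misdiagnosis hunt starts.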

Block 4 — The Verdict

Out of memory: Killed process 1234 (mysqld) total-vm:8380416kB, anon-rss:7576884kB, file-rss:0kB, shmem-rss:0kB, UID:111 pgtables:15904768B oom_score_adj:0

Now the famous line, and you read it last because you already know the story. It names the victim (PID 1234, mysqld, owned by UID:111) and breaks down its memory at death: total-vm (≈8.0 GiB mapped), anon-rss (≈7.2 GiB of un-droppable anonymous RAM — the number that got it killed), file-rss and shmem-rss (file-backed and shared-memory resident pages, both zero here). anon-rss is the one to read: it's the portion of the victim's memory that was real, resident, and impossible to reclaim any way other than killing the process. On a database that had quietly grown its buffer pool past what the box could hold, that single number is the whole post-mortem.

Note

The kernel writes this entire report to the kernel ring buffer, which is why dmesg and journalctl -k can show it — but the ring buffer is finite and overwrites itself. On a box that's been up a while, or that OOM'd repeatedly, the report explaining an earlier kill may already be gone while the oom_kill counter in /proc/vmstat still remembers it happened. That's the difference between knowing that a kill occurred (the counter, permanent until reboot) and knowing why (the log, ephemeral). Capture the log while it's fresh.
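
Because the counter outlives the log, a periodic check that remembers the last value is enough to notice a kill you would otherwise miss. This is a minimal sketch under assumptions: the function names and the state file are inventions of this example, and the state file path is whatever the caller chooses.

```shell
# oom_kill_count: print the oom_kill counter from a vmstat-format file.
# Defaults to /proc/vmstat; the argument lets it read a saved copy instead.
oom_kill_count() {
    awk '$1 == "oom_kill" { print $2; found = 1 } END { if (!found) exit 1 }' "${1:-/proc/vmstat}"
}

# oom_kill_new: print how many kills happened since the previous call, using
# a caller-chosen state file to remember the last value seen.
oom_kill_new() {
    state_file="$1"
    vmstat_file="${2:-/proc/vmstat}"
    current=$(oom_kill_count "$vmstat_file") || return 1
    previous=0
    [ -r "$state_file" ] && previous=$(cat "$state_file")
    # The counter restarts at boot, so a smaller value means a reboot happened.
    if [ "$current" -lt "$previous" ]; then previous=0; fi
    echo $(( current - previous ))
    echo "$current" > "$state_file"
}
```

Run from cron or a timer, a nonzero result from oom_kill_new is the cue to save journalctl -k output immediately, while the report is still in the ring buffer. On the first run it reports every kill since boot.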

Reading It by Example

Train the pattern-match. Each entry gives the readout first, then what I'd actually conclude after the arrow:

  • grep oom_kill /proc/vmstat shows oom_kill 0 → No OOM kill has happened since boot. Whatever's wrong, it isn't this. The happy, and most common, reading.
  • oom_kill higher than your last check, log shows Killed process … (mysqld), and the table shows mysqld rss dwarfing everything → A genuine memory shortage; the database (or whatever) really is the biggest user. Either it's legitimately sized too big for the box, or it's leaking. Read its config and its memory trend before you blame the kernel.
  • Victim is a steady service, but the process table shows a different row that's enormous → The classic misdiagnosis. The kernel killed the biggest current user; the real culprit is the fat row that didn't die. Hunt that one.
  • Log says Memory cgroup out of memory: Killed process … → Not a global OOM. One cgroup — a systemd service with MemoryMax, or a container with a memory limit — hit its own ceiling. The rest of the machine has plenty of RAM. The fix is the limit or the service, not the box. (Details below.)
  • oom_score_adj=900 (or any large positive number) on the victim → Something deliberately marked that process as first-to-die (containers and some daemons do this on purpose). It wasn't necessarily the biggest user — it was the most expendable. Working as intended.
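  • OOM kills happening with free -h showing gigabytes "free" → First remember that free -h run after the kill shows the memory the kill just released. If the log says Memory cgroup out of memory, one service hit its own limit and the machine-wide total never mattered. Otherwise check the order= field in the trigger line: an order higher than 0 points at fragmentation, where the total is fine but a single contiguous request couldn't be met.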
  • Repeated OOM kills minutes apart, same process each time → A leak or a restart loop. The service grows back into the wall every time it's restarted. Cap it or fix the leak; restarting it unchanged just schedules the next kill.

How to Fix It

The right move depends entirely on what the report told you — a leak, an oversized config, a too-tight cgroup limit, or a box that's genuinely too small. But the very first decision is always the same, and getting it wrong is how people make an outage worse.

Danger

Do not "fix" OOM by disabling the OOM killer, setting vm.overcommit_memory=2 blindly, or slapping oom_score_adj=-1000 on your application to make it immortal. All three feel like solutions and all three are traps. An immortal process doesn't stop being out of memory. It just forces the kernel to kill something else, often a service you needed more, and if nothing killable is left the kernel panics, turning a single dead service into an unreachable or fully-wedged machine that needs a hard reboot. The OOM killer killing the right process is the system working; making it kill the wrong one is how a recoverable incident becomes a 3 a.m. drive to the datacenter. Protect critical processes gently (a modest negative oom_score_adj), never absolutely, and never on the thing that's actually leaking.

Then, by cause:

  • A leaking process: fix or cap the leak. If the process table shows one process growing without bound across kills, that's a memory leak, and no amount of RAM saves you — a leak fills any size of box, just slower. Find it (the fat row in the dump, or ps aux --sort=-rss | head on a live box), and either fix the bug or, as a stopgap, put a hard memory limit on it so it dies instead of taking the machine down with it. On systemd that's one line: MemoryMax=2G in the unit's [Service] section, then systemctl daemon-reload && systemctl restart <unit>. Now its OOM is contained to its own cgroup and the rest of the box never notices.
  • An oversized service config: right-size it. The single most common "leak" that isn't a leak is a database configured to use more memory than the box has. MySQL's innodb_buffer_pool_size, PostgreSQL's shared_buffers + work_mem × connections, a JVM's -Xmx — set these too high (or let work_mem multiply across hundreds of connections) and the service will dutifully grow until the kernel shoots it. The fix is arithmetic, not more RAM: add up the worst-case footprint of everything on the box and make it fit under total RAM with headroom. (A surprising amount of "we need a bigger server" is really "one config line is set to 1.5× the machine.")
  • A genuinely too-small box: add RAM, or add swap as a cushion. If the workload legitimately needs more memory than the box has, the honest fix is more memory. Short of that, swap buys you a buffer — it lets cold anonymous pages spill to disk so the OOM killer holds off longer (a box with zero swap OOMs the instant it's full, with no grace at all). Swap is slow and a thrashing box is its own problem (swap thrashing), but a few gigs of swap turns a hard OOM kill into a slow-down you can catch and react to. A box doing real work with no swap configured is living without a safety net.
  • A cgroup/container OOM: raise the limit or shrink the workload. If the log said Memory cgroup out of memory, the machine is fine — one service hit its own MemoryMax. Decide which is wrong: the limit (too tight for legitimate work — raise it) or the service (doing more than the limit allows — shrink its work, or scale out). Don't reach for global fixes; a global OOM and a cgroup OOM look similar in the log and have completely different cures. Confusing the two is the most common OOM mistake on container hosts.
  • Protect the process that must survive — gently. If a specific process must outlive an OOM event (your database, say, while letting a batch worker be the sacrifice), nudge the scores rather than overriding them. Lower the database's likelihood of selection with a modest negative adjust (OOMScoreAdjust=-500 in its systemd unit), and/or raise the expendable worker's so it volunteers first (OOMScoreAdjust=500). You're not making anything immortal — you're telling the kernel your preference order, which is exactly what oom_score_adj is for.
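
The MemoryMax and OOMScoreAdjust settings mentioned in the list above usually live in a systemd drop-in. Here is a small sketch that prints one; memory_cap_dropin is a name made up for this example, the values are illustrative, and MemoryMax= assumes the host uses the unified cgroup hierarchy (cgroup v2).

```shell
# memory_cap_dropin: print a systemd drop-in that sets a hard memory limit and
# an OOM score adjustment for a service. It only writes to stdout; where the
# text is saved is up to the caller.
memory_cap_dropin() {
    limit="$1"        # for example 2G
    score_adj="$2"    # for example 500 (expendable) or -500 (protected)
    cat <<EOF
[Service]
MemoryMax=$limit
OOMScoreAdjust=$score_adj
EOF
}
```

For a hypothetical worker.service, the output would go into /etc/systemd/system/worker.service.d/memory.conf (create the directory first), followed by systemctl daemon-reload and a restart of the unit. systemctl show worker.service -p MemoryMax should then report the new limit.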

Pro Tip

oom_score_adj is your steering wheel, and you can read and set it live, no reboot. Every process exposes its current value at /proc/<pid>/oom_score_adj (the knob, -1000 to +1000) and its computed badness at /proc/<pid>/oom_score (what the kernel actually ranks on). Want to know who dies next if memory runs out right now? Rank every process by its live OOM badness — for p in /proc/[0-9]*; do printf '%s %s\n' "$(cat $p/oom_score 2>/dev/null)" "$(cat $p/comm 2>/dev/null)"; done | sort -rn | head — and you've printed the kernel's own kill list before the kill. The process at the top is the one with the target on its back.
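
A slightly longer version of that one-liner, written as a function, also shows the PID and the adjustment next to each score. Treat it as a sketch: oom_rank is a name chosen here, and the proc-root argument exists only so the function can be pointed at a copy of the files.

```shell
# oom_rank: list processes by current OOM badness, highest first.
# Output columns: oom_score oom_score_adj pid command
oom_rank() {
    proc_root="${1:-/proc}"
    for dir in "$proc_root"/[0-9]*; do
        [ -r "$dir/oom_score" ] || continue
        # A process can exit between the glob and the read; skip it if so.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        adj=$(cat "$dir/oom_score_adj" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s %s %s %s\n' "$score" "$adj" "${dir##*/}" "$name"
    done | sort -rn | head -n "${2:-10}"
}
```

Calling oom_rank with no arguments prints the ten most likely victims on the live box. A row with a high score and a large positive adjustment was volunteered on purpose; a high score with adjustment 0 is simply big.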

How to Avoid It

You can't make a box immune to running out of memory — physics and arithmetic don't bend — but unlike a failing disk, OOM is almost entirely self-inflicted and therefore almost entirely preventable. In rough order of payoff:

  1. Size the workload to the box — on paper, before it bites. Add up the worst-case resident memory of everything that runs: each database's configured buffers, each app's heap ceiling, work_mem × peak connections, plus a generous slice for the page cache and kernel. If that sum is anywhere near total RAM, you're already living on the edge and an OOM is just waiting for a busy Tuesday. This single spreadsheet prevents most OOMs that ever happen — far more than any amount of tuning after the fact.
  2. Put a hard limit on anything that can run away. Background workers, queue consumers, anything user-triggered, anything that processes untrusted input — give each its own MemoryMax cgroup limit so a runaway is contained to its own death instead of taking the whole machine. A contained OOM is a restarted worker; an uncontained one is an outage. This is one of the highest-leverage habits on the list: it turns "the server fell over" into "one job got killed and retried."
  3. Configure swap deliberately — some, not none, not infinite. Zero swap means zero grace: the box OOMs the instant RAM is full, with no warning slope. A sane amount of swap (a few GB, or follow your distro's guidance) gives cold pages somewhere to go and turns a cliff into a ramp you can see coming. But swap is not more memory — a box that's constantly in swap is thrashing, and the answer there is real RAM, not bigger swap. Tune swappiness so the kernel leans on swap sensibly rather than too eagerly.
  4. Protect the critical few, expendable the rest. Decide in advance who should die first. A small negative OOMScoreAdjust on the services that must survive, a small positive one on the batch jobs that can be retried — set once in the systemd units, and an OOM event resolves the way you'd have chosen instead of the way RSS happens to fall on the night.
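
The sizing arithmetic from step 1 is simple enough to script. The sketch below uses made-up numbers throughout: fits_in_ram is a name invented for this example, and the 20% headroom for page cache and kernel is an assumption to adjust, not a recommendation from any standard.

```shell
# fits_in_ram: sum worst-case footprints and compare them with total RAM minus
# a headroom percentage kept back for the page cache and the kernel.
# Usage: fits_in_ram TOTAL_MIB HEADROOM_PCT FOOTPRINT_MIB...
# Returns 0 when the sum fits, 1 when it does not.
fits_in_ram() {
    total_mib="$1"; headroom_pct="$2"; shift 2
    sum=0
    for mib in "$@"; do sum=$(( sum + mib )); done
    budget=$(( total_mib * (100 - headroom_pct) / 100 ))
    if [ "$sum" -le "$budget" ]; then verdict="fits"; else verdict="over"; fi
    echo "worst_case=${sum}MiB budget=${budget}MiB $verdict"
    [ "$verdict" = "fits" ]
}
```

On an 8 GiB box, fits_in_ram 8192 20 6144 800 512 (a 6 GiB buffer pool, 800 MiB of per-connection memory, a 512 MiB app heap) prints worst_case=7456MiB budget=6553MiB over. Shrinking the buffer pool to 4096 brings the total to 5408 MiB, which fits.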

Note

The deepest prevention isn't a setting at all — it's watching the trend of free memory, because OOM is the one server failure that announces itself for hours before it strikes. A leak is a straight line sloping toward zero; a sane workload is a flat line with breathing room. The slope tells you when you'll hit the wall, with days or hours of warning — but only if something samples free memory continuously and remembers yesterday's number. A human running free -h by hand always runs it the morning after the OOM, when the line already hit the floor.
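
Reading the slope takes only two samples and a division. The following sketch assumes a straight-line trend, which real workloads only approximate, and the function name hours_until_empty is invented here. A reasonable input is MemAvailable from /proc/meminfo, sampled some hours apart.

```shell
# hours_until_empty: linear extrapolation from two samples of available memory.
# Arguments: EARLIER_MIB LATER_MIB HOURS_BETWEEN
# Prints whole hours until zero at the observed slope, or "stable" when
# available memory is not shrinking.
hours_until_empty() {
    awk -v a="$1" -v b="$2" -v dt="$3" 'BEGIN {
        if (dt <= 0) { print "invalid"; exit }
        slope = (b - a) / dt            # MiB per hour, negative when shrinking
        if (slope >= 0) { print "stable"; exit }
        printf "%d\n", b / -slope
    }'
}
```

If available memory went from 6000 MiB to 5400 MiB over six hours, hours_until_empty 6000 5400 6 prints 54: a little over two days of warning at that rate.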

How Linux Memory Actually Works

Now the part you don't need mid-incident but that ties the whole page together — and it's one of the more audacious tricks in computing. To understand why a server can run out of memory while free shows gigabytes available, why the kernel kills instead of just failing the allocation, and why oom_score_adj exists at all, you have to understand the bargain Linux strikes with every program the moment it starts: it promises memory it does not have.

Virtual Memory: Everyone Gets Their Own Universe

Every process on a Linux box runs inside its own private, contiguous virtual address space — a clean, enormous expanse of addresses that looks, to the program, like it has the whole machine to itself. It doesn't. Those virtual addresses are a fiction maintained by the kernel and the CPU's memory-management unit, which translate each virtual page the program touches into some real page of physical RAM (or a slot on disk) on the fly, transparently. Two processes can both "use" address 0x400000 and never collide, because each one's 0x400000 maps somewhere different in physical memory. This is virtual memory, and it's the foundation everything else rests on: it's why one process can't read another's memory, why a program can be larger than RAM, and why the machinery of OOM exists at all.

The crucial consequence is the gap between mapping and using. When a program asks for memory — malloc(1 GB), say — the kernel doesn't go find a gigabyte of RAM and hand it over. It just notes, in the process's page tables, that this range of virtual addresses is allowed. No physical RAM moves. The actual pages are handed out one at a time, lazily, only when the program first writes to each one — a page fault that the kernel quietly services by finding a real page right then. This is why total_vm in the OOM report is larger than rss, sometimes by a wide margin: a process can map 8 GB while only ever touching — and therefore only ever costing — 2 GB. Virtual memory is a promise; resident memory is the bill coming due, one page at a time.

Overcommit: The Bank That Lends More Than It Holds

Here's the audacious part. Because programs routinely ask for far more memory than they ever actually touch, Linux does something that sounds reckless and is, in practice, brilliant: it says yes to more memory than it has. This is overcommit, and it's the default behaviour. The kernel is a bank that knows perfectly well not every depositor will withdraw at once, so it lends out more than it holds in the vault, betting — correctly, almost always — that the promises won't all be called in simultaneously. Ten processes can each malloc 1 GB on a 4 GB box and every call succeeds, because the kernel is gambling that each will only ever touch a fraction.

That gamble is what makes Linux fast and dense — it's why you can run dozens of services that each reserve generous memory on a modest box. But it has a sharp edge: if the promises ever are all called in at once — every process actually writing to the memory it was promised — the bank runs out of vault. There's no more physical RAM to satisfy the page faults, no more swap to spill to, and the kernel is now holding promises it physically cannot keep. It can't un-promise (the memory's already in use), it can't fail the write (the program asked legitimately, long ago). The only move left is to default on the loan by force: reclaim a chunk of memory the only way that's instant and guaranteed — kill a process and take all of its pages back at once. That is the OOM killer. It is, quite literally, the bank's last-resort debt collector, and it exists because of overcommit. Now the whole page connects: overcommit is why the box could promise too much, and the OOM killer is what happens when the promises come due and the vault is empty.

You can tune how generous the bank is, via vm.overcommit_memory:

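  • 0 (heuristic, the default) — the kernel uses a rough heuristic: obviously-insane allocations are refused (a single request bigger than RAM plus swap, for instance), but reasonable overcommit is allowed. This is what almost every box runs, and it's why the processes on a box can together reserve far more memory than it has while every individual malloc still succeeds.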
  • 1 (always) — overcommit anything, never refuse. Used by workloads that allocate huge sparse arrays and genuinely never touch most of them (some scientific and database code). Lives dangerously by design.
  • 2 (never) — strict accounting: the kernel refuses any allocation that would push total commitments past swap + overcommit_ratio% of RAM (default overcommit_ratio is 50). Now malloc fails honestly instead of succeeding and risking a later OOM kill — your program gets ENOMEM it can handle, rather than a SIGKILL it can't. Sounds safer, and for some workloads it is — but set it carelessly and well-behaved programs start failing allocations while RAM sits half-empty, because the commit limit, not actual usage, is the ceiling. It's a real tool with real footguns, which is exactly why it's not the default.
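
The strict-mode ceiling is worth computing before switching to mode 2. This sketch follows the formula given in the list above and ignores huge pages, which the kernel subtracts from RAM first; commit_limit_kb is a name chosen for the example.

```shell
# commit_limit_kb: commit ceiling under vm.overcommit_memory=2, computed as
# swap + overcommit_ratio% of RAM. Huge pages are ignored in this sketch.
# Arguments: RAM_KB SWAP_KB [RATIO_PCT, default 50]
commit_limit_kb() {
    ram_kb="$1"; swap_kb="$2"; ratio="${3:-50}"
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}
```

For a box with 8 GiB of RAM and 2 GiB of swap, commit_limit_kb 8388608 2097152 prints 6291456, so allocations start failing once total commitments reach 6 GiB even though the machine has 8 GiB of RAM. The kernel's own figures are the CommitLimit and Committed_AS lines in /proc/meminfo, which is the comparison to make on a live system.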

Global vs cgroup: Two Walls, Two Killers

There's a second OOM that lives at a smaller scale, and it trips people up because the log line looks almost identical. A cgroup — the kernel's mechanism for fencing a group of processes inside a resource budget — can have its own memory ceiling, and modern systems use them everywhere: every systemd service can carry a MemoryMax, every container runs inside one. When a cgroup's processes collectively touch more memory than that limit allows, the kernel runs the OOM killer scoped to just that cgroup — picks the worst process inside the fence and kills it — even though the machine as a whole may have RAM to spare. The log says Memory cgroup out of memory instead of the plain global form, and that one word changes everything: a global OOM means the box is too small or overloaded; a cgroup OOM means one service's budget is too tight (or that service is misbehaving inside it). Same killer, same scoring, a smaller arena — and the reason MemoryMax is such a good safety tool is precisely that it turns a machine-wide catastrophe into a contained, single-service event the rest of the box never feels.
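
Telling the two apart comes down to one phrase in the kill line, which makes it easy to automate. The sketch below relies only on the two message forms quoted in this article; classify_oom is a name invented here, and kernels that word these lines differently would need the patterns adjusted.

```shell
# classify_oom: read kernel log text on stdin and label each kill line as a
# global or cgroup OOM, with the victim's PID and name.
classify_oom() {
    awk '
        /Killed process [0-9]+ \(/ {
            scope = ($0 ~ /Memory cgroup out of memory/) ? "cgroup" : "global"
            # Keep only the "PID (name) ..." part after "Killed process".
            rest = $0
            sub(/.*Killed process /, "", rest)
            pid = rest
            sub(/ .*/, "", pid)
            name = rest
            sub(/^[0-9]+ \(/, "", name)
            sub(/\).*/, "", name)
            printf "%s pid=%s name=%s\n", scope, pid, name
        }
    '
}
```

Piping journalctl -k -b through classify_oom gives one line per kill. A log full of cgroup lines for the same name points at a service limit, while global lines mean the machine itself ran dry.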

oom_score: How the Kernel Chooses Who Dies

When the killer does fire, it doesn't pick at random and it doesn't pick the trigger. It computes a badness score for every eligible process and kills the worst one — and the scoring is refreshingly simple at heart: badness is roughly how much memory you're using. A process using 60% of available memory scores around 600 (the scale runs 0–1000); one using 5% scores around 50. The logic is pure triage: killing the biggest user frees the most memory with the fewest casualties, so the kernel does exactly that. This is why the victim is so often your database or your JVM — not because they did anything wrong, but because they're honestly, legitimately the biggest thing in the room when the music stops.

Then comes the one knob you get to turn: oom_score_adj, a value from -1000 to +1000 added straight onto the badness score. Set a process to +1000 and it's effectively first in line to die no matter how little memory it uses (containers do this to background helpers). Set it to -1000 and the process becomes completely immune — the kernel will never choose it, even if it's the biggest user on the box, and will go kill the next-worst thing instead. This is exactly why a well-set-up system ships sshd at -1000: whatever catastrophe unfolds, you keep your way in to fix it. The oom_score_adj column in the process-table dump is the kernel showing you, mid-kill, everyone's hand-tuned priority — and that lone -1000 on sshd in our example is a distro quietly making sure the lights stay on in the server room even as the house burns. (The values between the extremes are where the real craft lives: a -500 makes your database unlikely to be chosen without making it the unkillable monster that gets your whole machine wedged — the gentle protection the Danger box above insisted on.)
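
The scoring described above can be written down as a toy model. To be clear about its status: this is the simplified picture from the text (memory share on a 0 to 1000 scale plus the adjustment), not the kernel's exact arithmetic, which also counts swap and page tables and has changed between versions. The function name rough_oom_score is an assumption of this example.

```shell
# rough_oom_score: simplified model of OOM badness, for intuition only.
# Arguments: USED_KB TOTAL_KB [OOM_SCORE_ADJ, default 0]
# Memory share is scaled to 0-1000, the adjustment is added, and the result is
# clamped. An adjustment of -1000 is treated as "never selected".
rough_oom_score() {
    used_kb="$1"; total_kb="$2"; adj="${3:-0}"
    if [ "$adj" -le -1000 ]; then echo "exempt"; return; fi
    score=$(( used_kb * 1000 / total_kb + adj ))
    [ "$score" -lt 0 ] && score=0
    [ "$score" -gt 1000 ] && score=1000
    echo "$score"
}
```

A process holding 60% of memory scores 600 with no adjustment and 100 with -500, which is the gentle protection described above: it drops well down the list but still dies if it is the only thing big enough to matter. A small worker at 5% with +500 scores 550 and is chosen before an unadjusted process at 50%.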

So: a bank that over-lends, a debtor base that mostly never withdraws, and a debt collector for the rare day they all do at once — with a priority list you get to influence so the right account gets closed. Hold that picture and every line of the OOM report reads itself: the trigger (whoever made the withdrawal that broke the bank), the memory dump (the empty vault), the process table (everyone's balance), and the verdict (the account the collector closed). It's not the kernel being cruel. It's the kernel keeping a promise it was always too optimistic to make — the only way it still can.

See Also

  • free — the first glance at how much memory is really left
  • vmstat — watch si/so and free memory in the run-up to a kill
  • top — live memory per process; spot the grower before the kernel does
  • ps — ps aux --sort=-rss | head names the biggest memory users on demand
  • dmesg — where the kernel writes the full OOM report
  • journalctl — journalctl -k for the same report, with timestamps and persistence
  • memory full — the slope that leads here: RAM running out before the kill
  • swap full — when the last cushion is gone and OOM comes fast
  • swap thrashing — the grinding crawl that usually precedes an OOM kill
  • memory leak — the most common cause: a process that grows without bound
  • swappiness too high — when the kernel pages to swap too eagerly
  • memory errors — a different memory problem: bad RAM, not too little
  • swap — the disk-backed cushion that buys OOM some grace
  • cgroup — how one service hits its own memory limit without touching the machine's

A process on your box just vanished — was it a crash, or did the kernel shoot it for memory?

CleverUptime watches the OOM kill counter and free-memory trend on every server you run it on, and tells you definitively when the kernel ran out and reaped a process — naming the box, reporting how much RAM was actually left and which process is the biggest consumer, so you fix the leak or right-size the service before the next kill instead of finding the body in the morning.
