NVMe Spare Blocks Exhausted: Symptoms, Diagnosis & Fixes
Every SSD keeps a hidden reserve to hide its own dying cells. When that runs dry, the drive is genuinely out of road.
What It Is
An NVMe drive — like every SSD — is built around a quiet lie it tells the operating system: that all of its flash works. It doesn't, and it never did. Flash cells fail constantly, from the day the drive is made, and the controller spends its whole life hiding that from you by shuffling your data off the bad cells and onto good ones held in reserve. That reserve is the spare pool — a private stash of extra flash blocks the drive never advertises, set aside for exactly this. As long as the pool has blocks to give, a cell going bad is a non-event: the drive remaps around it and you never notice.
"NVMe spare blocks exhausted" is the moment that reserve runs dry. The drive reports its remaining spare as a percentage, and it carries a manufacturer-set floor — the Available Spare Threshold, almost always 10%. While the spare reading sits up at 100%, all is well. When it falls to or below that floor, the drive sets a bit in its SMART Critical Warning byte and says, in effect: I have run out of healthy blocks to relocate to; I can no longer guarantee your writes. Shortly after that, most drives flip themselves read-only to protect what's still on them — your service is up, your data is readable, and nothing new can be written. That last-ditch read-only flip is the SSD's version of the read-only remount a failing disk triggers on spinning rust, and it tends to arrive at an equally unkind hour.
Here is the single most important thing to get straight before anything else on this page, because it's the distinction the whole topic turns on. Spare exhaustion is not the same as wear-out. A worn-out SSD — one that's burned through its rated write budget — is old, not broken; it keeps working and you replace it on a relaxed schedule (that's SSD worn out, a different page and a different mood entirely). Spare exhaustion is a real defect: the drive has accumulated so many genuinely-dead cells that its safety reserve is gone, and it is telling you, in the strongest language SMART has, to replace it now. One is a smoke alarm chirping for a fresh battery. This is the smoke. We'll spend the rest of the page making sure you can tell them apart at a glance, read the exact numbers that prove it, and know precisely what to do — and then, at the end, the quietly clever reason a healthy SSD needs a spare pool at all, which is one of the better stories in computing.
How You Notice
Spare exhaustion announces itself in a handful of places. Here's each one, with the command to check it on your own box right now — so you can tell the real thing from a scare:
- The drive goes read-only and writes start failing. This is the loudest symptom and usually the first one a person notices: the filesystem is suddenly read-only even though df swears there's free space. Every write fails, services that need to log or write a PID file fall over, and databases throw write errors. Check it:
  mount | grep -w ro
  touch /var/lib/testfile # fails with "Read-only file system"
  Plenty of free space but nothing will write is the giveaway that this is a hardware limit, not a disk full capacity limit: same panic, completely different cause.
- I/O errors in the kernel log. When the drive can no longer satisfy a write, the kernel says so in plain text. Look:
  dmesg -T | grep -iE "nvme|I/O error|read-only|EIO"
  journalctl -k -p err
  Lines naming nvme0n1 alongside I/O error or a remount-read-only message are the rawest evidence there is, and an empty result here is genuinely good news.
- The SMART critical-warning bit is set. This is the definitive one, and it needs no guesswork. The drive itself raises a flag the instant spare crosses the floor:
  smartctl -a /dev/nvme0
  nvme smart-log /dev/nvme0
  A Critical Warning of anything but 0x00 means the drive is flagging something specific, and a low Available Spare next to it names which something. (Both smartctl and the nvme CLI read the same log page off the drive; use whichever's installed.) If this command comes back with Unable to detect device type or an empty health log instead of numbers, the problem is upstream of the spare pool (the drive isn't answering at all) and that's its own diagnosis: SMART unavailable.
- A smartd email, if you set it up. The smartmontools daemon can watch the Critical Warning byte and mail root the moment it changes. Most servers never turn it on, which is exactly why the read-only flip is the first many admins ever hear of the problem. We'll fix that at the end.
Any one of these means: stop guessing and go read the drive's health log. The tool for it is one command, and the answer is unambiguous.
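The read-only check from the first symptom can also be scripted. What follows is a minimal sketch in Python, assuming a Linux host whose /proc/mounts lists device, mount point, filesystem type, and comma-separated options on each line. The function name read_only_mounts and the sample text are illustrative, not part of any tool.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose option list contains 'ro'."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Compare whole options so "errors=remount-ro" does not count as "ro".
        if "ro" in options.split(","):
            hits.append((device, mount_point))
    return hits


# Hypothetical sample in /proc/mounts format. On a real box, read the file:
#   with open("/proc/mounts") as f:
#       print(read_only_mounts(f.read()))
SAMPLE = """\
/dev/nvme0n1p2 / ext4 ro,relatime,errors=remount-ro 0 0
/dev/nvme1n1p1 /data ext4 rw,noatime 0 0
"""
print(read_only_mounts(SAMPLE))
```

Some mounts are read-only by design (squashfs images, for example), so a hit only matters when it names a filesystem you expected to be writable.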
How I Read It
Every NVMe drive keeps a compact health log — the SMART / Health Information log page — and the command I reach for first asks for the whole thing:
smartctl -a /dev/nvme0
Here's the part that matters, lifted straight from a healthy NVMe drive in one of our racks (app-01) so you have a calm baseline to measure against:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 38 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 7%
Data Units Read: 203,913,228 [104 TB]
Data Units Written: 64,663,716 [33.1 TB]
Media and Data Integrity Errors: 0
Read it top to bottom and it's a short, honest story. Critical Warning: 0x00 — the drive is flagging nothing. Available Spare: 100% sitting comfortably above Available Spare Threshold: 10% — the reserve is full and miles from its floor. Percentage Used: 7% — this drive has spent 7% of its rated write budget and has years of life left. Media and Data Integrity Errors: 0 — it has never once handed back data it couldn't vouch for. This is what fine looks like, and it's what the overwhelming majority of your drives look like.
Note
Don't confuse the two percentages: they are the heart of this whole page. Available Spare is a health gauge: how much of the safety reserve survives, counting down from 100% toward the 10% floor as cells die for real. Percentage Used is an age gauge: how much of the rated write budget is spent, counting up from 0% toward, and even past, 100% as the drive does its job. A drive can be at Percentage Used: 4% (practically new) and still have exhausted its spare (genuinely defective). Wear and defect are different axes, and a drive can fail on either.
Now the failing case. Here's the same log off a drive whose spare reserve has collapsed — the values you'd see when the floor has been breached:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x01
Temperature: 41 Celsius
Available Spare: 4%
Available Spare Threshold: 10%
Percentage Used: 61%
Data Units Read: 412,887,001 [211 TB]
Data Units Written: 388,201,556 [198 TB]
Media and Data Integrity Errors: 119
Walk it the same way and it reads like a confession. Critical Warning: 0x01 is not zero, and that lowest bit has a precise meaning we'll decode in a second. Available Spare: 4% against an Available Spare Threshold: 10% means the reserve has fallen below its floor; there is almost nothing left to remap dying cells onto. Percentage Used: 61% is the gut-punch: the drive is only 61% through its rated write life. It hasn't worn out. It's broken, failing well ahead of schedule with nearly 40% of its write budget still unspent. And Media and Data Integrity Errors: 119 means it has already handed the operating system data it could not vouch for, 119 times. That overall-health: FAILED! line at the top, with its tidy one-line reason available spare has fallen below threshold, is (for once) telling the exact truth, because it's tripping on a genuine defect rather than the wear bar. (On a failing disk we teach you to distrust that headline precisely because it cries wolf on mere wear; here it's crying about a real wolf.)
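To make that reading mechanical, here is a small sketch in Python that pulls the verdict-carrying fields out of the failing log above. It assumes the text layout shown on this page ("Label: value" lines); labels can differ between smartctl versions, and the function name parse_health is only illustrative.

```python
FAILING_LOG = """\
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x01
Temperature: 41 Celsius
Available Spare: 4%
Available Spare Threshold: 10%
Percentage Used: 61%
Data Units Read: 412,887,001 [211 TB]
Data Units Written: 388,201,556 [198 TB]
Media and Data Integrity Errors: 119
"""


def parse_health(text):
    """Extract key fields from smartctl's NVMe health section.

    Raises KeyError if an expected label is missing from the text.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # keep only "Label: value" lines
            fields[key.strip()] = value.strip()
    return {
        "critical_warning": int(fields["Critical Warning"], 16),
        "available_spare": int(fields["Available Spare"].rstrip("%")),
        "spare_threshold": int(fields["Available Spare Threshold"].rstrip("%")),
        "percentage_used": int(fields["Percentage Used"].rstrip("%")),
        "media_errors": int(
            fields["Media and Data Integrity Errors"].replace(",", "")
        ),
    }


health = parse_health(FAILING_LOG)
print(health)
print("spare at or below floor:",
      health["available_spare"] <= health["spare_threshold"])
```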
The Rest of the Log, Field by Field
Those two examples lean on four lines, but the health log has a few more worth knowing — because the day you paste yours into a ticket, you want to read every line, not just the famous ones. Top to bottom:
- Temperature. The drive's own thermal sensor, in Celsius. NVMe modules run hot: 38 idle and 60+ under sustained load is ordinary; sustained past 70 and the drive starts thermal-throttling (deliberately slowing itself to cool down), and past its limit it trips bit 1 of the warning byte. A drive that's both hot and shedding spare is a drive being cooked to death, so fix the airflow as well as planning the swap (that thermal angle is its own page, overheating).
- Available Spare / Available Spare Threshold. The pair this whole page is about: the health gauge and its floor.
- Percentage Used. The age gauge, covered in the Note above and dissected in full further down.
- Data Units Read / Data Units Written. The drive's lifetime odometer, counted in 512,000-byte units (NVMe's quirky unit: 1,000 logical blocks of 512 bytes), which is why smartctl helpfully annotates the human total in brackets: 64,663,716 [33.1 TB]. Written is the one that matters for endurance; it's the raw input behind Percentage Used. Divide the written terabytes by the drive's rated TBW and you get, near enough, the same fraction the drive reports as "used." (And here's a quiet tell worth a glance: if Data Units Written dwarfs Data Units Read on a drive that's supposedly serving mostly reads, something is writing far more than you think: a chatty log, a database with fsync on every transaction, an atime mount churning metadata. The odometer doesn't lie about where the mileage went.)
- Media and Data Integrity Errors. The single most damning line after the spare gauge. This is the count of times the drive served up (or caught itself about to serve) data that failed its own internal ECC check: bits that rotted past what the error-correcting code could rebuild. On a healthy drive this is 0, forever. Any non-zero value means real, uncorrectable corruption has already happened at least once, and a number that climbs between two readings is a drive actively losing data. It's the NVMe cousin of an HDD's reported-uncorrectable count, and it deserves the same dread.
- Error Information Log Entries / Unsafe Shutdowns / Power Cycles / Power On Hours. The drive's logbook footnotes: useful context, rarely the headline. A pile of Unsafe Shutdowns (power yanked mid-write) hints at a flaky PSU or no UPS, and correlates loosely with corruption; Power On Hours just tells you the drive's age in service. Glance at them; don't lose sleep over them.
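The Data Units arithmetic is easy to check by hand. The sketch below, in Python, converts the counter from the healthy log into terabytes and compares it with a rated endurance. RATED_TBW is a made-up figure for illustration only; take the real value from your drive's data sheet.

```python
UNIT_BYTES = 1000 * 512  # one NVMe data unit: 1,000 blocks of 512 bytes


def data_units_to_tb(units):
    """Convert a Data Units counter to decimal terabytes (10**12 bytes)."""
    return units * UNIT_BYTES / 10**12


def fraction_of_tbw(units_written, rated_tbw):
    """Share of a rated endurance (TB written) that the odometer has used."""
    return data_units_to_tb(units_written) / rated_tbw


written_tb = data_units_to_tb(64_663_716)
print(f"{written_tb:.1f} TB")  # 33.1 TB, matching smartctl's bracketed figure

RATED_TBW = 600  # hypothetical rating, not taken from any real data sheet
print(f"{fraction_of_tbw(64_663_716, RATED_TBW):.1%} of rated endurance")
```

The drive's own Percentage Used is a vendor estimate, so expect it to land near this fraction rather than match it exactly.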
When you're triaging fast, you don't need the full -a dump. The three readings that decide everything fit in one filter:
smartctl -a /dev/nvme0 | grep -iE 'critical|spare|integrity'
That pulls the warning byte, the spare gauge against its floor, and the integrity-error count: the entire verdict of this page in a single screenful.
Decoding the Critical Warning Byte
That Critical Warning: 0x01 isn't a number you read as "one." It's a bitmask — one byte where each of the low six bits is an independent alarm the drive can raise. Knowing the bits turns the hex into a sentence. Here is the full decode, straight from the NVMe spec:
| Bit | Hex | Meaning |
|---|---|---|
| 0 | 0x01 | Spare capacity below threshold |
| 1 | 0x02 | Temperature above/below threshold |
| 2 | 0x04 | NVM subsystem reliability degraded (media errors) |
| 3 | 0x08 | Media placed in read-only mode |
| 4 | 0x10 | Volatile memory backup device failed |
| 5 | 0x20 | Persistent Memory Region read-only / unreliable |
This page is bit 0 — 0x01, spare capacity below threshold. The bullets below walk the ones you'll meet most:
- Bit 0, 0x01: Available spare below threshold. The one we're chasing. Exactly the defect on this page.
- Bit 1, 0x02: Temperature out of range. The drive is over (or under) its safe operating band; see overheating.
- Bit 2, 0x04: Reliability degraded. The controller has logged enough media or internal errors that it no longer trusts itself, a sibling defect that also means replace. Worth one caution: unlike a SATA drive, where a flood of errors can be the cable rather than the disk (the classic misread covered in disk cable errors), an NVMe drive talks to the CPU directly over PCIe with no data cable to blame. So when an NVMe sets this bit, it really is the drive.
- Bit 3, 0x08: Media is read-only. The drive has given up on writes entirely and locked itself read-only to protect your data. Often the next state after bit 0, once the spare floor is breached and writes can no longer be honoured safely.
- Bit 4, 0x10: Volatile memory backup failed. The capacitor or battery that lets the drive flush its cache on sudden power loss has died (only meaningful on drives that have one).
The bits combine. 0x01 is spare alone; 0x05 (0x01 | 0x04) is spare and reliability — a drive failing on two fronts at once, which is common, because the same flood of dead cells that drains the spare pool is exactly what trips the reliability bit. So you don't memorise hex values; you read the byte as a set of flags and translate. Any non-zero Critical Warning on an NVMe drive is the drive raising its hand — and bit 0 set is it specifically pointing at an empty spare cupboard.
Pro Tip
Convert the byte in your head with the low bits: 0x01 = spare, 0x02 = temperature, 0x04 = reliability, 0x08 = read-only, 0x10 = backup-power. Add the ones you see. 0x06 means temperature and reliability; 0x09 means spare and read-only, a drive that's run out of reserve and already locked itself down. The hex looks cryptic until you realise it's just a checklist with the boxes ticked.
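The same checklist can be written as a few lines of Python. This is a sketch of the decode described above, using the bit meanings from the table; the names CRITICAL_WARNING_BITS and decode_critical_warning are illustrative.

```python
# Bit value -> meaning, taken from the table on this page.
CRITICAL_WARNING_BITS = {
    0x01: "spare capacity below threshold",
    0x02: "temperature above/below threshold",
    0x04: "NVM subsystem reliability degraded",
    0x08: "media placed in read-only mode",
    0x10: "volatile memory backup device failed",
    0x20: "Persistent Memory Region read-only / unreliable",
}


def decode_critical_warning(value):
    """Return the meaning of every bit set in the Critical Warning byte."""
    return [
        meaning
        for bit, meaning in CRITICAL_WARNING_BITS.items()
        if value & bit  # non-zero when this flag is raised
    ]


for byte in (0x00, 0x01, 0x05, 0x06, 0x09):
    flags = decode_critical_warning(byte) or ["nothing flagged"]
    print(f"0x{byte:02x}: {', '.join(flags)}")
```

Testing each bit with a bitwise AND is what lets combined values such as 0x05 or 0x09 fall out without a lookup table of every possible byte.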
Reading It by Example
Train the pattern-match. Readout on the left, what I'd actually conclude on the right:
- Critical Warning: 0x00, Available Spare: 100%, Percentage Used: 7% → Healthy. Nothing to do. The happy, and by far most common, case: most of your drives look exactly like this.
- Critical Warning: 0x00, Available Spare: 100%, Percentage Used: 96% → Worn out, not broken. The spare is untouched; the drive has simply spent its write budget. No emergency. That's SSD worn out, a calm scheduled replacement, not this page.
- Critical Warning: 0x01, Available Spare: 4%, threshold 10% → Spare exhausted. The defect on this page. The reserve is below its floor; the drive is about to go read-only if it hasn't already. Back up and replace now.
- Available Spare: 9%, threshold 10%, Critical Warning: 0x01, Percentage Used: 22% → The same defect caught early: barely under the floor, still young in write-life. Don't be reassured by the low Percentage Used; the drive is defective regardless of its age. Replace it.
- Critical Warning: 0x05, Available Spare: 2%, Media and Data Integrity Errors: 540 → Spare and reliability bits both set, integrity errors piling up. A drive coming apart on every axis at once. Stop writing to it, rescue data, replace immediately.
- Critical Warning: 0x08, every write failing → The drive has already flipped itself read-only (bit 3). You're past the warning and into the consequence. Your data is still readable, so get it off and swap the drive.
- Available Spare: 88%, threshold 10%, Critical Warning: 0x00 → A few cells have died (the spare's dipped from 100), but the reserve is enormous and nowhere near its floor. Perfectly normal aging; every SSD does this. Nothing to do; we keep reading the number so you'll know the day it actually matters.
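The pattern-match in those readouts can be sketched as a small decision function in Python. The order of the checks and the 90% cut-off for "worn out" are assumptions made for this illustration, not thresholds defined by SMART or by any tool; the function name classify is hypothetical.

```python
WORN_OUT_AT = 90  # assumed cut-off for "worn out"; pick your own policy


def classify(critical_warning, available_spare, spare_threshold,
             percentage_used=0):
    """Map one health readout to a verdict like those in the examples."""
    if critical_warning & 0x08:
        return "read-only: copy data off and replace"
    spare_low = bool(critical_warning & 0x01) or (
        available_spare <= spare_threshold
    )
    if spare_low and critical_warning & 0x04:
        return "spare and reliability failing: rescue data, replace now"
    if spare_low:
        return "spare exhausted: back up and replace now"
    if critical_warning:
        return "other warning bit set: decode the byte"
    if percentage_used >= WORN_OUT_AT:
        return "worn out, not broken: schedule a replacement"
    return "healthy"


print(classify(0x00, 100, 10, 7))
print(classify(0x00, 100, 10, 96))
print(classify(0x01, 9, 10, 22))
print(classify(0x05, 2, 10))
```

Checking the read-only bit first mirrors the urgency ordering of the examples: a drive that has already locked itself down is past the warning stage whatever the other numbers say.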
How to Fix It
The diagnosis is rarely ambiguous here — a low Available Spare against its threshold, with bit 0 set, is the drive telling you plainly it's defective. But the very first step is always the same, and it isn't optional.
Danger
A drive at or below its spare threshold can flip to read-only at any moment, and once it does, nothing on it can be updated or flushed cleanly before you copy it. Get your irreplaceable data off now, while the drive still behaves normally, with something gentle like rsync -a to another machine. Do not run a heavy write benchmark, a badblocks -w, or a full fsck that tries to write: every write now is asking a drive with no spares left to find a healthy block it may not have. Reads are safe and gentle; writes are the danger. No fix is more urgent than your backup.
Then:
- Spare below threshold (bit 0 set, Available Spare ≤ Available Spare Threshold): replace the drive. This is the whole answer. There is no command that refills the spare pool; those blocks are physically dead flash, not a setting. The reserve only ever shrinks. If it's in a RAID array, fail and remove the bad member (mdadm /dev/md0 --fail /dev/nvme0n1p1, then the same command with --remove), slot in a fresh drive, and let the array rebuild onto it, which is the entire reason the array exists. (Often the array will have done the first step for you: a drive throwing write errors gets kicked out, leaving you with a degraded array running on its survivors, or a degraded RAID array if it dropped out hard. Either way, the defective spare-exhausted NVMe is the one to pull.) On a rented or hosted box, open a ticket with the smartctl -a output pasted in. This is a real defect, exactly the bar a hoster like Hetzner uses for "we'll swap it", so a decent provider replaces it, usually free and often same-day. (That's the crucial difference from a merely worn-out SSD, which still works and so won't meet their defect bar; spare exhaustion clears it cleanly.)
- Already read-only (bit 3, 0x08): same answer, more urgency. The drive has made the decision for you. It will keep serving reads, so copy everything off and replace it. Don't waste time trying to remount it read-write; the drive is refusing writes on purpose, to protect what's left.
- Spare and reliability both set (0x05 and friends), integrity errors climbing: stop using the drive for anything you can avoid, rescue what you can, and replace it immediately. A drive failing on multiple bits is not going to get better, and every read you do is a small risk.
There is, deliberately, no "monitor it and see" option on this list. Spare exhaustion is one of the few SMART states where the right move is unambiguous and the same every time: replace. The reserve doesn't come back.
How to Avoid It
You can't prevent flash cells from dying. That's physics, and the end of this page explains it. But you can keep a drive's spare pool from draining years early, and the levers are the same ones that govern SSD life in general:
- Backup, first and always. A defective drive is a chore instead of a catastrophe only if its data lives somewhere else too. A backup you've actually tested a restore from — not merely configured — is the whole difference. Our backup guide covers the how, and on a drive that may go read-only any minute, it's the first thing to confirm, not the last.
- Buy the right endurance for the workload. SSD endurance is sold as TBW (terabytes written) or DWPD (whole-drive writes per day across the warranty). A write-heavy database on a cheap consumer drive will chew through cells — and then the spare pool — far faster than the spec sheet's happy-path assumes. Match the drive's rating to what you'll actually throw at it.
- Tame write amplification. The biggest, most surprising amplifier in most stacks is a database told to be maximally safe: forcing every transaction individually to disk turns a trickle of logical writes into a torrent of physical ones, and physical writes are what kill cells. Sane batching, a filesystem mounted with noatime, and not pointing high-churn temp files at your data SSD all spend the spare pool slower.
- Keep it cool. Heat makes flash cells leak charge and fail faster, which feeds bad blocks into the spare pool ahead of schedule. NVMe drives run hot under load; a heatsink and airflow are cheap next to a drive. (Run too hot and you'll trip the temperature bit, 0x02, long before the spare one; that's overheating.)
- Mirror what matters. RAID doesn't stop a drive from going defective, but it turns the event into a calm hot-swap instead of an outage: any one member can exhaust its spare and die without taking the service with it, leaving you with a degraded array you rebuild on your own schedule rather than a 3 a.m. restore.
But notice the ordering, and why it's deliberate: backup is rule 1 and mirroring is rule 5, never the other way round. A mirror copes with one drive dying; it does nothing against a fat-fingered rm -rf, a bad deploy, or ransomware, every one of which it dutifully replicates to both members in the same instant. And there's a sharper edge specific to flash: identical NVMe drives, bought in one order and fed the exact same write stream (as RAID mirror members are), tend to wear their spare pools down in lockstep. So the day one breaches its floor, its twin is often only days behind, and "both mirror members defective in the same week" is far less of a fluke than the odds naively suggest. The mirror buys you uptime; only the backup buys you your data back.
And the deepest version of all this isn't a one-off command — it's watching the trajectory. Available Spare ticking from 100 to 98 over a year is normal aging; ticking from 60 to 30 in a week is a drive about to breach its floor, and you only catch that if something reads the health log every day and compares. A single manual smartctl run, months apart, misses exactly the slope that matters.
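To see why the slope matters more than any single reading, here is a rough sketch in Python that extrapolates stored readings to the floor. It assumes a straight-line decline between the first and last reading, which real drives are under no obligation to follow, so treat the result as an early-warning hint rather than a prediction. The function name days_until_floor and the reading format are illustrative.

```python
def days_until_floor(readings, threshold):
    """Estimate days until Available Spare reaches the threshold.

    readings: list of (day_number, available_spare_percent), oldest first.
    Returns 0.0 if the spare is already at or below the floor, and None if
    the spare is not falling (or the readings span no time).
    """
    (first_day, first_spare), (last_day, last_spare) = readings[0], readings[-1]
    if last_spare <= threshold:
        return 0.0
    elapsed = last_day - first_day
    drop = first_spare - last_spare
    if elapsed <= 0 or drop <= 0:
        return None
    rate_per_day = drop / elapsed  # percentage points lost per day
    return (last_spare - threshold) / rate_per_day


slow = days_until_floor([(0, 100), (365, 98)], threshold=10)
fast = days_until_floor([(0, 60), (7, 30)], threshold=10)
print(f"normal aging: about {slow:.0f} days of headroom")
print(f"collapsing:   about {fast:.1f} days of headroom")
```

With the two trajectories from the paragraph above, the year-long drift from 100 to 98 leaves decades of headroom, while the week-long slide from 60 to 30 leaves under five days.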
How the Spare Pool Actually Works
Now the part you don't need in an emergency. Once you can picture what the spare pool is and why it exists, every number above stops being trivia to memorise and becomes something you can simply reason out. And it's a neat bit of engineering.
Why an SSD Needs a Reserve at All
An SSD or NVMe drive has no moving parts. It stores each bit as a tiny trapped electric charge in a microscopic flash cell — picture a bucket that holds a few electrons, reading "on" if it's full and "off" if it's empty. Billions of buckets, and your data is just which ones are full. Reading a bucket is gentle; you can do it forever. But writing one means forcing charge across a thin insulating wall, and erasing it means dragging that charge back out — and every single time, that wall wears down a little. It is not a metaphor for wear; it is literally wearing through an insulator a few atoms thick. Eventually the wall gets too leaky to reliably tell full from empty, and that cell is done.
Here's the piece of real magic, the thing that makes SSDs usable at all: individual cells fail constantly — from day one — and the drive is built entirely around hiding that from you. Every SSD ships with more flash inside than it sells you. A "512 GB" drive might hold 540 GB of physical flash; that extra slice is the over-provisioned spare pool, and it never appears in df. When a cell goes bad, the controller copies your data onto a fresh block from the reserve and remaps the address, so your filesystem never sees the wound. An SSD is, under the hood, a tiny civilization constantly moving residents out of crumbling buildings into fresh ones, demolishing the old blocks, and never once mentioning it to the city above. "No moving parts" is true — but it's furiously busy in there.
Available Spare, then, is simply the gauge on that reserve: the percentage of the spare pool still healthy and available to remap to. It starts at 100% and falls as the drive consumes spares to retire dead blocks. The manufacturer sets the floor — the Available Spare Threshold, almost always 10% — as the point past which it can no longer promise to absorb the next failure. Cross it, and the safety net is effectively gone: the next dead cell may have nowhere healthy to land. That's why the drive shouts at exactly that line, and why "spare exhausted" means defective, not old.
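As a worked example of the reserve arithmetic, the Python sketch below uses the 512 GB / 540 GB figures from the text and a deliberately simplified model of the gauge. How a real controller counts and normalizes its spare blocks is vendor-specific; the block counts here are invented for illustration.

```python
def over_provisioning(physical_gb, advertised_gb):
    """Return the hidden reserve in GB and as a fraction of advertised size."""
    reserve_gb = physical_gb - advertised_gb
    return reserve_gb, reserve_gb / advertised_gb


def available_spare_percent(spare_blocks_total, spare_blocks_consumed):
    """Simplified gauge: share of the spare pool still unused, in percent."""
    remaining = spare_blocks_total - spare_blocks_consumed
    return 100 * remaining // spare_blocks_total


reserve, ratio = over_provisioning(physical_gb=540, advertised_gb=512)
print(f"{reserve} GB hidden, {ratio:.1%} over-provisioning")

# Hypothetical pool of 1,000 spare blocks, 960 already used to retire dead ones.
print(f"Available Spare: {available_spare_percent(1000, 960)}%")
```

In this toy model a drive that has consumed 960 of 1,000 spare blocks reports 4%, the same reading as the failing log earlier on the page.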
Wear-Out vs. Defect — the Distinction That Runs This Page
This is where the two numbers finally click together, and it's the single most useful thing to carry away.
Percentage Used is the age gauge — the odometer. It estimates how much of the drive's rated write endurance has been spent, climbing from 0% toward (and past) 100% as you write. A drive at Percentage Used: 100% has simply driven its warranted mileage; it typically keeps working well beyond it. That's wear-out — graceful, predictable, and covered on its own page, SSD worn out.
Available Spare is the health gauge — the warning light. It tracks something different: not how much you've written, but how many cells have actually died and been remapped. A drive can be young on the odometer and still have its warning light blazing, because it shipped with a weak batch of flash, ran too hot, or hit a manufacturing flaw — any of which kills real cells faster than writes alone would. That's the case in the failing log above: Percentage Used: 61% (plenty of odometer left) but Available Spare: 4% (warning light on). The drive isn't tired. It's defective.
Hold both gauges side by side and the whole topic resolves: wear-out is the odometer reaching the warranty; spare exhaustion is the warning light coming on early. One is a scheduled replacement; the other is a tow truck. This page, and the smartctl command behind it, exist to keep you from confusing the two, so you replace the drive that's actually broken and let the merely-old one keep earning its keep.
What Eats the Pool Faster
The reserve is finite, but a few things you control drain it ahead of schedule:
Heat. Flash cells leak charge faster when hot, and a leaky cell is a dying cell. A drive baking at 70 °C under sustained load is feeding bad blocks into its spare pool far quicker than the same drive at 40 °C. NVMe runs hot — a heatsink and airflow are cheap insurance.
Write amplification — the database trap. Reads are nearly free; it's writes that retire cells, and the biggest hidden multiplier is a database configured to be maximally safe. Force every transaction individually down to flash — no batching, no buffering — and a modest stream of logical writes becomes a flood of physical ones. We once watched a single over-cautious durability setting drive an SSD's Percentage Used from 0 toward 80 in a matter of weeks; the spare pool drained right alongside it. Relax the setting, let writes batch sanely, and the counters go back to crawling. A database's durability knob is wired straight to your drive's life expectancy.
Cheap, dense flash. A cell can store more than one bit if the controller measures its charge level precisely enough, and the denser it's packed, the fewer program/erase cycles it survives before the levels blur and it dies. SLC (one bit) lasts tens of thousands of cycles; QLC (four bits, sixteen charge levels in one leaky bucket) is typically rated for only a few hundred to about a thousand. A write-heavy job on a budget QLC drive isn't buying storage so much as a countdown, and the spare pool empties at the speed of that countdown. (There's even more to the flash story, including the cells stacked in towers hundreds of layers deep and the SLC-cache sleight of hand that keeps cheap drives feeling fast, and the failing disk page tells it in full.)
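The "sixteen charge levels" figure is just powers of two: a cell storing n bits has to distinguish 2^n levels. A tiny Python sketch, with the function name charge_levels chosen for illustration:

```python
def charge_levels(bits_per_cell):
    """Number of distinct charge levels a cell must hold to store n bits."""
    return 2 ** bits_per_cell


CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}
for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {charge_levels(bits)} charge levels")
```

Each extra bit doubles the number of levels squeezed into the same bucket, which is why the margin for a leaky cell shrinks so quickly.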
So the spare pool is the SSD's whole survival strategy in one number: a hidden reserve, quietly spent to hide a constant drizzle of cell death, with a marked floor where the drive admits the drizzle has become a flood. When Available Spare breaches that floor, the reserve isn't low — it's gone, and so is the drive's ability to protect your next write. The honest diary told you the whole time; the one thing that actually saves you is the backup you made before you needed it. The sysadmin's blunt creed fits flash as well as it fits rust: no backup, no pity.
See Also
- SSD worn out — the other SSD end-of-life, the calm one: endurance spent, no defect
- disk failing — the full SMART-reading playbook, HDD and SSD, and the flash physics in depth
- backup — the one habit that makes every dead drive a non-event; the most important link here
- smartctl: the tool that reads the drive's health log
- nvme: the NVMe-native CLI, nvme smart-log for the same numbers
- smartmontools: the package, plus the smartd daemon that watches it for you
- dmesg: where the kernel's read-only flip and I/O errors surface first
- NVMe: how the protocol and the drive actually work
- SSD — how flash stores a bit, and why cells die
- RAID — surviving a defective drive without losing sleep or data
- RAID degraded — when the defective drive was a mirror or parity member and the array is now running thin
- degraded RAID array — when the failing NVMe dropped out of its array entirely
- disk cable errors — the misread that bins a healthy drive: errors on the link, not the flash
- SMART unavailable — when the health log won't read at all, so you can't see the spare number
- disk full — the other "writes are failing" emergency, about space rather than health
- overheating — the temperature bit in the same Critical Warning byte