RAID Array Offline: Symptoms, Diagnosis & Fixes

The parachute didn't open. The array isn't running on borrowed time — it's already on the ground.

What It Is

An offline RAID array is one that can no longer assemble itself, so it serves nothing. The filesystem won't mount, every database and app that lived on it is down, and the data sitting on the surviving disks is right there — intact, on the platters — and completely unreachable, because the array that stitched it together refuses to start. People search for this under a handful of names: an inactive array, an array that won't assemble, the array offline, mdadm inactive. They're all the same wall.

It helps to put this next to its quieter cousin, because the two get confused constantly and the difference is the whole story. A degraded array has lost a disk but is still serving data — it kept its promise, dropped the dead member, and ran on what was left. That's the survivable one, the one you fix at your leisure (don't, but you could). This page is the other side of that coin: the array that has lost one disk too many, so there isn't enough left to assemble at all. Degraded is the warning light. Offline is the crash.

             | Degraded                       | Offline / inactive
Serving data? | Yes, normally                  | No, nothing mounts
/proc/mdstat  | active raidN ... [n/m] [U_]    | inactive ... sdb1[1](S)
Redundancy    | Reduced or zero                | Past tolerance, array won't run
The job       | Replace the disk, rebuild      | Re-assemble it, or restore from backup
Urgency       | Urgent (next failure is fatal) | The fatal failure already happened

So before the diagnosis, the one piece of good news worth holding onto: inactive does not always mean lost. There are two completely different reasons an array goes offline, and they have wildly different endings. In the hopeful case the disks are fine — every member's data is intact — and the array simply failed to start itself after a reboot, a cable bump, or a controller hiccup; one mdadm command brings it straight back. In the hard case you genuinely lost more disks than the array could survive, and on a parity array that means the data is mathematically gone — no command recovers it, and the honest answer is your backup. The entire skill of this page is reading the disks well enough to know, with certainty, which of those two you're in before you touch anything. Let's get you there.

How You Notice

Offline is loud in a way degraded never is — the moment something tries to use the storage, it falls over. The signals, each with the command to confirm it on your own box:

  • A mount fails outright. The most common first contact. A mount at boot or by hand returns a flat refusal:

    mount /dev/md0 /data
    # mount: /data: special device /dev/md0 does not exist.
    #   — or —
    # mount: /data: can't read superblock on /dev/md0.
    

    The block device is missing or empty because the array behind it never came up. Anything in /etc/fstab pointing at that array can wedge the whole boot.

  • The apps and databases on it are simply down. No slow degradation, no warning — the service that stores its data on the array won't start, or starts and immediately can't find its files. A web app throwing "data directory not found," a database refusing to open its tablespace: trace it back and you land here.

  • /proc/mdstat says inactive. The kernel's own live scoreboard, and the one file that names the problem out loud:

    cat /proc/mdstat
    

    A working array reads active raid5 ... [UUUU]. An offline one reads inactive with its members parked as spares — no personality, no [n/m] count, no [U_] map. The next section takes that readout apart token by token; it's the heart of the page.

  • The kernel logged the failed assembly. When mdadm tries to start an array and can't gather a quorum, it says so:

    dmesg -T | grep -iE "md/raid|md: |cannot start|not enough|kicking"
    journalctl -b | grep -iE "mdadm|md/raid|incrementally"
    

    Lines like md/raid:md0: not enough operational devices (2/4 failed) or md: md0 stopped are the kernel telling you, in plain text, exactly why it gave up. This log is where "the array won't come up" becomes "here's how many disks it's missing."

Any one of these means the same thing: the array is not running, and until you assemble it nothing on it is reachable. Resist the reflex to reboot repeatedly hoping it sorts itself out. If the array couldn't assemble once, a second try with nothing changed is unlikely to go differently, and each reboot can reshuffle your /dev/sdX names while you're trying to read them.

How I Read It

Two readouts decide everything: /proc/mdstat tells you the array is inactive, and mdadm --examine on each member tells you whether the disks are recoverable. Read them in that order, because the first is instant and the second is where the verdict actually lives.

Start with mdstat. For contrast, here's what healthy looks like — a real four-disk RAID 5, all members present, an active personality, the full map:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[0] sdc1[1] sdd1[2] sde1[3]
      5860147200 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

You can read its pulse in three numbers: active raid5 (it's running, and it's parity), [4/4] (four expected, four present), [UUUU] (all four up). Now the inactive version of the same array — the shape of an offline array, straight from a box where assembly failed:

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : inactive sdb1[0](S) sdc1[1](S)
      3906921472 blocks super 1.2

Every token here is a symptom. Walk it:

  • inactive — the word that defines this page. The array exists as a device node but is not assembled and not serving. There is no active raidN here, because the kernel never started a personality — it gathered the disks it could find and then stopped, short of a working array. (Contrast the healthy line's active raid5. That gap is the problem.)
  • sdb1[0](S) sdc1[1](S) — the members the kernel did find, each with its raid-slot number, each tagged (S). That (S) is the tell. It looks like "spare," and that's exactly what's misleading: these are not real spares. When an array can't assemble, mdadm can't yet assign its members to live roles, so it parks every one it found as a nominal spare. Two (S) members and no active array means: the kernel has these disks in hand but doesn't have enough of them — or enough agreement between them — to build anything.
  • No [n/m] count, no [U_] map. A degraded array still shows [4/3] [UUU_] because it is a running array with a known hole. An inactive array shows neither, because there's no assembled array to have a map of. The absence is the diagnosis: the bookkeeping that describes a live array simply doesn't exist yet.
  • super 1.2 with no level/chunk/algorithm line — another absence. The healthy line spelled out level 5, 512k chunk, algorithm 2 because those describe an assembled array. Here you get the superblock version and nothing more, because the geometry only materializes once the array starts.
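
The token walk above can be turned into a quick filter. What follows is a minimal sketch, not a definitive tool: md_inactive is a hypothetical helper name, and it assumes the mdstat layout shown on this page (array name, a colon, the state word, then the members).

```shell
# Print each inactive md array and the members parked on it.
# Reads /proc/mdstat-formatted text on stdin.
# md_inactive is a hypothetical helper name, not part of mdadm.
md_inactive() {
    awk '$2 == ":" && $3 == "inactive" {
        printf "%s:", $1
        for (i = 4; i <= NF; i++) {
            dev = $i
            sub(/\[.*/, "", dev)   # strip the "[0](S)" suffix, keep the device name
            printf " %s", dev
        }
        printf "\n"
    }'
}
```

On a live box you would feed it the real file with md_inactive < /proc/mdstat. For the inactive sample above it prints md0: sdb1 sdc1, and for the healthy sample it prints nothing.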

So mdstat has told you the array is offline and shown you which members the kernel could see. What it can't tell you is the thing that decides your whole afternoon: are those disks current, or stale? For that you go to the superblock on each one with mdadm --examine (-E for short). This is the diagnostic skill the whole page is built around, so let's do it carefully.

mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
     Raid Level : raid5
   Raid Devices : 4

  Device Role : Active device 0
  Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)

          State : clean
    Update Time : Fri Jun  5 04:18:11 2026
         Events : 184237

The two lines that matter are Events and Array State, and the move is to run --examine on every member and lay their numbers side by side. Here's the whole point, the thing it takes people years to learn: the Events count is a heartbeat. Every time the array's metadata changes (a clean/dirty transition, a member failing or being added, a start or stop) the kernel's md driver increments a counter and stamps it into the superblock of every member still in the array. So in a working array, all members carry the same Events count. The instant a disk drops out, its counter freezes while the survivors keep ticking up. The member with the highest Events count is the most current; a member that dropped early has a lower count and is stale. That single comparison is what separates the two endings of this page.
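
Pulling the Events line out by eye gets tedious across several members, so here is a small sketch of doing it mechanically. The helper names examine_events and events_table are hypothetical, and the parsing assumes the "Events : N" layout shown in the --examine output above; mdadm --examine itself normally needs root.

```shell
# Print the Events count found in `mdadm --examine` text on stdin,
# or "unreadable" when no Events line is present.
# examine_events and events_table are hypothetical helper names.
examine_events() {
    awk -F' : ' '
        /^ *Events :/ { events = $2 }
        END {
            if (events == "") print "unreadable"
            else print events + 0
        }'
}

# Print one "device events" line per member passed as an argument.
events_table() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" 2>/dev/null | examine_events)"
    done
}
```

A typical call would be events_table /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1. A member whose superblock cannot be read shows up as unreadable instead of a number.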

Run it across all four and you might see something like this:

Member    | Device Role     | Events | Update Time | Reads as
/dev/sdb1 | Active device 0 | 184237 | Jun 5 04:18 | Current
/dev/sdc1 | Active device 1 | 184237 | Jun 5 04:18 | Current
/dev/sdd1 | Active device 2 | 184201 | Jun 5 04:11 | 7 minutes stale, dropped early
/dev/sde1 | Active device 3 | -      | -           | Dead, superblock unreadable

Now you can read your situation, not guess it. Slots 0 and 1 are bang up to date. Slot 2 dropped a few minutes before the end — Events is only 36 behind, a tiny gap, which means its data is almost current. Slot 3 is genuinely gone: --examine errored or returned garbage, the disk is dead. Read the Array State field too — AA.. here says this member last saw slots 0 and 1 active and slots 2 and 3 missing, which corroborates the Events story.

That table is the fork in the road:

  • Events counts close, disks readable (the 36-behind case above) — the disks are fundamentally fine; the array fell apart over a brief window and one member is only slightly behind. This is recoverable with --assemble --force. Go to fix path 1.
  • Two-plus members truly dead or wildly behind — a RAID 5 missing two real disks has lost data the parity can no longer reconstruct. Go to fix path 2, and steel yourself.
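
The fork above rests on one piece of arithmetic: how far each member sits behind the highest Events count. A minimal sketch of that subtraction follows, assuming input lines of the form "device events" (one per member, with any non-numeric second field meaning the superblock could not be read). events_lag is a hypothetical name, and the sketch deliberately applies no threshold, because how big a gap is too big remains a judgment call.

```shell
# Read "device events" lines on stdin and print how far each member
# is behind the highest Events count seen.
# events_lag is a hypothetical helper name.
events_lag() {
    awk '
        NF == 0 { next }
        {
            n++
            dev[n] = $1
            if ($2 ~ /^[0-9]+$/) {
                ev[n] = $2 + 0
                if (ev[n] > max) max = ev[n]
            } else {
                ev[n] = -1          # no readable count for this member
            }
        }
        END {
            for (i = 1; i <= n; i++) {
                if (ev[i] < 0) printf "%s unreadable\n", dev[i]
                else printf "%s behind by %d\n", dev[i], max - ev[i]
            }
        }'
}
```

Fed the four members from the example (184237, 184237, 184201, and one unreadable), it reports the first two behind by 0, /dev/sdd1 behind by 36, and /dev/sde1 unreadable.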

Note

The Events gap is the difference between "force it and lose seconds of writes" and "the array is gone." A 36-event gap on a member is a rounding error: force it in and only the stripes written during that short window can be inconsistent, which a filesystem check will usually handle. A gap of thousands means that member missed a huge amount of activity; forcing it in would splice ancient blocks into a live array and silently corrupt your filesystem. Always read the numbers before you reach for --force.

How to Fix It

The path forks hard here, and which fork you're on was decided by the Events comparison above — not by hope. Do that reading first. Both forks share one non-negotiable opening step.

Danger

Image the disks before you run a single recovery command. While the array is offline your data is unreachable but usually intact on the platters — and the wrong assemble or, worse, a --create, can overwrite that for good. If the data matters and the disks are at all readable, take a block-level copy of each member first with dd or ddrescue to spare disks or image files, and do your recovery experiments against copies. Every command below is far safer when a failed attempt costs you a re-copy instead of your data. And if the disks are genuinely dead, no command on this page brings them back — that's what the backup is for.
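
One way to approach the imaging step is to write the plan down before running anything. The sketch below only prints commands, it does not execute them. It assumes GNU ddrescue is installed and that the destination (here the hypothetical /mnt/images) has room for a full copy of every member; imaging_plan is a hypothetical helper name.

```shell
# Print (do not run) one ddrescue command per member.
# Usage: imaging_plan DEST_DIR MEMBER...
# -d reads the source with direct access, -r3 retries bad areas three times,
# and the .map file lets an interrupted copy resume where it stopped.
imaging_plan() {
    dest=$1
    shift
    for dev in "$@"; do
        name=$(basename "$dev")
        printf 'ddrescue -d -r3 %s %s/%s.img %s/%s.map\n' \
            "$dev" "$dest" "$name" "$dest" "$name"
    done
}
```

Running imaging_plan /mnt/images /dev/sdb1 /dev/sdc1 /dev/sdd1 prints three ddrescue lines you can review, then run one at a time as root. Printing first is a deliberate choice: a typo in a source or destination path is much cheaper to catch on screen than on disk.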

Path 1 — The disks are fine, the array just didn't assemble

This is the common, hopeful case: a reboot, a cable event, or a controller reset left the members un-started, their superblocks intact, Events counts close. The array is half-gathered into that inactive state and you simply need to finish the job. Two steps.

Stop the half-assembled array first. The kernel is holding those (S) members in a dead-end inactive device; you can't re-assemble on top of that, so tear it down (this is safe — nothing is mounted, nothing is serving):

mdadm --stop /dev/md0

Then force the assembly. Hand mdadm the members and tell it to accept a slightly-stale one:

mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
# or let it find everything from the superblocks:
mdadm --assemble --scan --force

Here's what --force actually does, because using it blind is how people destroy arrays. Normally mdadm leaves out any member whose Events count is behind the freshest ones, and if that leaves too few members to run the array it declines to start it. --force overrides that refusal: it treats the highest Events count as the truth, rewrites the slightly-behind member's superblock to match, and starts the array with that member included. In our example that means three of four members: running, but degraded, with dead slot 3 still missing. That's exactly why the Events reading matters so much: when the gap is small (our 36), --force is usually safe, and you lose at most the last few seconds of in-flight writes, which the filesystem journal or an fsck can generally sort out. When the gap is large, --force is dangerous, because it grafts a far-stale member's blocks into a live array and corrupts it. Close counts, force away. Far counts, do not.

If it works, /proc/mdstat flips from inactive to active raid5 ... [4/3] [UUU_]: you've gone from offline all the way back to merely degraded, which is a state you now know how to finish. Mount it (read-only at first is the cautious choice), back up immediately while you can, then add a replacement for the dead disk and let it rebuild.

Path 2 — Genuine multi-disk loss

Now the hard truth, told straight because you deserve it told straight. If two real disks are dead in a RAID 5 — not dropped-and-stale, but gone, superblocks unreadable, --examine erroring — then the array's data is mathematically unrecoverable, and there is no command that changes that. RAID 5 keeps exactly one disk's worth of parity: lose one member and parity rebuilds it; lose a second and the equation has more unknowns than it has numbers to solve them. The information needed to reconstruct those blocks was never stored anywhere — that was the whole trade RAID 5 made to be cheaper than mirroring. (The full XOR arithmetic is on the degraded page; here it's enough to know the parity ran out.)
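
The "more unknowns than numbers" claim is easy to see with one byte per disk. This is a toy sketch with made-up values, using plain shell arithmetic; real arrays do the same XOR across whole chunks, and the variable names are illustrative only.

```shell
# Three data blocks of one byte each. The values are made up for illustration.
d0=165   # 0xA5
d1=60    # 0x3C
d2=15    # 0x0F

# RAID 5 stores one parity block per stripe: the XOR of the data blocks.
parity=$(( d0 ^ d1 ^ d2 ))
echo "parity = $parity"                 # parity = 150

# One member lost (d1): XOR the survivors with parity to get it back.
rebuilt_d1=$(( d0 ^ d2 ^ parity ))
echo "rebuilt d1 = $rebuilt_d1"         # rebuilt d1 = 60

# Two members lost (d1 and d2): the survivors only yield d1 XOR d2,
# a single number that many different (d1, d2) pairs would satisfy.
d1_xor_d2=$(( d0 ^ parity ))
echo "d1 ^ d2 = $d1_xor_d2"             # d1 ^ d2 = 51
```

With one block missing there is one equation and one unknown, so the answer is exact. With two missing, all that survives is their XOR (51 here), and 60 with 15 is only one of many pairs that produce it.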

So the answer is the one nobody wants and everybody needs: restore from backup. Build a fresh array, restore the data, and move on. We know that's not what you came here hoping to read — but pretending a magic recovery command exists would cost you hours you don't have. There isn't one. The backup is the recovery.

There is exactly one expert hail-mary, and it comes with a klaxon:

Danger

mdadm --create --assume-clean re-stamps fresh superblocks onto the members without resyncing the data, in the hope of re-describing an array whose metadata was lost (corrupted superblocks, an accidental wipe) while the data underneath survived. It is not a recovery from missing disks: it cannot conjure data that two dead disks took with them. And it is brutally unforgiving: you must reproduce the exact original metadata version, level, device order, chunk size, layout, and data offset. Get a single parameter wrong and the new array describes your data incorrectly, so the first write, fsck, or resync against it lays a new layout over your real data and finishes the job for good. This is a last resort for someone who knows precisely what they're doing, run against dd images, never the only copy. If your data matters and you're not certain, stop and pay a data-recovery service, or just restore the backup.

How to Avoid It

You can't stop disks from dying — that's physics — but going offline is almost always avoidable, because it's nearly never the first failure. It's the second. An array reaches this page because a disk died, nobody noticed, and a while later a second one followed it into the grave. Catch the first one and you never get here. In order of leverage:

  1. Catch it while it's still just degraded. This is the whole game. A single dead disk is a survivable degraded array — replace that member before the next failure arrives and offline never happens. The gap between "one disk down" and "two disks down" is your entire window, and it's usually days or weeks wide. Don't waste it being unaware.
  2. RAID 6 over RAID 5 for any wide array. RAID 6 carries two independent parity calculations, so it survives two simultaneous failures where RAID 5 dies on the second. On big arrays of big disks — where rebuilds take many hours and the odds of a second disk faltering mid-rebuild climb — that extra parity disk is the difference between a tense afternoon and this page. The wider the array, the less excuse for RAID 5.
  3. A hot spare. Keep an idle disk in the array (mdadm --add it as a spare) and mdadm rebuilds onto it automatically the instant a member fails — shrinking the dangerous single-redundancy window from "however long until a human looks" down to minutes, often before a human even reads the alert.
  4. mdadm --monitor. Run the monitor daemon (mdadm --monitor --scan --mail you@example.com) so a failed member emails you the moment it drops. The entire reason arrays go offline is that the first failure was silent; the monitor is what gives the first failure a voice.
  5. Tested backups. When the disks really are gone, nothing else on this list helps — only a backup you've actually restored from gets your data back. Configured-and-assumed is not tested; restore it once and find out it works before you're betting the company on it. The backup page covers the how.
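
As a belt-and-braces complement to mdadm --monitor (not a replacement for it), a scheduled job can read /proc/mdstat directly. The sketch below is one possible shape, assuming the mdstat layout shown earlier on this page; md_healthy is a hypothetical helper name, and what you do on failure (mail, pager, chat hook) is left to you.

```shell
# Exit 0 if every array in the mdstat text on stdin looks fully healthy,
# exit 1 if any array is inactive or has a hole ("_") in its member map.
# md_healthy is a hypothetical helper name.
md_healthy() {
    awk '
        $2 == ":" && $3 == "inactive" { bad = 1 }   # array never assembled
        /\[[U_]*_[U_]*\]/             { bad = 1 }   # e.g. [UUU_]: a member is down
        END { exit (bad ? 1 : 0) }'
}
```

From cron, something like md_healthy < /proc/mdstat || echo "md array needs attention" gives the first failure a second voice. The check treats degraded and inactive alike on purpose: either one deserves a human looking at it today.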

How RAID Falls Off the Cliff

Here's the shape of the whole thing in three steps, and why catching the middle one is everything. A healthy parity array is [UUUU] — full redundancy, every disk earning its keep. Lose one disk and you drop to [UUU_]: degraded, but still running, still serving every byte, the parity quietly reconstructing the missing disk on every read. That's not the cliff — that's the ledge, with a railing. You can stand there safely for as long as it takes to slot in a new disk.

Then the second disk goes. On RAID 5 there is no [UU__]-and-still-running state to fall into, because there's no parity left to compute the second missing disk from. The array can't assemble, flips to inactive, and the data goes offline — and on a true two-disk RAID 5 loss, mathematically unrecoverable. One missing disk is a survivable inconvenience; the very next one, on RAID 5, is a cliff with nothing at the bottom but your backup.

That asymmetry is the entire argument for everything above. The cost of catching the degraded state is a few minutes swapping a disk. The cost of missing it is restoring from backup — if you have one — and explaining to everyone why the service was down for a day. The degraded array is the last cheap moment before the expensive one. Don't sleep through it.

See Also

  • degraded RAID array — the survivable cousin: lost a disk but still serving, fix it before it lands here
  • RAID rebuilding — what happens after you re-add a disk and the array reconstructs itself
  • RAID — how mirroring and parity work, and why the second failure is the fatal one
  • mdadm — assemble, examine, force, and create every software array
  • /proc/mdstat — the kernel's live scoreboard, where inactive shows up first
  • smartctl — check whether a dropped member is truly dead or just sulking
  • failing disk — reading the SMART attributes on the disks that dropped
  • backup — the only thing that brings the data back when the disks are really gone

Your array is sitting there inactive and every byte on it is one command — or one backup — away. Which one?

CleverUptime watches /proc/mdstat on every server you run it on and catches the array the moment it drops to degraded — the brief window before a second disk drops it offline — naming the array and telling you in plain language that its redundancy is gone, so you replace the failed member while the array can still be saved instead of meeting it inactive at 3 a.m.
