Linux Problems: Symptoms, Diagnoses, and Fixes

When something on your server is wrong, this is where you start — by symptom, not by guess.

How to Read This Index

When your server has a problem, you usually know the symptom but not the cause. This index is grouped by what you noticed first — the box is slow, the disk is full, a service died, you can't SSH in. Pick the section that matches your symptom and follow the link into the walkthrough. Each problem page diagnoses several possible causes and shows the fix for each one. If you already know exactly which problem you're chasing, jump to the full alphabetical list at the bottom.

1. The Server Is Slow

The classic "something feels wrong" call. The cause is almost always one of four things: load, CPU, memory pressure, or a slow disk. They often cascade into each other.

  • high load — load average way above your core count, and what's actually behind it
  • high memory usage — RAM is filling up and you're inching toward trouble
  • high I/O wait — the CPU is waiting on disk, and nothing else can move
  • thrashing — the box is spending more time paging than working
  • swapping — memory has overflowed to disk and everything slows to a crawl
  • runaway process — one process is eating the whole machine
  • poor performance — the umbrella walkthrough when you don't know which of the above it is
  • bad performance — the sibling page focused on app-level latency

2. The Disk Is Full or Failing

One of the most common causes of production outages. Either the disk filled up, the underlying device is dying, or the filesystem itself has gone sideways.

  • disk full — out of space, and the inode trap that catches everyone once
  • disk failing — SMART is screaming; act before you lose data
  • disk error — reads or writes are returning errors and the kernel log knows why
  • SSD worn out — the drive has used up its write life; it is old, not broken, so plan the swap
  • NVMe spare exhausted — the reserve is gone and the drive is about to go read-only; replace now
  • disk cable errors — rising CRC counts mean a bad cable, not a dying disk
  • SMART unavailable — can't read the drive's health; here's how to get through
  • RAID degraded — a drive dropped out of the array; redundancy is gone until you replace it
  • RAID rebuilding — the array is resyncing onto a fresh disk; what to expect while it runs
  • file system corruption — the filesystem itself is damaged, and fsck is in your future
  • data corruption — files, databases, or whole filesystems are returning the wrong bytes
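
To make the "inode trap" concrete: a filesystem can refuse new files while it still has free space, because it has run out of inodes. The Python sketch below checks both numbers for one mount point. The function names are hypothetical, and the percentage is a simplification that counts root-reserved blocks as used, so it will not always match what df prints.

```python
import os


def usage_percent(total, available):
    """Percentage of a resource in use, given its total and available counts."""
    if total == 0:
        # Some filesystems report zero total inodes; treat that as "not applicable".
        return 0.0
    return 100.0 * (total - available) / total


def disk_report(path="/"):
    """Return space and inode usage for the filesystem that contains path."""
    st = os.statvfs(path)  # Unix only
    return {
        # f_bavail / f_favail are the counts available to unprivileged users.
        "space_used_pct": usage_percent(st.f_blocks, st.f_bavail),
        "inodes_used_pct": usage_percent(st.f_files, st.f_favail),
    }


if __name__ == "__main__":
    for name, pct in disk_report("/").items():
        print(f"{name}: {pct:.1f}%")
```

If space looks fine but inode usage is near 100%, you are probably dealing with a very large number of small files rather than a few big ones.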

3. Out of Memory, Swap, and Bad RAM

When the kernel runs out of RAM, the OOM killer starts picking processes to sacrifice — or memory pressure spills into swap, or the RAM itself is faulty. This is what you see, and how you get ahead of it.

  • out of memory — the kernel ran out and started killing things
  • high memory usage — RAM is filling up; the truth is in MemAvailable, not "used"
  • memory leak — a process that just keeps eating, forever
  • swap full — swap is exhausted; with RAM full too, OOM is imminent
  • swapping — the warning sign before OOM hits
  • swap thrashing — paging has taken over and nothing useful happens
  • swappiness misconfigured — the kernel swaps hot memory too eagerly for your workload
  • memory errors (ECC) — a DIMM is throwing correctable/uncorrectable errors; replace it
  • bad RAM — the kernel found genuinely faulty memory and poisoned the pages
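
The point about MemAvailable versus "used" can be sketched in a few lines of Python. This is an illustrative example, not the method from the linked pages: the function names are made up for the sketch, and it assumes a Linux kernel recent enough to expose a MemAvailable line in /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most values are reported in kB; the unit suffix is dropped here.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_percent(meminfo):
    """Share of total RAM the kernel estimates is available without swapping."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

On a Linux box, available_percent(read_meminfo()) gives you one number to watch. A machine can show very little "free" memory and still be healthy, because page cache is reclaimable and is counted in the kernel's MemAvailable estimate.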

4. A Service or Process Is Broken

Not the box — just one program. It crashed, it's stuck, it's a zombie, or its code has a bug you finally have to chase.

  • software crash — a service or daemon died; logs, exit codes, and restart strategy
  • system crash — the kernel itself went down
  • segfault — a process touched memory it didn't own, and the kernel killed it
  • buffer overflow — the classic memory-safety bug, and why it still matters
  • zombie process — dead but not reaped; what it means and when to worry
  • device busy — you can't unmount or release something because a process is still holding it
  • file not found — the file the app wanted isn't where it expected
  • software bug — the umbrella diagnosis when nothing else fits
  • configuration error — the code is fine; you mistyped something in a config file
  • incorrect configuration — the broader walkthrough for misconfigured services
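
For the zombie process entry, a small Python sketch shows one way to spot processes that are dead but not yet reaped. It reads the state field from /proc/<pid>/stat on Linux. The function names are hypothetical, and this is an illustration of the idea rather than the procedure from the zombie process page.

```python
import os


def state_from_stat(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' instead of on whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def find_zombies(proc_root="/proc"):
    """Return the PIDs under proc_root whose state is 'Z' (zombie)."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as handle:
                if state_from_stat(handle.read()) == "Z":
                    zombies.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return zombies
```

A handful of short-lived zombies is normal. A count that keeps growing usually means a parent process is not collecting its children's exit status, and the fix belongs in the parent, not the zombie.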

5. The Box Is Unreachable

You can't get in. SSH won't connect, the kernel may have panicked, the network is down, or the firewall is in your way.

  • system failure — the box stopped responding entirely
  • kernel panic — the kernel hit something fatal and halted
  • kernel issue — non-fatal kernel weirdness, modules, parameters
  • boot failure — it won't come back up; bootloader, initramfs, fstab
  • SSH issue — keys, configs, and sshd flags, for when the door won't open
  • network failure — interfaces, routes, link state; the whole network stack
  • firewall issue — your rules are blocking what you wanted to allow
  • DNS issue — names aren't resolving, and everything that depends on them is broken
  • NTP issue — time sync is broken and TLS is about to start failing
  • incorrect time — the clock drifted and now auth and certs misbehave

6. Hardware Trouble

When the silicon underneath has stopped cooperating — disks, RAM, power, the board itself.

7. Security

When something — or someone — got in, or could.

  • security breach — they're already inside; here's how you find them and what to do
  • data breach — data has left the building; obligations, forensics, notifications
  • security vulnerability — a known hole you need to close before it's used
  • security issue — the umbrella walkthrough for a setup that is weaker than it should be
  • malware infection — something malicious is running; spot it and remove it
  • denial of service — the box is being drowned in traffic on purpose
  • permission issue — the wrong user can read/write something they shouldn't, or the right user can't

8. Application-Level Problems

Higher up the stack — your database, your email server, your app — when the OS is fine but the workload is not.

Full Alphabetical List

Every problem page in the knowledge base. If you already know the name, jump straight to it.

Watching the right symptoms early prevents most of these from ever happening.

CleverUptime watches the underlying signals — disk usage trending up, load creeping, memory pressure rising, services flapping, errors landing in the kernel ring buffer — long before they turn into the symptoms above, and tells you in plain language what's wrong so you reach the right diagnosis fast.
