Linux Problems: Symptoms, Diagnoses, and Fixes
When something on your server is wrong, this is where you start — by symptom, not by guess.
How to Read This Index
When your server has a problem, you usually know the symptom but not the cause. This index is grouped by what you noticed first — the box is slow, the disk is full, a service died, you can't SSH in. Pick the section that matches your symptom and follow the link into the walkthrough. Each problem page diagnoses several possible causes and shows the fix for each one. If you already know exactly which problem you're chasing, jump to the full alphabetical list at the bottom.
1. The Server Is Slow
The classic "something feels wrong" call. The cause is usually one of four things: high load, a busy CPU, memory pressure, or slow disk I/O. They often cascade into each other.
- high load — load average way above your core count, and what's actually behind it
- high memory usage — RAM is filling up and you're inching toward trouble
- high I/O wait — the CPU is waiting on disk, and nothing else can move
- thrashing — the box is spending more time paging than working
- swapping — memory has overflowed to disk and everything slows to a crawl
- runaway process — one process is eating the whole machine
- poor performance — the umbrella walkthrough when you don't know which of the above it is
- bad performance — the sibling page focused on app-level latency
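A quick way to make "load average way above your core count" concrete is to divide the load average by the number of logical CPUs. The sketch below is a minimal illustration, not the method the walkthroughs use; the helper name load_per_core and the 1.0-per-core threshold are assumptions for the example. On Linux the load average also counts tasks stuck in uninterruptible I/O, so a high ratio with an idle CPU usually points at the disk rather than the processor.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalize a load average by the number of logical CPUs."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()   # 1-, 5- and 15-minute averages
    cores = os.cpu_count() or 1            # cpu_count() can return None
    for label, value in (("1m", one), ("5m", five), ("15m", fifteen)):
        ratio = load_per_core(value, cores)
        # 1.0 per core is a rough rule of thumb (an assumption), not a hard limit.
        flag = "above core count" if ratio > 1.0 else "ok"
        print(f"{label}: load {value:.2f} on {cores} cores = {ratio:.2f} per core ({flag})")
```

A 1-minute ratio that is high while the 15-minute ratio is low suggests a recent spike; the reverse suggests the box is recovering.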
2. The Disk Is Full or Failing
One of the most common causes of production outages. Either the disk filled up, the underlying device is dying, or the filesystem itself has gone sideways.
- disk full — out of space, and the inode trap that catches everyone once
- disk failing — SMART is screaming; act before you lose data
- disk error — reads or writes are returning errors and the kernel log knows why
- SSD worn out — the drive has used up its write life; old, not broken — plan the swap
- NVMe spare exhausted — the reserve is gone and the drive is about to go read-only; replace now
- disk cable errors — rising CRC counts mean a bad cable, not a dying disk
- SMART unavailable — can't read the drive's health; here's how to get through
- RAID degraded — a drive dropped out of the array; redundancy is gone until you replace it
- RAID rebuilding — the array is resyncing onto a fresh disk; what to expect while it runs
- file system corruption — the filesystem itself is damaged, and fsck is in your future
- data corruption — files, databases, or whole filesystems are returning the wrong bytes
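The "inode trap" mentioned under disk full is that a filesystem can refuse new files while still showing free space, because it has run out of inodes. A minimal sketch of checking both numbers at once, assuming a Unix-like system where os.statvfs is available; the helper names percent_used and space_and_inodes are made up for this example.

```python
import os


def percent_used(total: int, free: int) -> float:
    """Percentage of a resource in use; 0.0 when the filesystem reports no total."""
    if total <= 0:
        return 0.0  # some filesystems report 0 inodes because they allocate them dynamically
    return 100.0 * (total - free) / total


def space_and_inodes(path: str = "/") -> tuple:
    """Return (block usage %, inode usage %) for the filesystem holding `path`."""
    st = os.statvfs(path)
    # This simple ratio can differ slightly from df, which accounts for
    # blocks reserved for root.
    blocks_used = percent_used(st.f_blocks, st.f_bfree)
    inodes_used = percent_used(st.f_files, st.f_ffree)
    return blocks_used, inodes_used


if __name__ == "__main__":
    blocks, inodes = space_and_inodes("/")
    print(f"/ space used: {blocks:.1f}%  inodes used: {inodes:.1f}%")
```

If space looks fine but the inode figure is near 100%, the usual culprit is a directory holding a very large number of small files.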
3. Out of Memory, Swap, and Bad RAM
When the kernel runs out of RAM, the OOM killer starts picking processes to sacrifice — or memory pressure spills into swap, or the RAM itself is faulty. This is what you see, and how you get ahead of it.
- out of memory — the kernel ran out and started killing things
- high memory usage — RAM is filling up; the truth is in MemAvailable, not "used"
- memory leak — a process that just keeps eating, forever
- swap full — swap is exhausted; with RAM full too, OOM is imminent
- swapping — the warning sign before OOM hits
- swap thrashing — paging has taken over and nothing useful happens
- swappiness misconfigured — the kernel swaps hot memory too eagerly for your workload
- memory errors (ECC) — a DIMM is throwing correctable/uncorrectable errors; replace it
- bad RAM — the kernel found genuinely faulty memory and poisoned the pages
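The point about MemAvailable versus "used" can be checked straight from /proc/meminfo: MemAvailable is the kernel's own estimate of how much memory new work can claim without swapping, which is usually a better guide than total minus free. The sketch below is one possible way to read it; parse_meminfo and available_percent are hypothetical helper names, and it assumes a kernel recent enough to report MemAvailable (3.14 or later).

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few (such as HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_percent(fields: dict) -> float:
    """Share of total RAM the kernel estimates is available to new work."""
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    print(f"MemAvailable: {available_percent(info):.1f}% of MemTotal")
    print(f"SwapFree: {info.get('SwapFree', 0)} kB of {info.get('SwapTotal', 0)} kB")
```

A low MemAvailable percentage together with shrinking SwapFree is the combination that tends to precede the OOM killer.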
4. A Service or Process Is Broken
Not the box — just one program. It crashed, it's stuck, it's a zombie, or its code has a bug you finally have to chase.
- software crash — a service or daemon died; logs, exit codes, and restart strategy
- system crash — the kernel itself went down
- segfault — a process touched memory it didn't own, and the kernel killed it
- buffer overflow — the classic memory-safety bug, and why it still matters
- zombie process — dead but not reaped; what it means and when to worry
- device busy — you can't unmount or release something because a process is still holding it
- file not found — the file the app wanted isn't where it expected
- software bug — the umbrella diagnosis when nothing else fits
- configuration error — the code is fine; you mistyped something in a config file
- incorrect configuration — the broader walkthrough for misconfigured services
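"Dead but not reaped" has a precise meaning on Linux: the process shows state Z in /proc/<pid>/stat until its parent collects the exit status. A minimal sketch of spotting zombies that way, assuming a mounted /proc; parse_stat and find_zombies are illustrative names, not part of any tool referenced here. The parent PID matters most, because the parent is the process that has to be fixed or restarted.

```python
import os


def parse_stat(stat_line: str) -> tuple:
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the first '(' and the last ')'.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    rest = stat_line[close_paren + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def find_zombies(proc_root: str = "/proc") -> list:
    """List (pid, comm, ppid) for every process currently in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid in find_zombies():
        print(f"zombie pid={pid} ({comm}) waiting on parent pid={ppid}")
```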
5. The Box Is Unreachable
You can't get in. SSH won't connect, the kernel may have panicked, the network is down, or the firewall is in your way.
- system failure — the box stopped responding entirely
- kernel panic — the kernel hit something fatal and halted
- kernel issue — non-fatal kernel weirdness, modules, parameters
- boot failure — it won't come back up; bootloader, initramfs, fstab
- SSH issue — keys, configs, sshd flags — when the door won't open
- network failure — interfaces, routes, link state; the whole network stack
- firewall issue — your rules are blocking what you wanted to allow
- DNS issue — names aren't resolving, and everything that depends on them is broken
- NTP issue — time sync is broken and TLS is about to start failing
- incorrect time — the clock drifted and now auth and certs misbehave
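When SSH won't connect, how the connection fails is a useful first clue. As a rough heuristic (an assumption, not a rule): a refused connection means the host answered but nothing is listening on the port, while a timeout more often means a firewall is dropping packets or the host is down. The sketch below makes one TCP connect attempt and reports which case you hit; probe_tcp is a hypothetical helper, and the address in the usage block is a placeholder to replace with your server's.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open', 'refused', 'timeout', or 'error: ...' for one TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"      # host reachable, nothing listening on the port
    except socket.timeout:
        return "timeout"      # no answer: dropped packets, or the host is down
    except OSError as exc:
        return f"error: {exc}"  # e.g. name resolution failure or no route to host


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; swap in the server you cannot reach.
    print("ssh port:", probe_tcp("127.0.0.1", 22))
```

A name-resolution error here points you at the DNS page rather than the SSH one.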
6. Hardware Trouble
When the silicon underneath has stopped cooperating — disks, RAM, power, the board itself.
- hardware failure — the umbrella for physical-layer problems
- disk failing — SSDs and HDDs often warn before they die; listen
- power issue — outages, brownouts, PSU trouble
7. Security
When something — or someone — got in, or could.
- security breach — they're already inside; here's how you find them and what to do
- data breach — data has left the building; obligations, forensics, notifications
- security vulnerability — a known hole you need to close before it's used
- security issue — the umbrella walkthrough for weaker-than-it-should-be
- malware infection — something malicious is running; spot it and remove it
- denial of service — the box is being drowned in traffic on purpose
- permission issue — the wrong user can read/write something they shouldn't, or the right user can't
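One narrow slice of "the wrong user can write something they shouldn't" is easy to check mechanically: regular files with the world-writable bit set. The sketch below covers only that case, under the assumption that plain mode bits are what you care about; it ignores ACLs, group membership, and directory permissions. The helper names and the /etc starting point are illustrative choices.

```python
import os
import stat


def is_world_writable(mode: int) -> bool:
    """True when the 'other' write bit is set in an st_mode value."""
    return bool(mode & stat.S_IWOTH)


def world_writable_files(root: str) -> list:
    """Walk `root` and return regular files that any user may write to."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue             # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode) and is_world_writable(st.st_mode):
                hits.append(path)
    return hits


if __name__ == "__main__":
    # /etc is only an example starting point.
    for path in world_writable_files("/etc"):
        print("world-writable:", path)
```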
8. Application-Level Problems
Higher up the stack — your database, your email server, your app — when the OS is fine but the workload is not.
- database issue — slow queries, replication lag, deadlocks, corruption
- email issue — sending, receiving, deliverability, queues backed up
- virtualization issue — guests misbehaving, hypervisor quirks, noisy neighbors
Full Alphabetical List
Every problem page in the knowledge base. If you already know the name, jump straight to it.
- bad performance — the app-latency-focused performance walkthrough
- bad RAM — the kernel found genuinely faulty memory and poisoned the pages
- boot failure — the server won't come back up after a reboot
- buffer overflow — the classic memory-safety bug and what it still costs us
- configuration error — a misconfigured service refusing to start or behaving oddly
- data breach — sensitive data was accessed or exposed without authorization
- data corruption — files, databases, or filesystems returning the wrong bytes
- database issue — slow queries, replication lag, deadlocks, corruption
- denial of service — the box is being drowned in traffic on purpose
- device busy — a resource is still held open and you can't release it
- disk cable errors — rising CRC counts point at the cable, not the disk
- disk error — read/write errors surfacing in the kernel ring buffer
- disk failing — SMART is warning; act before you lose data
- disk full — out of space, and the inode trap that catches everyone once
- DNS issue — names aren't resolving and everything downstream breaks
- email issue — sending, receiving, deliverability, queue trouble
- file not found — the file the application expected isn't there
- file system corruption — the filesystem itself is damaged
- firewall issue — your rules are blocking what you wanted to allow
- hardware failure — the umbrella for physical-layer problems
- high I/O wait — the CPU is waiting on disk and nothing else can move
- high load — load average way above core count, and what's behind it
- high memory usage — RAM is filling up and you're approaching trouble
- incorrect configuration — broader walkthrough for misconfigured services
- incorrect time — the clock drifted and now auth and certs misbehave
- kernel issue — non-fatal kernel weirdness, modules, parameters
- kernel panic — the kernel hit something fatal and halted
- malware infection — something malicious is running on the box
- memory errors (ECC) — a DIMM is throwing correctable/uncorrectable errors; replace it
- memory leak — a process that just keeps eating RAM forever
- network failure — interfaces, routes, link state; the whole network stack
- NTP issue — time sync is broken and TLS is about to start failing
- NVMe spare exhausted — the reserve ran dry; this is a real defect, so replace the drive now
- out of memory — the kernel ran out and started killing things
- permission issue — the wrong access or denied access where you didn't expect
- poor performance — the umbrella walkthrough for "it's slow"
- power issue — outages, brownouts, PSU trouble
- RAID degraded — a drive dropped out of the array; redundancy is gone
- RAID rebuilding — the array is resyncing onto a fresh disk
- runaway process — one process is eating the whole machine
- security breach — they're already inside; find them and respond
- security issue — the umbrella walkthrough for weaker-than-it-should-be
- security vulnerability — a known hole you need to close
- segfault — a process touched memory it didn't own
- SMART unavailable — can't read the drive's health log, and how to get through
- software bug — the umbrella diagnosis when nothing else fits
- software crash — a service or daemon died; logs, exit codes, restarts
- SSD worn out — the drive used up its write life; old, not broken — plan the swap
- SSH issue — keys, configs, sshd flags — when the door won't open
- swap full — swap is exhausted; with RAM full too, OOM is imminent
- swap thrashing — the box pages constantly and does no real work
- swappiness misconfigured — the kernel swaps hot memory too eagerly for the workload
- swapping — memory has overflowed to disk and everything slows to a crawl
- system crash — the kernel itself went down
- system failure — the box stopped responding entirely
- virtualization issue — guests misbehaving, hypervisor quirks
- zombie process — dead but not reaped; what it means and when to worry
Watching the right symptoms early prevents most of these from ever happening.
CleverUptime watches the underlying signals — disk usage trending up, load creeping, memory pressure rising, services flapping, errors landing in the kernel ring buffer — long before they turn into the symptoms above, and tells you in plain language what's wrong so you reach the right diagnosis fast.