How it works
CleverUptime explained
TL;DR
In short, you run a script from CleverUptime on your server. It collects metrics such as CPU and disk performance, but also information about running applications and network interfaces. From this data, CleverUptime generates "monitors", which run periodically on the CleverUptime servers and probe your server's IP addresses, ports, certificates, etc. from multiple locations. When CleverUptime detects something wrong on your server, or problems with ports or applications, it notifies you via Email, Slack or SMS.
Let's look at the different parts in more detail.
The Script
You might wonder if it is safe to run a script from CleverUptime on your servers. You're right to wonder: running scripts written by other people on your machine can be very dangerous.
There's really only one way to make it safe: You have to read it!
Don't worry, it's not very long, and it only contains commands that you should know as a Linux admin. Basically, the script collects information about your server from sources such as the kernel's proc filesystem (/proc).
This raw data is then sent to the closest CleverUptime server, where it is parsed and analyzed. Now CleverUptime knows about CPU and disk usage, available RAM and swap space and many other things:
- hostname, machineId, productUuid
- hardware information like sys vendor and board serial
- date, time, uptime and timezone
- OS and kernel version
- CPU info, load, iowait
- memory, swap usage
- disk usage, performance and health information
- information about partitions, mounts and RAID
- network interfaces, ports, IP addresses and WireGuard tunnels
- information about users and groups
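The script itself is not reproduced here. As a rough illustration of what "parsing" the raw data can mean, here is a minimal Python sketch that turns the text of two proc files, /proc/meminfo and /proc/loadavg, into a few of the metrics listed above. The function names (parse_meminfo, parse_loadavg, summarize) are hypothetical and not part of CleverUptime; a real parser would need to handle many more sources.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into a dict (values in kB; unitless counters kept as-is)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()  # e.g. ["16316412", "kB"] or ["0"]
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def summarize(meminfo_text: str, loadavg_text: str) -> dict[str, float]:
    """Hypothetical summary: available memory in %, used swap in kB, 1-minute load."""
    mem = parse_meminfo(meminfo_text)
    load1, _, _ = parse_loadavg(loadavg_text)
    # MemAvailable is the kernel's estimate of memory usable without swapping.
    swap_used = mem["SwapTotal"] - mem["SwapFree"]
    return {
        "mem_available_pct": 100.0 * mem["MemAvailable"] / mem["MemTotal"],
        "swap_used_kb": float(swap_used),
        "load1": load1,
    }
```

On a Linux host you could feed it the real files, e.g. `summarize(open("/proc/meminfo").read(), open("/proc/loadavg").read())`.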
But we're not quite done yet. This is just the data that sits on your servers. CleverUptime now uses the data to figure out what else to monitor.
Monitors
For example, if CleverUptime detects a web server like Apache or Nginx, it will also create an additional monitor to check if ports 80 and 443 are open and accessible from the internet. This monitor probes your server at fixed intervals and from different locations to check that your web server is still accepting new connections.
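At its core, a port monitor is an attempt to open a TCP connection. The following Python sketch assumes a simple connect-with-timeout is enough; a real monitor would repeat it at fixed intervals and from several locations. `port_is_open` is a hypothetical helper, not a CleverUptime API.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable network, DNS failure, ...
        return False
```

For a web server you would expect both `port_is_open("example.com", 80)` and `port_is_open("example.com", 443)` to be True.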
Another monitor will automatically be created to download your homepage and check that it returns HTTP status 200 (OK). There will also be monitors to make sure that certain files and directories are NOT accessible through your web server. For example, a .git directory should never be reachable, as it might leak information about your application.
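Both the homepage check and the ".git" check can be expressed as HTTP status checks. This is a minimal sketch using only Python's standard library; the helper names and the choice of /.git/HEAD as the probe path are assumptions for illustration. A real monitor might also treat redirects to a login page or 403 responses differently.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Fetch url (following redirects) and return the final HTTP status code."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the status code is what we want.
        return err.code


def homepage_ok(base_url: str) -> bool:
    """The homepage should answer with 200 OK."""
    return http_status(base_url) == 200


def path_is_exposed(base_url: str, path: str) -> bool:
    """A sensitive path such as /.git/HEAD should NOT return 200."""
    return http_status(base_url.rstrip("/") + path) == 200
```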
If CleverUptime detects a database such as MariaDB or PostgreSQL, it creates monitors to make sure that their ports (3306 and 5432, respectively) are not open to the internet. Otherwise, people might be able to connect to your database and steal or manipulate data.
So you get the picture: just start the script, and a whole bunch of monitors will be created to make sure that your servers are safe and running properly.
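One way to picture how detected applications turn into monitors is a rule table that maps each service to ports that must be open or must stay closed. The sketch below is purely hypothetical (`RULES`, `PortMonitor` and `derive_monitors` are invented names) and only encodes the examples from this page: web servers on 80/443, MariaDB/MySQL on 3306 and PostgreSQL on 5432.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortMonitor:
    port: int
    expect_open: bool  # True: must be reachable; False: must NOT be reachable


# Hypothetical rule table: detected service name -> port monitors to create.
RULES: dict[str, list[PortMonitor]] = {
    "apache": [PortMonitor(80, True), PortMonitor(443, True)],
    "nginx": [PortMonitor(80, True), PortMonitor(443, True)],
    "mariadb": [PortMonitor(3306, False)],
    "mysql": [PortMonitor(3306, False)],
    "postgresql": [PortMonitor(5432, False)],
}


def derive_monitors(detected_services: list[str]) -> list[PortMonitor]:
    """Turn detected service names into a de-duplicated list of monitors, sorted by port."""
    monitors: set[PortMonitor] = set()
    for service in detected_services:
        monitors.update(RULES.get(service.lower(), []))
    return sorted(monitors, key=lambda m: m.port)


def evaluate(monitor: PortMonitor, port_open: bool) -> bool:
    """A monitor passes when the observed port state matches the expected state."""
    return port_open == monitor.expect_open
```

A server running Nginx and PostgreSQL would get three monitors: 80 and 443 expected open, 5432 expected closed.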
Here's a list of monitors that CleverUptime supports:
- ICMP ping
- port scan
- port connect
- HTTP/HTTPS connect
- SSL test and certificate
- DNS
- domain
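For the certificate part of an SSL test, a common check is how many days remain until the certificate expires. This Python sketch relies on the standard `ssl` module: `fetch_not_after` reads the `notAfter` field of a server's certificate, and `days_until_expiry` converts it into days. Both helper names are hypothetical, and a full SSL test typically also checks the certificate chain, protocol versions and ciphers.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter date of the (verified) certificate served by host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left until a notAfter date like 'Jan  5 09:34:43 2018 GMT' (negative if expired)."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    current = time.time() if now is None else now
    return (expires_at - current) / 86400.0
```

Usage: `days_until_expiry(fetch_not_after("example.com"))`, alerting if the result drops below some threshold you choose.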
Root Cause Analysis
All this data is analyzed and stored, so you get an easy overview of what's happening on your servers. In addition, CleverUptime continuously looks for problems and anomalies in the data.
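CleverUptime's own anomaly detection is not described on this page. To make "anomaly" concrete, here is a deliberately simple stand-in: a z-score test that flags a value lying more than a few standard deviations away from its recent history. This is a swapped-in textbook technique, not CleverUptime's algorithm, and the threshold of 3 is an assumption.

```python
from statistics import mean, pstdev


def is_anomaly(history: list[float], value: float, threshold: float = 3.0) -> bool:
    """Flag value if it is more than `threshold` standard deviations from the history mean."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold
```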
For example, CleverUptime might detect a high load on your server. This means that, on average, more processes are waiting to run (or waiting for I/O) than your CPU cores can serve. For your customers, this might result in a slow-loading or unresponsive web page. But why did it happen, and how can it be fixed?
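Load is easiest to judge relative to the number of CPU cores: a 1-minute load average of 12 on a 4-core machine means about 3 tasks per core competing for CPU time (or waiting on I/O), while the same load on a 16-core machine is 0.75 per core. A minimal sketch, assuming 1.0 per core as the overload threshold (the helper names are hypothetical):

```python
import os
from typing import Optional


def load_per_core(load1: float, cores: Optional[int] = None) -> float:
    """Normalize the 1-minute load average by the number of logical CPUs."""
    n = cores or os.cpu_count() or 1  # cpu_count() may return None
    return load1 / n


def is_overloaded(load1: float, cores: Optional[int] = None, limit: float = 1.0) -> bool:
    """True if, on average, more tasks want a CPU (or wait on I/O) than there are cores."""
    return load_per_core(load1, cores) > limit
```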
This is where CleverUptime's intelligent root cause analysis comes into play! It uses machine learning algorithms to find out what caused the high load. For example, it might detect that the high load is actually caused by iowait, which means the CPU is waiting for data from the disk. So maybe your server needs a new disk? Some performance problems may in fact be related to failing hardware, which CleverUptime would detect, too.
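Iowait can be measured from the aggregate `cpu` line in /proc/stat. Its counters (in clock ticks) are cumulative since boot, so you take two samples and compare the deltas. A minimal Python sketch of that calculation, with hypothetical function names:

```python
def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):  # per-core lines start with 'cpu0', 'cpu1', ...
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user/nice, so only the first 8 are summed.
    deltas = [y - x for x, y in zip(a[:8], b[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0
```

In practice you would read /proc/stat, wait a few seconds, read it again and pass both texts to `iowait_percent`.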
In our example, the disks are OK, but available memory is very low. When this happens, the kernel starts swapping memory pages from RAM to disk, which is extremely slow, and the whole server seems unresponsive. So more RAM, then? Sure, you can never have enough RAM, so that's always a good idea. But let's first check who is using all the memory.
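Whether the kernel is actively swapping can be read from the `pswpin` and `pswpout` counters in /proc/vmstat, which count pages swapped in and out since boot. The sketch below computes rates from two samples taken a known number of seconds apart; what rate counts as "too much" is left open, since any threshold would be an assumption. The function names are hypothetical.

```python
def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat ('name value' per line) into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages_per_second(before: str, after: str, seconds: float) -> tuple[float, float]:
    """Pages swapped in and out per second between two /proc/vmstat samples."""
    a, b = parse_vmstat(before), parse_vmstat(after)
    swapped_in = (b["pswpin"] - a["pswpin"]) / seconds
    swapped_out = (b["pswpout"] - a["pswpout"]) / seconds
    return swapped_in, swapped_out
```

Sustained non-zero rates together with a low MemAvailable value are a typical sign of memory pressure.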
CleverUptime detects a process called mysqld (the MySQL server) that uses 90% of the server's memory. So we found the culprit: it's the MySQL server. The good news is that this does not necessarily mean your server is too small for the database. It might just be a configuration error.
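Finding the process that uses the most memory can be approximated by comparing the resident set size (VmRSS) reported in /proc/&lt;pid&gt;/status. A minimal, hypothetical sketch follows. Note that RSS counts shared pages once per process, so the percentages of several processes can add up to more than 100%.

```python
import os
from typing import Iterable, Optional


def parse_status(text: str) -> tuple[str, int]:
    """Return (Name, VmRSS in kB) from /proc/<pid>/status text; kernel threads have no VmRSS."""
    name, rss_kb = "", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])  # e.g. "VmRSS:   123456 kB"
    return name, rss_kb


def top_memory_process(statuses: Iterable[str], mem_total_kb: int) -> Optional[tuple[str, float]]:
    """Name of the process with the largest resident set and its share of total RAM in %."""
    best: Optional[tuple[str, int]] = None
    for text in statuses:
        name, rss = parse_status(text)
        if best is None or rss > best[1]:
            best = (name, rss)
    if best is None:
        return None
    return best[0], 100.0 * best[1] / mem_total_kb


def read_all_statuses() -> list[str]:
    """Read /proc/<pid>/status for every running process (Linux only)."""
    texts = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/status") as f:
                    texts.append(f.read())
            except OSError:
                pass  # the process exited between listdir() and open()
    return texts
```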
Knowledge Base
At this point, CleverUptime sends you an alert via Email, SMS or Slack. The message describes what has been detected, includes additional helpful information about the server, and points out that the database seems to be the origin of the problem.
The message also contains a link to CleverUptime's knowledge base, a huge collection of knowledge about Linux commands, tutorials on how to diagnose problems, and best practices for setting up applications so that you get optimal performance (and avoid pitfalls such as the one in our example, where the database used more RAM than the server actually had).
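As an example of the delivery side, a Slack incoming webhook accepts an HTTP POST with a JSON body containing a `text` field. The sketch below only builds such a request; the webhook URL, host name and knowledge-base link used with it are placeholders, and the message layout is an assumption, not CleverUptime's actual alert format.

```python
import json
import urllib.request


def build_slack_alert(webhook_url: str, host: str, summary: str, kb_link: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    # Slack renders *text* as bold in message text.
    text = f"*{host}*: {summary}\nMore information: {kb_link}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would actually send the alert.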
That's it: this is how CleverUptime works. It's designed to keep your servers safe and running. You just start the script and forget about it until there is a problem. If you need help, CleverUptime's knowledge base can guide you through setting up a server or explain how the internals of Linux work.