Uptime monitoring tells you when a service is down. But sometimes things degrade slowly: CPU usage creeping up over days, memory gradually filling, disk I/O increasing as a database grows. By the time a slow degradation becomes an outage, the system has been telegraphing problems for a while.
Prometheus + Grafana is how I watch those trends. Prometheus scrapes metrics from my servers every 15 seconds and stores them in a time-series database; Grafana visualizes them as dashboards. I can see exactly when memory started climbing, compare this week's CPU usage to last week's, and set up alerts before problems become incidents.
Setting it up takes an afternoon the first time, but once it's running the dashboards more or less run themselves.
This guide deploys Prometheus and Grafana using Docker Compose on Ubuntu 22.04, with Node Exporter for system metrics, Nginx as the reverse proxy, and HTTPS.
I run this stack on Tencent Cloud Lighthouse. Prometheus, Grafana, and Node Exporter run comfortably together on the 4 GB RAM plan. Select Lighthouse's Docker CE application image when creating the instance: Docker is pre-installed, so the Docker Compose deployment in this guide can start immediately without a separate Docker setup step. An additional advantage: the Lighthouse control panel provides basic CPU and bandwidth metrics, which complement Grafana's 15-second-resolution detail with infrastructure-level visibility.
┌──────────────────────────────┐
│ Applications / Servers       │
│ └── Exporters (metrics)      │ ←── Expose metrics at /metrics endpoint
└──────────────────────────────┘
               ↑ scrape every 15s
┌──────────────────────────────┐
│ Prometheus                   │ ←── Stores time-series metrics data
└──────────────────────────────┘
               ↑ query
┌──────────────────────────────┐
│ Grafana                      │ ←── Visualizes metrics in dashboards
└──────────────────────────────┘
Exporters expose metrics (CPU usage, memory, HTTP requests, etc.) at HTTP endpoints. Prometheus scrapes these endpoints on a schedule and stores the data. Grafana queries Prometheus and renders dashboards.
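To make the scrape step concrete, here is a small shell sketch of the plain-text format an exporter serves and how a single value can be pulled out of it. The sample values are made up, and `sample_metrics` and `metric_value` are hypothetical helper names, not part of Prometheus.

```shell
#!/bin/sh
# Made-up sample of the plain-text format served at a /metrics endpoint.
sample_metrics() {
  cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 81234.5
node_cpu_seconds_total{cpu="0",mode="user"} 1520.25
EOF
}

# metric_value NAME: print the value of an unlabelled metric read from stdin.
# Comment lines (HELP/TYPE) and labelled series never match NAME exactly.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

sample_metrics | metric_value node_load1
```

Once the stack is running, the same helper should work on live data, for example `curl -s http://localhost:9090/metrics | metric_value go_goroutines` against Prometheus's own endpoint.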
| Requirement | Notes |
|---|---|
| Cloud server | Tencent Cloud Lighthouse Ubuntu 22.04 |
| Docker + Compose | Installed |
| Nginx | For reverse proxy with HTTPS |
| Domain name | For accessing dashboards |
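# Connect to the server
ssh ubuntu@YOUR_SERVER_IP
# Update installed packages
sudo apt update && sudo apt upgrade -y
# Install Docker (skip this if you chose the Docker CE application image)
curl -fsSL https://get.docker.com | sudo sh
# Let the current user run Docker without sudo
sudo usermod -aG docker "$USER"
newgrp docker
# Install Nginx and allow SSH plus HTTP/HTTPS through the firewall
sudo apt install -y nginx
sudo ufw allow ssh
sudo ufw allow 'Nginx Full'
sudo ufw enable
# Create the project directory
mkdir -p ~/apps/monitoring && cd ~/apps/monitoring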
Create prometheus.yml — tells Prometheus what to scrape:
mkdir -p prometheus
cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s      # Collect metrics every 15 seconds
  evaluation_interval: 15s  # Evaluate rules every 15 seconds

scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter: system metrics (CPU, memory, disk, network)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Add more targets here as needed
  # - job_name: 'my-app'
  #   static_configs:
  #     - targets: ['app-container:8080']
EOF
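Before starting the stack, it may be worth validating the file. A sketch, assuming Docker is available and that the official `prom/prometheus` image ships `promtool` on its PATH; `check_prometheus_config` is a hypothetical helper name.

```shell
#!/bin/sh
# check_prometheus_config [DIR]: run promtool's config check inside a
# throwaway container. DIR defaults to ./prometheus and must contain
# prometheus.yml (plus any rule files it references).
check_prometheus_config() {
  docker run --rm \
    -v "${1:-$PWD/prometheus}:/etc/prometheus:ro" \
    --entrypoint promtool \
    prom/prometheus:latest \
    check config /etc/prometheus/prometheus.yml
}
```

Run `check_prometheus_config` from ~/apps/monitoring; a non-zero exit status means promtool rejected the file.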
Create docker-compose.yml:
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "127.0.0.1:9090:9090"  # Local only: ports published by Docker bypass UFW
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'  # Keep 30 days of data
      - '--web.enable-lifecycle'             # Allow config reload via API

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"  # Local only: Nginx proxies external traffic
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
      GF_USERS_ALLOW_SIGN_UP: "false"
      GF_SERVER_ROOT_URL: https://grafana.yourdomain.com

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    # Runs on the Compose network so Prometheus can resolve node-exporter:9100.
    # With network_mode: host that name would not resolve; the trade-off is that
    # network interface metrics describe the container's namespace, not the host's.

volumes:
  prometheus_data:
  grafana_data:
Create .env:
echo "GRAFANA_PASSWORD=choose_strong_grafana_password" > .env
chmod 600 .env
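Rather than typing a password by hand, the file can also be generated. A minimal sketch, assuming `base64` from coreutils is installed; `write_grafana_env` is a hypothetical helper, and the variable name matches the one referenced in docker-compose.yml.

```shell
#!/bin/sh
# write_grafana_env [FILE]: write a random Grafana admin password to FILE
# (default .env) with owner-only permissions.
write_grafana_env() {
  env_file="${1:-.env}"
  # 24 random bytes encode to exactly 32 base64 characters, with no padding.
  password=$(head -c 24 /dev/urandom | base64 | tr -d '\n')
  ( umask 077; printf 'GRAFANA_PASSWORD=%s\n' "$password" > "$env_file" )
  chmod 600 "$env_file"
}
```

Calling `write_grafana_env` in ~/apps/monitoring replaces the two commands above; read the generated password back with `cat .env` when logging in to Grafana.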
Start the stack:
docker compose up -d
docker compose ps
# All three containers should show a status of Up (no health checks are defined, so none will report "healthy")
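Container status alone does not show whether Prometheus can actually reach its targets. The sketch below counts unhealthy targets in the JSON returned by Prometheus's /api/v1/targets endpoint. It assumes the compact JSON that Prometheus emits and uses plain grep rather than a real JSON parser, so treat it as a quick check only; `targets_not_up` is a hypothetical helper, and the sample response is abbreviated and made up.

```shell
#!/bin/sh
# targets_not_up: read a /api/v1/targets response on stdin and print how
# many active targets report a health other than "up".
targets_not_up() {
  grep -o '"health":"[a-z]*"' | grep -vc '"health":"up"' || true
}

# Abbreviated, made-up response with one healthy and one failing target.
sample='{"status":"success","data":{"activeTargets":[{"labels":{"job":"prometheus"},"health":"up"},{"labels":{"job":"node-exporter"},"health":"down"}]}}'

printf '%s\n' "$sample" | targets_not_up
```

Against the running stack, `curl -s http://localhost:9090/api/v1/targets | targets_not_up` should print 0 once every scrape target is healthy.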
sudo nano /etc/nginx/sites-available/grafana
server {
    listen 80;
    server_name grafana.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for Grafana live features
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
    }
}
Optionally, expose Prometheus (restrict to your IP for security):
sudo nano /etc/nginx/sites-available/prometheus
server {
    listen 80;
    server_name prometheus.yourdomain.com;

    # Restrict to your IP
    allow YOUR_IP;
    deny all;

    location / {
        proxy_pass http://127.0.0.1:9090;
        proxy_set_header Host $host;
    }
}
sudo ln -s /etc/nginx/sites-available/grafana /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/prometheus /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d grafana.yourdomain.com
# Optional: sudo certbot --nginx -d prometheus.yourdomain.com
Visit https://grafana.yourdomain.com.
Login: admin / your Grafana password from .env.
Add Prometheus as a data source, using this server URL:
http://prometheus:9090
Grafana has a library of pre-built dashboards. Import the Node Exporter dashboard for instant system metrics:
You now have a comprehensive dashboard showing:
Other useful dashboard IDs:
To monitor additional servers, install Node Exporter on each:
# On the second server
docker run -d \
  --name node-exporter \
  --restart unless-stopped \
  --net=host \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /:/rootfs:ro \
  prom/node-exporter:latest \
  --path.procfs=/host/proc \
  --path.rootfs=/rootfs \
  --path.sysfs=/host/sys
# Open port 9100 for Prometheus scraping (restrict to your monitoring server IP)
sudo ufw allow from MONITORING_SERVER_IP to any port 9100
Add the server to Prometheus config on the monitoring server:
nano ~/apps/monitoring/prometheus/prometheus.yml
  # Append under scrape_configs:
  - job_name: 'server-2'
    static_configs:
      - targets: ['SECOND_SERVER_IP:9100']
        labels:
          instance: 'server-2'
          job: 'node'
Reload Prometheus:
curl -X POST http://localhost:9090/-/reload
Create prometheus/alert_rules.yml:
groups:
  - name: server_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% disk space remaining"
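Add to prometheus.yml. The rules file must also be visible inside the container: the Compose file mounts only prometheus.yml, so add ./prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml:ro under the prometheus service's volumes and run docker compose up -d again.
rule_files:
  - "alert_rules.yml"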
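The CPU expression in the HighCPUUsage rule is easier to trust once the arithmetic is spelled out: the per-second increase of the idle counter is the idle fraction, and everything else is busy time. A worked sketch for a single core with made-up counter readings; `cpu_busy_pct` is a hypothetical helper that mirrors the idea, not how Prometheus computes rates internally.

```shell
#!/bin/sh
# cpu_busy_pct IDLE_START IDLE_END WINDOW_SECONDS
# IDLE_START and IDLE_END are two readings of a core's idle-seconds counter
# taken WINDOW_SECONDS apart. Prints the busy percentage for that window.
cpu_busy_pct() {
  awk -v a="$1" -v b="$2" -v w="$3" \
    'BEGIN { printf "%.1f\n", 100 - ((b - a) / w) * 100 }'
}

# Idle time grew by 270 s during a 300 s window: 90% idle, 10% busy.
cpu_busy_pct 1000 1270 300
# Idle time grew by only 15 s: 5% idle, 95% busy. Sustained for the rule's
# 5m "for" period, this would exceed the 90% threshold and fire the alert.
cpu_busy_pct 1000 1015 300
```

The real expression additionally averages this value across all cores of each instance via avg by(instance).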
Prometheus stores metrics in a time-series database that grows continuously. The default retention is 15 days. With many metrics sources and short scrape intervals, this can use significant disk space.
Check current disk usage:
docker exec prometheus du -sh /prometheus
Configure retention in docker-compose.yml:
    command:
      - '--storage.tsdb.retention.time=30d'   # Time-based (30 days)
      - '--storage.tsdb.retention.size=10GB'  # Size-based (10 GB max)
For long-term storage, consider Thanos or Cortex, or export important metrics to a cheaper storage solution.
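For a rough sense of how much disk a given retention needs, the Prometheus storage documentation suggests multiplying retention time, ingested samples per second, and bytes per sample (roughly 1 to 2 bytes after compression). The sketch below applies that rule of thumb; the 1,500-series figure is an assumed example rather than a measurement from this stack, and `estimate_tsdb_bytes` is a hypothetical helper.

```shell
#!/bin/sh
# estimate_tsdb_bytes SERIES SCRAPE_INTERVAL_S RETENTION_DAYS BYTES_PER_SAMPLE
# Rough estimate: retention seconds * samples per second * bytes per sample.
estimate_tsdb_bytes() {
  echo $(( $1 * $3 * 86400 * $4 / $2 ))
}

# Assumed example: 1,500 active series, 15 s scrape interval, 30 days kept,
# 2 bytes per sample (the pessimistic end of the documented range).
bytes=$(estimate_tsdb_bytes 1500 15 30 2)
echo "$bytes bytes (~$(( bytes / 1000000 )) MB)"
```

The actual number of active series can be read from Prometheus itself with the prometheus_tsdb_head_series metric, which gives a better input than a guess.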
# CPU usage percentage (all cores)
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100
# Disk usage percentage (root filesystem)
100 - ((node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100)
# Network traffic in (bytes per second)
rate(node_network_receive_bytes_total{device!="lo"}[5m])
# System load average (1 minute)
node_load1
# Uptime in days
(node_time_seconds - node_boot_time_seconds) / 86400
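These expressions can be pasted into Grafana panels or the Prometheus UI, and they can also be run from the shell through Prometheus's HTTP API. A sketch, assuming Prometheus is reachable at http://localhost:9090 as in this guide; `promql` and the `PROM_URL` variable are hypothetical names.

```shell
#!/bin/sh
# promql EXPR: run an instant query against the Prometheus HTTP API and
# print the raw JSON response. -G sends the URL-encoded expression as a
# query string on a GET request.
promql() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}
```

For example, `promql 'node_load1'` returns the current 1-minute load for every scraped instance, and any of the longer expressions above can be passed the same way inside single quotes.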
| Issue | Likely Cause | Fix |
|---|---|---|
| Connection refused | Service not running or wrong port | Check docker compose ps (or systemctl status nginx for Nginx) and verify firewall rules |
| Permission denied | Wrong file ownership or permissions | Check file ownership with ls -la and use chown/chmod to fix |
| 502 Bad Gateway | Backend container not running | Restart it with docker compose restart SERVICE; check logs with docker compose logs SERVICE |
| SSL certificate error | Certificate expired or domain mismatch | Run sudo certbot renew and verify domain DNS points to server IP |
| Service not starting | Config error or missing dependency | Check docker compose logs --tail 50 SERVICE (or journalctl -u nginx -n 50 for Nginx) for the specific error |
| Out of disk space | Logs or data accumulation | Run df -h to identify usage; clean logs or attach CBS storage |
| High memory usage | Too many processes or memory leak | Check with htop; consider upgrading instance plan if consistently high |
| Firewall blocking traffic | Port not open in UFW or Lighthouse console | Open port in Lighthouse console firewall AND sudo ufw allow PORT |
How many resources do Grafana and Prometheus use on the server?
Grafana and Prometheus are designed to be lightweight. They typically use minimal CPU and 50–200 MB RAM, so you can run them on the same server as your applications without significant impact.
How do I get alerts when a service goes down?
Configure notification integrations. Grafana alerting sends notifications through contact points, while Prometheus alert rules need Alertmanager (not deployed in this guide) to deliver them. Both support email, Telegram, Slack, Discord, and webhooks. Set appropriate evaluation intervals (every 60 seconds is typical) and a pending period (the for: field in the alert rules) to avoid alert fatigue from brief glitches.
Can I monitor multiple servers with one Grafana and Prometheus instance?
Yes. Install Node Exporter on each server you want to track and add each one as a scrape target in prometheus.yml, as shown in the section on monitoring additional servers.
How do I monitor SSL certificate expiry?
Add a certificate check to your monitoring. Prometheus does not probe certificates on its own; the usual approach is the Blackbox Exporter, whose HTTPS probes report the certificate expiry time so you can alert when a certificate is within a configurable number of days of expiring.