How to Deploy Grafana and Prometheus on a Cloud Server — Production Monitoring Stack

Uptime monitoring tells you when a service is down. But sometimes things degrade slowly: CPU usage creeps up over days, memory gradually fills, disk I/O increases as a database grows. By the time the degradation becomes an outage, the server has been telegraphing problems for a while.

Prometheus + Grafana is how I watch those trends. Prometheus scrapes metrics from my servers every 15 seconds and stores them in a time-series database; Grafana visualizes them as dashboards. I can see exactly when memory started climbing, compare this week's CPU usage to last week's, and set up alerts before problems become incidents.

Setting it up takes an afternoon the first time, but once it's running the dashboards more or less run themselves.

This guide deploys Prometheus and Grafana using Docker Compose on Ubuntu 22.04, with Node Exporter for system metrics, Nginx as the reverse proxy, and HTTPS.

I run this stack on Tencent Cloud Lighthouse. Prometheus + Grafana + Node Exporter runs comfortably on the 4 GB RAM plan. If you select Lighthouse's Docker CE application image when creating the instance, Docker is pre-installed and you can skip the Docker installation commands in Part 1. An additional advantage: the Lighthouse control panel provides basic CPU and bandwidth metrics, which complement Grafana's fine-grained dashboards (15-second resolution in this setup) with infrastructure-level visibility.

How Prometheus + Grafana Works
Prerequisites
Part 1 — Server Setup
Part 2 — Deploy the Monitoring Stack
Part 3 — Configure Nginx Reverse Proxy
Part 4 — Enable HTTPS
Part 5 — First Login and Dashboard Setup
Part 6 — Import Community Dashboards
Part 7 — Monitor a Second Server
Part 8 — Configure Alerts
The Gotcha: Prometheus Data Retention
Useful Prometheus Queries
Troubleshooting
Frequently Asked Questions

Key Takeaways

Use the appropriate Lighthouse application image to skip manual installation steps where available
Lighthouse snapshots provide one-click full-server backup before major changes
OrcaTerm browser terminal lets you manage the server from any device
CBS cloud disk expansion handles growing storage needs without server migration
Console-level firewall + UFW = two independent protection layers

How Prometheus + Grafana Works {#how}

┌──────────────────────────────┐
│  Applications / Servers      │
│  └── Exporters (metrics)     │ ←── Expose metrics at /metrics endpoint
└──────────────────────────────┘
           ↑ scrape every 15s
┌──────────────────────────────┐
│  Prometheus                  │ ←── Stores time-series metrics data
└──────────────────────────────┘
           ↑ query
┌──────────────────────────────┐
│  Grafana                     │ ←── Visualizes metrics in dashboards
└──────────────────────────────┘

Exporters expose metrics (CPU usage, memory, HTTP requests, etc.) at HTTP endpoints. Prometheus scrapes these endpoints on a schedule and stores the data. Grafana queries Prometheus and renders dashboards.
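To make the /metrics endpoint concrete, here is a minimal sketch of what an exporter serves and how the lines break down into a metric name, labels, and a value. The sample payload is illustrative (the values are made up), and the parser is a simplification I wrote for this guide: it handles plain sample lines only, not the full exposition format (no timestamps or escaped label values).

```python
import re

# Illustrative sample of what an exporter serves at /metrics (values are made up).
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.8
"""

# metric_name, optional {labels}, then the value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

Each (name, labels) pair is one time series. Prometheus appends a new value to every series on each scrape.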

Prerequisites {#prerequisites}

Requirement	Notes
Cloud server	Tencent Cloud Lighthouse Ubuntu 22.04
Docker + Compose	Installed
Nginx	For reverse proxy with HTTPS
Domain name	For accessing dashboards

Part 1 — Server Setup {#part-1}

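# Connect and bring the system up to date
ssh ubuntu@YOUR_SERVER_IP
sudo apt update && sudo apt upgrade -y

# Install Docker (skip these three commands if you chose the Docker CE application image)
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker   # Open a shell with the new group so docker works without sudo

# Install Nginx and allow only SSH and HTTP/HTTPS through UFW
sudo apt install -y nginx
sudo ufw allow ssh
sudo ufw allow 'Nginx Full'
sudo ufw enable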

Part 2 — Deploy the Monitoring Stack {#part-2}

mkdir -p ~/apps/monitoring && cd ~/apps/monitoring

Create prometheus/prometheus.yml, which tells Prometheus what to scrape:

mkdir -p prometheus
cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s       # Collect metrics every 15 seconds
  evaluation_interval: 15s   # Evaluate rules every 15 seconds

scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter: system metrics (CPU, memory, disk, network)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Add more targets here as needed
  # - job_name: 'my-app'
  #   static_configs:
  #     - targets: ['app-container:8080']
EOF

Create docker-compose.yml:

# The top-level "version" key is obsolete in Docker Compose v2 and is omitted here

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
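    ports:
      - "127.0.0.1:9090:9090"   # Loopback only: Docker-published ports bypass UFW
    extra_hosts:
      # node-exporter runs with host networking, so it is not on the Compose
      # network. Map its name to the Docker host so 'node-exporter:9100' resolves.
      # If UFW drops the scrape, allow Docker's default bridge range with:
      #   sudo ufw allow from 172.16.0.0/12 to any port 9100 proto tcp
      - "node-exporter:host-gateway"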
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # Keep 30 days of data
      - '--web.enable-lifecycle'               # Allow config reload via API

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"   # Loopback only: Nginx is the public entry point
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
      GF_USERS_ALLOW_SIGN_UP: "false"
      GF_SERVER_ROOT_URL: https://grafana.yourdomain.com

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    network_mode: host   # Required for accurate network interface metrics

volumes:
  prometheus_data:
  grafana_data:

Create .env:

echo "GRAFANA_PASSWORD=choose_strong_grafana_password" > .env
chmod 600 .env

Start the stack:

docker compose up -d
docker compose ps
# All three containers should show STATUS "Up" (no health checks are defined, so none will report "healthy")
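
Running containers do not guarantee that Prometheus is actually scraping them. As a rough check, the sketch below reads the /api/v1/targets endpoint of the Prometheus HTTP API and lists every target whose health is not "up". The function names are my own, and the base URL assumes you run it on the monitoring server itself.

```python
import json
import urllib.request


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the active-target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, instance, last_error) for every target that is not 'up'."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target["health"] != "up":
            labels = target["labels"]
            problems.append(
                (labels["job"], labels["instance"], target.get("lastError", ""))
            )
    return problems
```

Calling unhealthy_targets(fetch_targets()) should return an empty list once both jobs are being scraped. The same information is available in the Prometheus UI under Status → Targets.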

Part 3 — Configure Nginx Reverse Proxy {#part-3}

sudo nano /etc/nginx/sites-available/grafana

server {
    listen 80;
    server_name grafana.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;

        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for Grafana live features
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
    }
}

Optionally, expose Prometheus (restrict to your IP for security):

sudo nano /etc/nginx/sites-available/prometheus

server {
    listen 80;
    server_name prometheus.yourdomain.com;

    # Restrict to your IP
    allow YOUR_IP;
    deny all;

    location / {
        proxy_pass http://127.0.0.1:9090;
        proxy_set_header Host $host;
    }
}

sudo ln -s /etc/nginx/sites-available/grafana /etc/nginx/sites-enabled/
# Only if you created the optional Prometheus site:
sudo ln -s /etc/nginx/sites-available/prometheus /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Part 4 — Enable HTTPS {#part-4}

sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d grafana.yourdomain.com
# Optional: sudo certbot --nginx -d prometheus.yourdomain.com

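Part 5 — First Login and Dashboard Setup {#part-5}

Visit https://grafana.yourdomain.com and log in as admin with the GRAFANA_PASSWORD you set in .env.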

Add Prometheus as a data source:

Connections → Data sources → Add data source
Select Prometheus
URL: http://prometheus:9090
Click Save & Test; Grafana should confirm that the data source is working (the exact message varies by version)

Part 6 — Import Community Dashboards {#part-6}

Grafana has a library of pre-built dashboards. Import the Node Exporter dashboard for instant system metrics:

Dashboards → New → Import
Enter dashboard ID: 1860 (Node Exporter Full — the most popular)
Click Load
Select your Prometheus data source
Click Import

You now have a comprehensive dashboard showing:

CPU usage by core
Memory and swap usage
Disk I/O and space
Network traffic
System load average
Open file descriptors

Other useful dashboard IDs:

3662 — Prometheus 2.0 Overview
893 — Docker Container & Host Metrics
7587 — Nginx (if you have the Nginx exporter)
9628 — PostgreSQL Database

Part 7 — Monitor a Second Server {#part-7}

To monitor additional servers, install Node Exporter on each:

# On the second server
docker run -d \
  --name node-exporter \
  --restart unless-stopped \
  --net=host \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /:/rootfs:ro \
  prom/node-exporter:latest \
  --path.procfs=/host/proc \
  --path.rootfs=/rootfs \
  --path.sysfs=/host/sys

# Open port 9100 for Prometheus scraping (restrict to your monitoring server IP)
sudo ufw allow from MONITORING_SERVER_IP to any port 9100

Add the server to Prometheus config on the monitoring server:

nano ~/apps/monitoring/prometheus/prometheus.yml

  # Append under scrape_configs
  - job_name: 'server-2'
    static_configs:
      - targets: ['SECOND_SERVER_IP:9100']
        labels:
          instance: 'server-2'   # Friendly name instead of IP:port

Reload Prometheus:

curl -X POST http://localhost:9090/-/reload

Part 8 — Configure Alerts {#part-8}

Grafana alerts (recommended for beginners)

Open any panel in a dashboard
Click Edit → Alert tab
Set conditions (e.g., CPU > 90% for 5 minutes)
Add a contact point (Email, Telegram, Slack, etc.; older Grafana versions call this a notification channel)
Save

Prometheus alerting rules

Create prometheus/alert_rules.yml:

groups:
  - name: server_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% disk space remaining"

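Add to prometheus.yml (top level, alongside scrape_configs):

rule_files:
  - "alert_rules.yml"

The path is resolved relative to the config file, so the rules file must exist inside the container at /etc/prometheus/alert_rules.yml. Add this line to the prometheus service's volumes in docker-compose.yml, then run docker compose up -d to recreate the container:

      - ./prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml:ro

Without an Alertmanager, these rules only appear on the Alerts page of the Prometheus UI; they do not send notifications.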
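
The for: 5m clause in the rules above is easy to misread. An alert does not fire the first time its expression is true; it goes to "pending" and only fires once the expression has stayed true for the whole duration. The sketch below is my own simplified model of that state machine, not Prometheus code. It assumes evenly spaced evaluations and a simple "value above threshold" condition.

```python
def alert_states(values, threshold, for_seconds, interval_seconds=15):
    """Return 'inactive', 'pending' or 'firing' for each rule evaluation.

    values: the expression result at each evaluation, oldest first.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, value in enumerate(values):
        now = i * interval_seconds
        if value > threshold:
            if active_since is None:
                active_since = now
            held_for = now - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


if __name__ == "__main__":
    # CPU at 95% for three evaluations, a dip, then high again (for: 30s)
    print(alert_states([95, 95, 95, 50, 95], threshold=90, for_seconds=30))
```

With the 15-second evaluation interval and for: 5m used in this guide, the condition has to hold for 21 consecutive evaluations before the alert fires, and a single dip below the threshold starts the count again.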

The Gotcha: Prometheus Data Retention {#gotcha}

Prometheus stores metrics in an on-disk time-series database that grows with every scrape. Without a retention flag it keeps 15 days of data; the Compose file in Part 2 raises that to 30 days. With many metrics sources and short scrape intervals, this can use significant disk space.

Check current disk usage:

docker exec prometheus du -sh /prometheus

Adjust retention in docker-compose.yml. When both limits are set, whichever is reached first applies:

command:
  - '--storage.tsdb.retention.time=30d'    # Time-based (30 days)
  - '--storage.tsdb.retention.size=10GB'   # Size-based (10 GB max)

For long-term storage, consider Thanos or Cortex, or export important metrics to a cheaper storage solution.
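
To pick a retention setting before the disk fills up, a back-of-the-envelope estimate helps. The Prometheus storage documentation gives the rule of thumb: disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies it; the 3,000-series figure in the example is a hypothetical number, not a measurement from my server.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Rough TSDB size estimate from the Prometheus rule of thumb."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86400
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical: 3,000 active series, 15 s scrape interval, 30 days retention
    size = estimate_disk_bytes(3000, 15, 30)
    print(f"{size / 1e9:.2f} GB")
```

Treat the result as an order-of-magnitude guide and compare it with the du output above once the stack has been running for a few days.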

Useful Prometheus Queries {#queries}

# CPU usage percentage (all cores)
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage percentage (root filesystem)
100 - ((node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100)

# Network traffic in (bytes per second)
rate(node_network_receive_bytes_total{device!="lo"}[5m])

# System load average (1 minute)
node_load1

# Uptime in days
(node_time_seconds - node_boot_time_seconds) / 86400
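
The CPU query is the least obvious of these, so here is the arithmetic behind it as a small sketch. node_cpu_seconds_total{mode="idle"} is a per-core counter of seconds spent idle; dividing its increase by the window length gives the idle fraction of each core, and the query averages those fractions and subtracts from 100. The function below is my own illustration and averages over the whole window, as rate() does; irate() uses only the last two samples in the window.

```python
def cpu_usage_percent(idle_start, idle_end, window_seconds):
    """CPU usage in percent from per-core idle counters.

    idle_start / idle_end: idle-seconds counter per core at the start and
    end of the window, in the same core order.
    """
    idle_fractions = [
        (end - start) / window_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return 100 - avg_idle * 100


if __name__ == "__main__":
    # Two cores over a 5-minute window: one idle 225 s, the other idle 75 s
    print(cpu_usage_percent([1000, 2000], [1225, 2075], 300))
```

A core that was idle for the entire window contributes a fraction of 1.0, so a fully idle machine comes out at 0% usage.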

Troubleshooting {#troubleshooting}

Issue	Likely Cause	Fix
Connection refused	Container not running or wrong port	Check `docker compose ps` (and `systemctl status nginx` for the proxy) and verify firewall rules
Permission denied	Wrong file ownership or permissions	Check file ownership with `ls -la` and use `chown`/`chmod` to fix
502 Bad Gateway	Backend container not running	Run `docker compose up -d`; check logs with `docker compose logs grafana`
SSL certificate error	Certificate expired or domain mismatch	Run `sudo certbot renew` and verify domain DNS points to server IP
Service not starting	Config error or missing dependency	Check logs with `docker compose logs --tail 50 SERVICE` for the specific error
Out of disk space	Logs or data accumulation	Run `df -h` to identify usage; clean logs or attach CBS storage
High memory usage	Too many processes or memory leak	Check with `htop`; consider upgrading instance plan if consistently high
Firewall blocking traffic	Port not open in UFW or Lighthouse console	Open port in Lighthouse console firewall AND `sudo ufw allow PORT`

Frequently Asked Questions {#faq}

How many resources do Grafana and Prometheus use on the server?
Grafana and Prometheus are designed to be lightweight. For a small setup like this one, they typically use little CPU and 50–200 MB of RAM, though Prometheus memory grows with the number of time series it scrapes. You can run them on the same server as your applications without significant impact.

How do I get alerts when a service goes down?
Configure a contact point in Grafana alerting; it supports email, Telegram, Slack, Discord, and webhooks. To detect a service that is down, alert on the up metric, which Prometheus sets to 0 when a scrape fails. Set an appropriate evaluation interval (every 60 seconds is typical) and a pending period to avoid alert fatigue from brief glitches.

Can I monitor multiple servers with one Grafana and Prometheus instance?
Yes. Run Node Exporter on each server you want to track and add it as a scrape target in prometheus.yml, as shown in Part 7.

How do I monitor SSL certificate expiry?
Prometheus does not check certificates by itself. The usual approach is the Blackbox Exporter, which probes HTTPS endpoints and exposes the certificate expiry time as the probe_ssl_earliest_cert_expiry metric; alert when that time is within a chosen number of days.

What's the difference between uptime monitoring and performance monitoring?
Uptime monitoring checks if a service is available (up/down). Performance monitoring tracks metrics over time (CPU%, response times, database query counts). Both are complementary.
