How to Deploy Grafana and Prometheus on a Cloud Server — Production Monitoring Stack

Uptime monitoring tells you when a service is down. But sometimes things degrade slowly — CPU usage creeping up over days, memory gradually filling, disk I/O increasing as a database grows. By the time it becomes an outage, it's been telegraphing problems for a while.

Prometheus + Grafana is how I watch those trends. Prometheus scrapes metrics from my servers every 15 seconds, stores them in a time-series database, and Grafana visualizes them as dashboards. I can see exactly when memory started climbing, compare this week's CPU usage to last week, and set up alerts before problems become incidents.

Setting it up takes an afternoon the first time, but once it's running the dashboards more or less run themselves.

This guide deploys Prometheus and Grafana using Docker Compose on Ubuntu 22.04, with Node Exporter for system metrics, Nginx as the reverse proxy, and HTTPS.

I run this stack on Tencent Cloud Lighthouse, where Prometheus, Grafana, and Node Exporter run comfortably on the 4 GB RAM plan. If you select Lighthouse's Docker CE application image when creating the instance, Docker comes pre-installed and you can skip the Docker installation step in Part 1. The Lighthouse control panel also provides basic CPU and bandwidth metrics, which give an infrastructure-level view alongside the 15-second detail that Prometheus collects.


Table of Contents

  1. How Prometheus + Grafana Works
  2. Prerequisites
  3. Part 1 — Server Setup
  4. Part 2 — Deploy the Monitoring Stack
  5. Part 3 — Configure Nginx Reverse Proxy
  6. Part 4 — Enable HTTPS
  7. Part 5 — First Login and Dashboard Setup
  8. Part 6 — Import Community Dashboards
  9. Part 7 — Monitor a Second Server
  10. Part 8 — Configure Alerts
  11. The Gotcha: Prometheus Data Retention
  12. Useful Prometheus Queries

Key Takeaways

  • Use the appropriate Lighthouse application image to skip manual installation steps where available
  • Lighthouse snapshots provide one-click full-server backup before major changes
  • OrcaTerm browser terminal lets you manage the server from any device
  • CBS cloud disk expansion handles growing storage needs without server migration
  • Console-level firewall + UFW = two independent protection layers

How Prometheus + Grafana Works {#how}

┌──────────────────────────────┐
│  Applications / Servers      │
│  └── Exporters (metrics)     │ ←── Expose metrics at /metrics endpoint
└──────────────────────────────┘
           ↑ scrape every 15s
┌──────────────────────────────┐
│  Prometheus                  │ ←── Stores time-series metrics data
└──────────────────────────────┘
           ↑ query
┌──────────────────────────────┐
│  Grafana                     │ ←── Visualizes metrics in dashboards
└──────────────────────────────┘

Exporters expose metrics (CPU usage, memory, HTTP requests, etc.) at HTTP endpoints. Prometheus scrapes these endpoints on a schedule and stores the data. Grafana queries Prometheus and renders dashboards.
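
To make the /metrics step concrete, here is a minimal sketch of what an exporter's response looks like and how one value can be pulled out of it. The sample text imitates Node Exporter output with invented numbers, and metric_value is a hypothetical helper of my own, not something Prometheus ships.

```shell
# Sample of the Prometheus text exposition format (values are invented).
sample_metrics='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 4.1e+09'

# metric_value NAME: read exposition text on stdin and print the value of
# the first unlabelled sample whose name matches NAME exactly.
# HELP/TYPE lines start with "#", so they never match.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

printf '%s\n' "$sample_metrics" | metric_value node_load1
```

Once the stack from Part 2 is running, the same helper should work against a live endpoint, for example curl -s http://localhost:9090/metrics | metric_value process_resident_memory_bytes. Samples that carry labels (name{...} value) would need a looser match than this sketch uses.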


Prerequisites {#prerequisites}

| Requirement | Notes |
|---|---|
| Cloud server | Tencent Cloud Lighthouse, Ubuntu 22.04 |
| Docker + Compose | Installed |
| Nginx | For reverse proxy with HTTPS |
| Domain name | For accessing dashboards |

Part 1 — Server Setup {#part-1}

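# Connect and bring the system up to date
ssh ubuntu@YOUR_SERVER_IP
sudo apt update && sudo apt upgrade -y

# Install Docker (skip if you chose the Docker CE application image)
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker $USER
newgrp docker   # Pick up the new group without logging out

# Install Nginx, then allow SSH and HTTP/HTTPS before enabling the firewall
sudo apt install -y nginx
sudo ufw allow ssh
sudo ufw allow 'Nginx Full'
sudo ufw enable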

Part 2 — Deploy the Monitoring Stack {#part-2}

mkdir -p ~/apps/monitoring && cd ~/apps/monitoring

Create prometheus/prometheus.yml, which tells Prometheus what to scrape:

mkdir -p prometheus
cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s       # Collect metrics every 15 seconds
  evaluation_interval: 15s   # Evaluate rules every 15 seconds

scrape_configs:
  # Prometheus monitors itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporter: system metrics (CPU, memory, disk, network)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Add more targets here as needed
  # - job_name: 'my-app'
  #   static_configs:
  #     - targets: ['app-container:8080']
EOF

Create docker-compose.yml:

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "127.0.0.1:9090:9090"   # Loopback only; Docker-published ports bypass UFW
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # Keep 30 days of data
      - '--web.enable-lifecycle'               # Allow config reload via API

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "127.0.0.1:3000:3000"   # Loopback only; Nginx proxies public traffic
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
      GF_USERS_ALLOW_SIGN_UP: "false"
      GF_SERVER_ROOT_URL: https://grafana.yourdomain.com

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
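    # No network_mode: host here. On the host network this container would not
    # be resolvable as node-exporter:9100 from the Prometheus container.
    # Trade-off: network interface metrics then describe the container, not the host.
    expose:
      - "9100"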

volumes:
  prometheus_data:
  grafana_data:

Create .env:

echo "GRAFANA_PASSWORD=choose_strong_grafana_password" > .env
chmod 600 .env

Start the stack:

docker compose up -d
docker compose ps
# All three containers should show a status of Up (no health checks are defined, so none will report "healthy")
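
Containers can report Up a few seconds before the services inside them answer requests. As a minimal sketch, the retry helper below (a hypothetical function of my own, written in plain POSIX shell) re-runs any command until it succeeds or the attempts run out.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Run COMMAND until it succeeds, sleeping DELAY_SECONDS after each failure.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example, run on the monitoring server:
#   retry 10 2 curl -fsS http://localhost:9090/-/ready
#   retry 10 2 curl -fsS http://localhost:3000/api/health
```

The two example URLs are the readiness endpoint of Prometheus and the health endpoint of Grafana. The attempt count and delay are arbitrary starting values, not recommendations.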

Part 3 — Configure Nginx Reverse Proxy {#part-3}

sudo nano /etc/nginx/sites-available/grafana

Add this server block:

server {
    listen 80;
    server_name grafana.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;

        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for Grafana live features
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
    }
}

Optionally, expose Prometheus (restrict to your IP for security):

sudo nano /etc/nginx/sites-available/prometheus

Add this server block:

server {
    listen 80;
    server_name prometheus.yourdomain.com;

    # Restrict to your IP
    allow YOUR_IP;
    deny all;

    location / {
        proxy_pass http://127.0.0.1:9090;
        proxy_set_header Host $host;
    }
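}

Enable the sites and reload Nginx (skip the second link if you did not create the Prometheus site):

sudo ln -s /etc/nginx/sites-available/grafana /etc/nginx/sites-enabled/
sudo ln -s /etc/nginx/sites-available/prometheus /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx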

Part 4 — Enable HTTPS {#part-4}

sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d grafana.yourdomain.com
# Optional: sudo certbot --nginx -d prometheus.yourdomain.com

Part 5 — First Login and Dashboard Setup {#part-5}

Visit https://grafana.yourdomain.com.

Login: admin / your Grafana password from .env.

Add Prometheus as a data source:

  1. Connections → Data sources → Add data source
  2. Select Prometheus
  3. URL: http://prometheus:9090
  4. Click Save & Test — should show "Data source is working"
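
The same data source can also be declared in a file so it survives a rebuilt Grafana volume. This is a minimal sketch using Grafana's data source provisioning format. The grafana/provisioning directory is my own choice of path, and it only takes effect if you also add ./grafana/provisioning:/etc/grafana/provisioning:ro to the grafana service's volumes and run docker compose up -d again.

```shell
# Run from ~/apps/monitoring. The host-side directory name is an assumption.
mkdir -p grafana/provisioning/datasources
cat > grafana/provisioning/datasources/prometheus.yml << 'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                 # Grafana's backend makes the requests
    url: http://prometheus:9090   # Compose service name, as in step 3
    isDefault: true
EOF
```

A provisioned data source shows up in the Grafana UI as read-only, so later changes go in this file rather than in the browser.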

Part 6 — Import Community Dashboards {#part-6}

Grafana has a library of pre-built dashboards. Import the Node Exporter dashboard for instant system metrics:

  1. Dashboards → New → Import
  2. Enter dashboard ID: 1860 (Node Exporter Full — the most popular)
  3. Click Load
  4. Select your Prometheus data source
  5. Click Import

You now have a comprehensive dashboard showing:

  • CPU usage by core
  • Memory and swap usage
  • Disk I/O and space
  • Network traffic
  • System load average
  • Open file descriptors

Other useful dashboard IDs:

  • 3662 — Prometheus 2.0 Overview
  • 893 — Docker Container & Host Metrics
  • 7587 — Nginx (if you have the Nginx exporter)
  • 9528 — PostgreSQL Database

Part 7 — Monitor a Second Server {#part-7}

To monitor additional servers, install Node Exporter on each:

# On the second server
docker run -d \
  --name node-exporter \
  --restart unless-stopped \
  --net=host \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /:/rootfs:ro \
  prom/node-exporter:latest \
  --path.procfs=/host/proc \
  --path.rootfs=/rootfs \
  --path.sysfs=/host/sys

# Open port 9100 for Prometheus scraping (restrict to your monitoring server IP)
sudo ufw allow from MONITORING_SERVER_IP to any port 9100

Add the server to Prometheus config on the monitoring server:

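nano ~/apps/monitoring/prometheus/prometheus.yml

Append this job under scrape_configs:

  - job_name: 'server-2'
    static_configs:
      - targets: ['SECOND_SERVER_IP:9100']
        labels:
          instance: 'server-2'   # Friendly name instead of IP:port
          job: 'node'            # Overrides the job label set by job_name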

Reload Prometheus:

curl -X POST http://localhost:9090/-/reload
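
After a reload it is worth confirming that the new target is actually being scraped. The sketch below counts targets by health from the JSON that the Prometheus targets API returns. count_targets is a hypothetical helper, the sample JSON is trimmed and invented, and the grep pattern assumes the compact JSON that Prometheus emits (jq would be the sturdier tool if it is installed).

```shell
# count_targets HEALTH: read /api/v1/targets JSON on stdin and print how many
# targets have "health" equal to HEALTH (up, down, or unknown).
count_targets() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Trimmed, invented sample of an API response with one healthy and one failing target.
sample='{"status":"success","data":{"activeTargets":[{"labels":{"job":"prometheus"},"health":"up"},{"labels":{"job":"node"},"health":"down"}],"droppedTargets":[]}}'

printf '%s' "$sample" | count_targets down
```

On the monitoring server the live equivalent would be curl -s http://localhost:9090/api/v1/targets | count_targets down, where anything other than 0 means at least one target is failing its scrape.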

Part 8 — Configure Alerts {#part-8}

Grafana alerts (recommended for beginners)

  1. Open any panel in a dashboard
  2. Click Edit, then open the Alert tab
  3. Set conditions (e.g., CPU > 90% for 5 minutes)
  4. Add a contact point (Email, Telegram, Slack, etc.); older Grafana versions call this a notification channel
  5. Save

Prometheus alerting rules

Create prometheus/alert_rules.yml:

groups:
  - name: server_alerts
    rules:
      - alert: HighCPUUsage
        # rate() averages over the whole 5m window, which suits alerting better than irate()
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Only {{ $value }}% disk space remaining"

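Add to prometheus.yml:

rule_files:
  - "alert_rules.yml"

The path is resolved relative to prometheus.yml inside the container, so also mount the file: add ./prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml:ro to the prometheus service's volumes in docker-compose.yml, then run docker compose up -d. This stack does not include Alertmanager, so firing rules appear on the Alerts page of the Prometheus UI but send no notifications on their own.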

The Gotcha: Prometheus Data Retention {#gotcha}

Prometheus stores metrics in a time-series database that keeps growing until old data ages out. The default retention is 15 days (the Compose file in Part 2 raises it to 30). With many metrics sources and short scrape intervals, this can use significant disk space.

Check current disk usage:

docker exec prometheus du -sh /prometheus

Configure retention in docker-compose.yml. When both flags are set, Prometheus removes old data as soon as either limit is reached:

command:
  - '--storage.tsdb.retention.time=30d'    # Time-based (30 days)
  - '--storage.tsdb.retention.size=10GB'   # Size-based (10 GB max)
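
To choose a sensible size limit, a back-of-the-envelope estimate helps. The sketch below applies the sizing rule from the Prometheus storage documentation (retention seconds × samples ingested per second × bytes per sample). estimate_tsdb_bytes is a hypothetical helper, and the 2,000 series in the example is an assumed figure. The prometheus_tsdb_head_series metric gives the real count.

```shell
# estimate_tsdb_bytes SERIES SCRAPE_INTERVAL_S RETENTION_DAYS [BYTES_PER_SAMPLE]
# Integer arithmetic only. BYTES_PER_SAMPLE defaults to 2, the upper end of
# the 1-2 bytes per sample quoted in the Prometheus storage docs.
estimate_tsdb_bytes() {
  series=$1
  interval=$2
  days=$3
  bytes_per_sample=${4:-2}
  echo $(( series * days * 86400 / interval * bytes_per_sample ))
}

# Assumed workload: 2,000 active series, 15s scrape interval, 30 days kept.
estimate_tsdb_bytes 2000 15 30   # prints 691200000 (about 0.7 GB)
```

Treat the result as an order-of-magnitude guide. The write-ahead log and compaction need headroom on top, so the size limit should sit comfortably above the estimate.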

For long-term storage, consider Thanos or Cortex, or export important metrics to a cheaper storage solution.


Useful Prometheus Queries {#queries}

# CPU usage percentage (all cores)
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Disk usage percentage (root filesystem)
100 - ((node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100)

# Network traffic in (bytes per second)
rate(node_network_receive_bytes_total{device!="lo"}[5m])

# System load average (1 minute)
node_load1

# Uptime in days
(node_time_seconds - node_boot_time_seconds) / 86400

Troubleshooting {#troubleshooting}

| Issue | Likely Cause | Fix |
|---|---|---|
| Connection refused | Service not running or wrong port | Run docker compose ps for the containers or systemctl status nginx for Nginx, and verify firewall rules |
| Permission denied | Wrong file ownership or permissions | Check file ownership with ls -la and use chown/chmod to fix |
| 502 Bad Gateway | Backend service not running | Restart it with docker compose restart SERVICE; check logs with docker compose logs SERVICE |
| SSL certificate error | Certificate expired or domain mismatch | Run sudo certbot renew and verify domain DNS points to server IP |
| Service not starting | Config error or missing dependency | Check docker compose logs --tail 50 SERVICE (or journalctl -u nginx -n 50 for Nginx) for the specific error |
| Out of disk space | Logs or data accumulation | Run df -h to identify usage; clean logs or attach CBS storage |
| High memory usage | Too many processes or memory leak | Check with htop; consider upgrading instance plan if consistently high |
| Firewall blocking traffic | Port not open in UFW or Lighthouse console | Open port in Lighthouse console firewall AND sudo ufw allow PORT |

Frequently Asked Questions {#faq}

How many resources do Grafana and Prometheus use on the server?
Both are fairly lightweight at this scale. With a handful of scrape targets, CPU use is low and memory stays well within the 4 GB plan used in this guide. Prometheus memory grows with the number of active time series, so check docker stats if you add many targets.

How do I get alerts when a service goes down?
Use Grafana alerting with a contact point such as email, Telegram, Slack, Discord, or a webhook. For scrape targets, the built-in up metric is 0 whenever Prometheus cannot reach a target, so a rule on up == 0 that must hold for a minute or two catches outages while ignoring brief glitches.

Can I monitor multiple servers with one Grafana and Prometheus instance?
Yes. Install Node Exporter on each server and add it as a scrape target in prometheus.yml, as shown in Part 7.

How do I monitor SSL certificate expiry?
Prometheus does not check certificates by itself. The usual approach is the Blackbox Exporter, which probes HTTPS endpoints and exposes the certificate expiry time as the probe_ssl_earliest_cert_expiry metric, so you can alert when a certificate is within a chosen number of days of expiring.

What's the difference between uptime monitoring and performance monitoring?
Uptime monitoring checks if a service is available (up/down). Performance monitoring tracks metrics over time (CPU%, response times, database query counts). Both are complementary.
