You can't optimize what you can't measure. And if your Enterprise WeChat bot's performance monitoring consists of "it seems fast enough" — you're flying blind.
Performance monitoring gives you the data to answer: How fast is the bot? Where are the bottlenecks? Is it getting worse over time? Let's build a monitoring system that answers all three.
Not all metrics are created equal. Focus on these:
| Metric | What It Tells You | Target |
|---|---|---|
| Response time (p50) | Typical user experience | < 2s |
| Response time (p95) | Tail latency (the slowest 5% of requests) | < 5s |
| Error rate | Reliability | < 1% |
| Token throughput | Cost efficiency | Track trend |
| Message queue depth | Backlog indicator | < 10 |
| Memory usage | Resource health | < 80% |
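To make the p50 and p95 rows concrete, here is a minimal sketch of a nearest-rank percentile helper in shell. The function name `percentile` is an assumption introduced for illustration and is not part of the bot; it reads one number per line on standard input.

```bash
#!/bin/bash
# percentile P: read one number per line on stdin and print the
# nearest-rank P-th percentile (P between 1 and 100).
# Hypothetical helper for illustration only.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1     # no samples: print nothing, signal failure
      rank = p * NR / 100     # fractional rank
      i = int(rank)
      if (i < rank) i++       # round up to the next whole rank
      if (i < 1) i = 1
      print v[i]
    }'
}

# Example: ten samples from 100 ms to 1000 ms
printf '%s\n' 100 200 300 400 500 600 700 800 900 1000 | percentile 50   # prints 500
```

With only ten samples, p95 resolves to the largest value, so percentiles taken over small windows are best read as rough indicators.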
On your Tencent Cloud Lighthouse instance, instrument the bot to emit structured metrics:
```yaml
# /opt/clawdbot/config/wecom-monitoring.yaml
monitoring:
  enabled: true
  metrics_file: "/var/log/clawdbot/metrics.jsonl"
  collect_interval: 60  # seconds
  metrics:
    - response_time
    - tokens_used
    - error_count
    - memory_usage
    - active_connections
    - queue_depth
```
Each processed message generates a metrics line:

```json
{"ts":"2026-03-06T10:23:45Z","type":"request","response_time_ms":1847,"tokens_in":150,"tokens_out":320,"model":"claude-sonnet","status":"success","user":"wx_123","skill":"general-qa"}
```
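Before pointing any scripts at production data, it can help to have a few records in this format to test against. The sketch below appends synthetic lines to a scratch file. The function name `emit_metric` and the values `test-model`, `test_user`, and `report-gen` are placeholders invented for this example; the bot writes the real records itself.

```bash
#!/bin/bash
# emit_metric FILE RESPONSE_MS TOKENS_IN TOKENS_OUT STATUS [SKILL]
# Appends one synthetic request record in the format shown above.
# Hypothetical test helper: model, user, and skill values are placeholders.
emit_metric() {
  local file=$1 ms=$2 tin=$3 tout=$4 status=$5 skill=${6:-general-qa}
  printf '{"ts":"%s","type":"request","response_time_ms":%d,"tokens_in":%d,"tokens_out":%d,"model":"test-model","status":"%s","user":"test_user","skill":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$ms" "$tin" "$tout" "$status" "$skill" >> "$file"
}

# Example: write three records to a scratch file
SAMPLE=$(mktemp)
emit_metric "$SAMPLE" 1847 150 320 success
emit_metric "$SAMPLE" 6200 900 1500 success report-gen
emit_metric "$SAMPLE" 300 40 0 error
```

Pointing a copy of a monitoring script at `$SAMPLE` instead of the live metrics file lets you check its output without touching real data.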
A small dashboard script can summarize today's numbers in the terminal:

```bash
#!/bin/bash
# /opt/clawdbot/perf-dashboard.sh
# Prints a terminal summary of today's bot metrics.
METRICS="/var/log/clawdbot/metrics.jsonl"
TODAY=$(date -u +%Y-%m-%d)      # UTC, to match the "Z" timestamps in the file
PATTERN="\"ts\":\"$TODAY"       # match the date inside the ts field only

# Print the p-th percentile of today's response times (0 if there is no data)
pct() {
  grep "$PATTERN" "$METRICS" | jq -r '.response_time_ms // empty' | sort -n |
    awk -v p="$1" '{v[NR]=$1} END{i=int(NR*p/100); if (i<1) i=1; print (NR ? v[i] : 0)}'
}

echo "╔══════════════════════════════════════════╗"
echo "║ Enterprise WeChat Bot Performance ║"
echo "║ $(date) ║"
echo "╚══════════════════════════════════════════╝"
echo ""
echo "📊 Response Time"
echo "  p50: $(pct 50)ms"
echo "  p95: $(pct 95)ms"
echo "  p99: $(pct 99)ms"
echo ""
echo "📈 Throughput"
TOTAL=$(grep -c "$PATTERN" "$METRICS")
TOTAL=${TOTAL:-0}
echo "  Messages today: $TOTAL"
HOURS=$(date -u +%H)
# 10# forces base 10 so that "08" and "09" are not rejected as invalid octal
echo "  Avg per hour: $((TOTAL / (10#$HOURS + 1)))"
echo ""
echo "❌ Errors"
ERRORS=$(grep "$PATTERN" "$METRICS" | grep -c '"status":"error"')
echo "  Total: $ERRORS"
if [ "$TOTAL" -gt 0 ]; then
  echo "  Rate: $(echo "scale=2; $ERRORS * 100 / $TOTAL" | bc)%"
else
  echo "  Rate: n/a (no messages yet)"
fi
echo ""
echo "💰 Token Usage"
# "// 0" and "sum+0" keep the totals numeric even when a field or the whole day is missing
TOKENS_IN=$(grep "$PATTERN" "$METRICS" | jq -r '.tokens_in // 0' | awk '{sum+=$1} END{print sum+0}')
TOKENS_OUT=$(grep "$PATTERN" "$METRICS" | jq -r '.tokens_out // 0' | awk '{sum+=$1} END{print sum+0}')
echo "  Input: $TOKENS_IN"
echo "  Output: $TOKENS_OUT"
echo "  Total: $((TOKENS_IN + TOKENS_OUT))"
echo ""
echo "🖥️ Resources"
# -o selects the oldest match, so this script (whose path also contains "clawdbot")
# is skipped as long as the bot started first; adjust the pattern to your process name
PID=$(pgrep -o -f clawdbot)
echo "  Memory: $(ps -o rss= -p "$PID" | awk '{print int($1/1024)}')MB"
echo "  CPU: $(ps -o %cpu= -p "$PID" | tr -d ' ')%"
echo "  Disk: $(df -h /opt/clawdbot | tail -1 | awk '{print $5}') used"
```
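The dashboard reports overall latency, but finding bottlenecks usually means splitting it by some dimension. As a rough sketch, the helper below averages response time per skill, using the `skill` field from the metrics line. The function name `skill_breakdown` and its tab-separated input format are assumptions made for this example.

```bash
#!/bin/bash
# skill_breakdown: read "skill<TAB>response_time_ms" lines on stdin and print
# "skill<TAB>count<TAB>avg_ms", slowest average first.
# Hypothetical helper for illustration only.
skill_breakdown() {
  awk -F'\t' '
    NF == 2 { sum[$1] += $2; n[$1]++ }
    END { for (s in n) printf "%s\t%d\t%d\n", s, n[s], sum[s] / n[s] }
  ' | sort -t$'\t' -k3,3nr
}

# Usage (jq extracts the two fields as tab-separated values):
#   jq -r 'select(.type == "request") | [.skill, .response_time_ms] | @tsv' \
#     /var/log/clawdbot/metrics.jsonl | skill_breakdown
```

Keeping the JSON extraction in `jq` and the aggregation in `awk` means the same function can be reused for other groupings, such as the `model` field.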
Trigger alerts when metrics cross thresholds:
```bash
#!/bin/bash
# /opt/clawdbot/perf-alert.sh
# Sends an Enterprise WeChat webhook message when p95 latency or error rate crosses a threshold.
METRICS="/var/log/clawdbot/metrics.jsonl"
WEBHOOK="https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=YOUR_WEBHOOK_KEY"

# Post a plain-text message to the group webhook
send_alert() {
  curl -s -X POST "$WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "{\"msgtype\":\"text\",\"text\":{\"content\":\"[PERF ALERT] $1\"}}"
}

# Check p95 response time (last 50 requests)
P95=$(tail -50 "$METRICS" | jq -r '.response_time_ms // empty' | sort -n |
  awk '{v[NR]=$1} END{if (NR) {i=int(NR*0.95); if (i<1) i=1; print v[i]}}')
if [ "${P95:-0}" -gt 5000 ]; then
  send_alert "p95 response time is ${P95}ms (threshold: 5000ms)"
fi

# Check error rate (last 100 requests); skip when there is no data to avoid dividing by zero
TOTAL=$(tail -100 "$METRICS" | wc -l)
ERRORS=$(tail -100 "$METRICS" | grep -c '"status":"error"')
if [ "$TOTAL" -gt 0 ]; then
  ERROR_RATE=$((ERRORS * 100 / TOTAL))
  if [ "$ERROR_RATE" -gt 5 ]; then
    send_alert "Error rate is ${ERROR_RATE}% (threshold: 5%)"
  fi
fi
```
Schedule it:
```
*/5 * * * * /opt/clawdbot/perf-alert.sh
```
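Run every five minutes, the script will repeat the same alert for as long as a threshold stays breached. One possible way to limit that is a cooldown backed by a state file, sketched below. The function name `should_alert` and the state-file path in the usage comment are assumptions for illustration, not part of the original script.

```bash
#!/bin/bash
# should_alert STATE_FILE COOLDOWN_SECONDS
# Returns 0 (and records the current time) if no alert was recorded within the
# cooldown window; returns 1 otherwise. Hypothetical helper for illustration only.
should_alert() {
  local state=$1 cooldown=$2 now last
  now=$(date +%s)
  last=$(cat "$state" 2>/dev/null)
  last=${last:-0}                      # missing or empty file: never alerted
  if [ $((now - last)) -ge "$cooldown" ]; then
    echo "$now" > "$state"
    return 0
  fi
  return 1
}

# Usage inside perf-alert.sh (30-minute cooldown, hypothetical path):
#   if [ "${P95:-0}" -gt 5000 ] && should_alert /tmp/clawdbot-p95.alert 1800; then
#     ...send the webhook message...
#   fi
```

Using a separate state file per alert type keeps a latency alert from suppressing an error-rate alert.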
Performance monitoring requires a stable, always-on server with persistent storage for metrics data.
Track performance over time to catch degradation early:
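```bash
#!/bin/bash
# /opt/clawdbot/weekly-trend.sh
# Prints the average response time and message count for each of the last 7 days.
METRICS="/var/log/clawdbot/metrics.jsonl"
echo "=== Weekly Performance Trend ==="
for i in $(seq 6 -1 0); do
  # GNU date syntax; UTC to match the "Z" timestamps in the file
  DATE=$(date -u -d "$i days ago" +%Y-%m-%d)
  # Guard against days with no data, where sum/n would divide by zero
  AVG=$(grep "\"ts\":\"$DATE" "$METRICS" | jq -r '.response_time_ms // empty' |
    awk '{sum+=$1; n++} END{print (n ? int(sum/n) : 0)}')
  MSGS=$(grep -c "\"ts\":\"$DATE" "$METRICS")
  echo "  $DATE: avg ${AVG:-0}ms, ${MSGS:-0} messages"
done
```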
If average response time is creeping up week over week, it's time to investigate — before users notice.
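"Creeping up" is easier to act on as a number. As a small sketch, the helper below computes the percentage change between two averages, for example last week's and this week's. The function name `pct_change` and the 20% threshold are assumptions chosen for illustration; pick a threshold that fits your own traffic.

```bash
#!/bin/bash
# pct_change OLD NEW: print the percentage change from OLD to NEW, rounded
# to a whole number. Prints nothing and returns 1 if OLD is zero.
# Hypothetical helper for illustration only.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN {
    if (old == 0) exit 1
    printf "%.0f\n", (new - old) * 100 / old
  }'
}

# Example: the weekly average moved from 1800 ms to 2250 ms
CHANGE=$(pct_change 1800 2250)
if [ "${CHANGE:-0}" -gt 20 ]; then     # 20% is an arbitrary example threshold
  echo "Average response time is up ${CHANGE}% week over week"
fi
```

The two inputs could come from averaging the daily figures that the weekly trend script prints.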
Performance monitoring is a feedback loop: measure → analyze → optimize → measure again. Without it, you're guessing. With it, you're engineering.
You can't improve what you don't measure.