Self-Hosting Grafana and Prometheus: Build Your Own Monitoring Stack
Datadog costs $15/host/month. New Relic has a free tier until you actually need it. Cloud monitoring gets expensive fast — especially when all you want is to know when something breaks and see some nice graphs.
Grafana and Prometheus together form one of the most widely used open source monitoring stacks. Prometheus collects and stores metrics. Grafana visualizes them. Both are free to self-host and run in production at organizations ranging from startups to the largest tech companies.
The Stack at a Glance
The monitoring stack has a few components, each with a clear role:
| Component | Role | Analogy |
|---|---|---|
| Prometheus | Metrics collection and storage | The database |
| Grafana | Visualization and dashboards | The UI |
| Alertmanager | Alert routing and notifications | The pager |
| Node Exporter | System metrics (CPU, RAM, disk) | The sensor |
| Exporters | App-specific metrics | More sensors |
How it works: Prometheus scrapes metrics from your servers and applications on a schedule (typically every 15-30 seconds) and evaluates alerting rules against them. Grafana queries Prometheus and renders dashboards. When a rule's condition holds, Prometheus hands the alert to Alertmanager, which groups it and sends the notification.
Self-Hosted vs. Paid Monitoring
| Feature | Datadog / New Relic | Grafana + Prometheus |
|---|---|---|
| Cost per host | $15-23/month | Free |
| Data retention | Varies by plan | You control it |
| Setup time | Minutes | 1-2 hours |
| Maintenance | Zero | Some (updates, storage) |
| Custom dashboards | Yes | Yes (more flexible) |
| APM / Tracing | Built-in | Separate tools needed |
| Log management | Built-in | Add Loki |
| Hosted option | Yes (it's the product) | Grafana Cloud free tier |
When paid monitoring is the better choice
- You have no one to maintain infrastructure — SaaS monitoring is fire-and-forget.
- You need APM (Application Performance Monitoring) — Distributed tracing across microservices is complex to self-host.
- Compliance requirements — Some regulations require specific audit trails that SaaS providers handle.
- You're a large team — At scale, the time cost of maintaining monitoring infrastructure may exceed SaaS fees.
When self-hosting wins
- You're cost-sensitive — Monitoring 5-10 servers with Datadog costs $75-150/month. Self-hosted costs $0.
- You want full control — Your data stays on your infrastructure.
- You need long retention — Store years of metrics without per-GB fees.
- You're already running servers — Adding Prometheus to existing infrastructure is low marginal effort.
Setting It Up
Docker Compose deployment
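```yaml
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus            # named volume keeps metrics across restarts
    command:
      # overriding command drops the image's default flags, so set config and data paths explicitly
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=90d'    # keep 90 days instead of the 15-day default
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      # applied only on first start, when Grafana creates its database; pick your own password
      GF_SECURITY_ADMIN_PASSWORD: changeme
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    pid: host
    volumes:
      # read-only views of the host's /proc, /sys and root filesystem
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
```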
Prometheus configuration
Create prometheus.yml:
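```yaml
# prometheus.yml
global:
  scrape_interval: 15s   # how often every target is scraped

scrape_configs:
  # Prometheus scraping its own metrics
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # host metrics from the node-exporter service in the Compose file
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
```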
Then start the stack from the directory containing both files:

```bash
docker compose up -d
```
Prometheus is now available at http://your-server:9090, Grafana at http://your-server:3000.
Connecting Grafana to Prometheus
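- Open Grafana (http://your-server:3000) and log in with admin / changeme
- Go to Connections → Data sources → Add data source (Configuration → Data Sources in older Grafana versions)
- Select Prometheus
- Set the URL to http://prometheus:9090 (the Compose service name, which Grafana can resolve because both containers share the project's default network)
- Click Save & test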
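As an alternative to clicking through the UI, Grafana can create the data source at startup from a provisioning file. A minimal sketch, assuming you save it as `./grafana/datasources.yml` (a hypothetical path) next to your Compose file:

```yaml
# ./grafana/datasources.yml (hypothetical path), read by Grafana at startup
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                # Grafana's server runs the queries, so the Compose hostname resolves
    url: http://prometheus:9090
    isDefault: true
```

To load it, add `- ./grafana/datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml` under the grafana service's `volumes` and restart Grafana. Keeping the data source in a file means a rebuilt Grafana container comes back already connected.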
Your first dashboard
Don't build dashboards from scratch. Import a community dashboard:
- In Grafana, go to Dashboards → Import
- Enter dashboard ID 1860 (Node Exporter Full)
- Select your Prometheus data source
- Click Import
You'll immediately see CPU usage, memory, disk I/O, network traffic, and dozens of other system metrics with professional-looking graphs.
Setting Up Alerts
Monitoring without alerting is just logging with extra steps. Here's how to get notified when things go wrong.
Prometheus alerting rules
Create alert-rules.yml:
```yaml
# alert-rules.yml
groups:
  - name: system
    rules:
      # average non-idle CPU per instance above 80% for 10 minutes
      - alert: HighCPU
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      # under 15% free space on a filesystem for 5 minutes
      # (tmpfs mounts are RAM-backed, so they are excluded)
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 15
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space below 15% on {{ $labels.instance }}"

      # less than 10% of memory available for 5 minutes
      - alert: HighMemory
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90% on {{ $labels.instance }}"
```
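Prometheus only evaluates these rules once its own config points at the file and the container can read it. A minimal sketch, assuming alert-rules.yml sits next to prometheus.yml and is mounted at `/etc/prometheus/alert-rules.yml` (the in-container path is an assumption; any path works as long as the mount and the config agree):

```yaml
# prometheus.yml: add at the top level, alongside global and scrape_configs
rule_files:
  - /etc/prometheus/alert-rules.yml   # must match the volume mount in the Compose file
```

Add `- ./alert-rules.yml:/etc/prometheus/alert-rules.yml` to the prometheus service's `volumes`, check the syntax with `docker compose exec prometheus promtool check rules /etc/prometheus/alert-rules.yml`, and restart Prometheus. The rules and their current state then show up on the Alerts page at http://your-server:9090/alerts.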
Alertmanager for notifications
Alertmanager routes alerts to your preferred notification channel: email, Slack, PagerDuty, Telegram, or webhooks.
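```yaml
# alertmanager.yml
route:
  receiver: 'email'
  group_wait: 30s        # wait before the first notification for a new group of alerts
  group_interval: 5m     # wait before notifying about new alerts added to an existing group
  repeat_interval: 4h    # re-send a still-firing alert after this long

receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]'                  # recipient address
        from: '[email protected]'                # sender address
        smarthost: 'smtp.yourdomain.com:587'
        # most SMTP relays on port 587 require authentication; replace these placeholders
        auth_username: 'your-smtp-username'
        auth_password: 'your-smtp-password'
```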
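Alertmanager also has to run somewhere, and Prometheus needs to know where to send firing alerts. A minimal sketch of both pieces, assuming Alertmanager joins the same Compose project as a service named `alertmanager` on its default port 9093, with alertmanager.yml next to the Compose file (the two snippets belong to different files, separated here by `---`):

```yaml
# Compose file: merge this service into the existing services: map
services:
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      # the image reads /etc/alertmanager/alertmanager.yml by default
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    restart: unless-stopped
---
# prometheus.yml: add at the top level so Prometheus forwards firing alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
```

After `docker compose up -d`, you can validate the routing file with `docker compose exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml`, then restart Prometheus so it picks up the `alerting` block.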
Monitoring Beyond System Metrics
Prometheus can monitor almost anything through exporters:
- Databases: postgres_exporter, mysqld_exporter, redis_exporter
- Web servers: nginx-prometheus-exporter, apache_exporter
- Containers: cAdvisor (runs as its own container alongside Docker)
- Applications: Many modern apps expose Prometheus metrics natively on a /metrics endpoint
- Network: SNMP exporter, blackbox_exporter (HTTP/TCP/ICMP probes)
- Hardware: IPMI exporter for server hardware health
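Each exporter serves its metrics over HTTP (usually at /metrics); once you add it as a scrape target in prometheus.yml, Prometheus collects it on every scrape interval.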
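In practice that means one scrape job per exporter. A minimal sketch, assuming a postgres_exporter reachable as `postgres-exporter` on its default port 9187 and a blackbox_exporter reachable as `blackbox-exporter` on port 9115 (both hostnames are hypothetical Compose service names). The blackbox job follows the relabeling pattern from the blackbox_exporter README, because that exporter probes other URLs on Prometheus's behalf:

```yaml
# prometheus.yml: extra entries under scrape_configs
scrape_configs:
  # a typical exporter: Prometheus scrapes its /metrics endpoint directly
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  # blackbox_exporter: each target is a URL for the exporter to probe
  - job_name: 'blackbox-http'
    metrics_path: /probe
    params:
      module: [http_2xx]                  # module defined in the exporter's blackbox.yml
    static_configs:
      - targets: ['https://example.com']  # URL to check (placeholder)
    relabel_configs:
      # pass the URL to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # send the actual scrape request to the exporter itself
      - target_label: __address__
        replacement: blackbox-exporter:9115
```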
Storage and Retention
Prometheus stores data efficiently using its custom time-series database (TSDB):
- 1,000 time series scraped every 15s take roughly 170-350 MB of sample data per month, at the 1-2 bytes per sample Prometheus averages after compression
- Default retention: 15 days
- Recommended: 90 days for most setups (`--storage.tsdb.retention.time=90d`)
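The Prometheus storage documentation sizes disk as retention time × ingested samples per second × bytes per sample, citing an average of 1-2 bytes per sample after compression; index and write-ahead-log overhead come on top. Worked through for 1,000 series scraped every 15s and the 90-day retention recommended above:

$$
\text{disk} \approx t_{\text{retention}} \times r_{\text{ingest}} \times b_{\text{sample}}
$$

$$
r_{\text{ingest}} = \frac{1000\ \text{series}}{15\ \text{s}} \approx 66.7\ \text{samples/s}, \qquad t_{\text{retention}} = 90 \times 86{,}400\ \text{s} = 7{,}776{,}000\ \text{s}
$$

$$
\text{disk} \approx 7{,}776{,}000\ \text{s} \times 66.7\ \text{samples/s} \times (1\text{ to }2)\ \text{B/sample} \approx 0.52\text{ to }1.04\ \text{GB}
$$

The estimate scales linearly: ten times the series, or a scrape interval one tenth as long, means roughly ten times the disk.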
For longer retention, consider Thanos or VictoriaMetrics as a long-term storage backend.
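VictoriaMetrics can be fed through Prometheus's `remote_write` setting, which streams samples to another store as they are scraped (Thanos uses its own sidecar- or receiver-based setup, not shown here). A minimal sketch, assuming a single-node VictoriaMetrics instance reachable as `victoriametrics` on its default port 8428 (a hypothetical service name):

```yaml
# prometheus.yml: also stream every scraped sample to VictoriaMetrics
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
```

Prometheus keeps its local copy for the retention period set above, while VictoriaMetrics holds the long-term history. Grafana can query it through a Prometheus-type data source pointed at http://victoriametrics:8428.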
The Honest Trade-offs
Grafana + Prometheus is great if:
- You're monitoring servers, containers, or applications you control
- You want beautiful dashboards without per-host fees
- You need flexible alerting with custom thresholds
- You want to learn the industry-standard monitoring stack
It's not ideal if:
- You need zero-maintenance monitoring (SaaS is truly hands-off)
- You need distributed tracing across microservices (add Jaeger/Tempo separately)
- You only have one server and don't want to run additional services on it
Bottom line: Grafana + Prometheus is the monitoring stack behind a large share of production infrastructure. The setup takes an hour or two, and you get enterprise-grade monitoring with no license fees. If you're paying more than $20/month for cloud monitoring and have a server with spare capacity to run it, self-hosting will pay for itself almost immediately.
Resources
- Prometheus documentation
- Grafana documentation
- Awesome Prometheus alerts — pre-built alert rules
- Grafana dashboard library — thousands of community dashboards