Grafana - Visualization & Dashboards
Key Features: Unified Visualization • Multi-Datasource Queries • Dashboard Sidecar • Explore Mode • Alert Management • User Authentication • Dashboard as Code • Plugin Ecosystem
File: docs/package-monitoring-grafana.md
Purpose: Complete guide to Grafana deployment and configuration for visualization and exploration in Urbalurba infrastructure
Target Audience: DevOps engineers, platform administrators, SREs, developers, data analysts
Last Updated: October 5, 2025
Deployed Version: Grafana v12.1.1 (Helm Chart: grafana-10.0.0)
Official Documentation: https://grafana.com/docs/grafana/v12.1/
📋 Overview
Grafana is the unified visualization platform for the Urbalurba observability stack. It provides a single pane of glass for querying, visualizing, and alerting on data from Prometheus (metrics), Loki (logs), and Tempo (traces). Grafana's Explore mode enables ad-hoc investigation, while dashboards provide persistent monitoring views.
As the front-end of the observability stack, Grafana enables:
- Unified Querying: Query metrics, logs, and traces from a single interface
- Dashboard Management: Auto-load dashboards from Kubernetes ConfigMaps
- Correlation: Link metrics → logs → traces for complete context
- Alerting: Define alert rules and notification channels
- Exploration: Ad-hoc queries with Explore mode
Key Capabilities:
- Pre-Configured Datasources: Prometheus (default), Loki, Tempo ready to use
- Dashboard Sidecar: Auto-loads dashboards from ConfigMaps with the label grafana_dashboard: "1"
- PromQL, LogQL, TraceQL: Native query language support for all backends
- Correlation Links: Jump from metrics → logs → traces seamlessly
- Web UI Access: http://grafana.localhost via Traefik IngressRoute
- Persistent Storage: 10Gi PVC for dashboards and configuration
Architecture Type: Web-based visualization and exploration platform
🏗️ Architecture
Deployment Components
┌──────────────────────────────────────────────────────────┐
│ Grafana Stack (namespace: monitoring) │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Grafana Deployment │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ Web UI (Port 80) │ │ │
│ │ │ - Login/Authentication │ │ │
│ │ │ - Dashboard Rendering │ │ │
│ │ │ - Explore Mode │ │ │
│ │ │ - Alert Management │ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ Pre-Configured Datasources │ │ │
│ │ │ │ │ │
│ │ │ • Prometheus (default) │ │ │
│ │ │ url: prometheus-server:80 │ │ │
│ │ │ │ │ │
│ │ │ • Loki │ │ │
│ │ │ url: loki-gateway:80 │ │ │
│ │ │ │ │ │
│ │ │ • Tempo │ │ │
│ │ │ url: tempo:3200 │ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ Dashboard Sidecar (Auto-Load) │ │ │
│ │ │ │ │ │
│ │ │ Watches for ConfigMaps with: │ │ │
│ │ │ label: grafana_dashboard: "1" │ │ │
│ │ │ namespace: monitoring │ │ │
│ │ │ │ │ │
│ │ │ Auto-reloads dashboards every 60s │ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────┐ │ │
│ │ │ Persistent Storage (10Gi PVC) │ │ │
│ │ │ - Dashboard definitions │ │ │
│ │ │ - User preferences │ │ │
│ │ │ - Alert states │ │ │
│ │ └──────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ Access: http://grafana.localhost (Traefik Ingress) │
└──────────────────────────────────────────────────────────┘
│ ▲
│ │
▼ │
┌──────────────────────────────────────────────────────┐
│ Datasource Backends │
│ - Prometheus (metrics) │
│ - Loki (logs) │
│ - Tempo (traces) │
└──────────────────────────────────────────────────────┘
Data Flow
User Browser
│
│ HTTP (http://grafana.localhost)
▼
┌──────────────────────┐
│ Traefik Ingress │
│ (Host: grafana.*) │
└──────────────────────┘
│
│ Routes to Grafana Service
▼
┌──────────────────────────────┐
│ Grafana Web UI │
│ (Port 80) │
├──────────────────────────────┤
│ User Actions: │
│ 1. Dashboard view │
│ 2. Explore query │
│ 3. Alert configuration │
└──────────────────────────────┘
│
├─► Query Prometheus (PromQL)
├─► Query Loki (LogQL)
└─► Query Tempo (TraceQL)
│
▼
┌──────────────────┐
│ Render Results │
│ - Graphs │
│ - Tables │
│ - Logs │
│ - Traces │
└──────────────────┘
Dashboard Auto-Loading
ConfigMap Created
(label: grafana_dashboard: "1")
│
▼
┌──────────────────────────────┐
│ Dashboard Sidecar Container │
│ (watches monitoring ns) │
└──────────────────────────────┘
│
│ Detects new ConfigMap
▼
┌──────────────────────────────┐
│ Load Dashboard JSON │
│ - Parse dashboard def │
│ - Register with Grafana │
│ - Assign to folder │
└──────────────────────────────┘
│
▼
Dashboard Available in UI (~30s)
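To confirm the sidecar has picked up dashboard ConfigMaps, you can list the files it has written. A minimal sketch, assuming the sidecar image provides a shell with ls and uses the /tmp/dashboards folder shown in the sidecar configuration below:
# List dashboard files written by the sidecar container
# (container name matches the troubleshooting commands later in this guide)
kubectl exec -n monitoring deployment/grafana -c grafana-sc-dashboard -- ls /tmp/dashboards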
File Structure
manifests/
├── 034-grafana-config.yaml # Grafana Helm values
├── 035-grafana-test-dashboards.yaml # Installation Test Suite dashboards
├── 036-grafana-sovdev-verification.yaml # sovdev-logger verification dashboard
├── 037-grafana-sovdev-metrics.yaml # sovdev-logger fast metrics dashboard
└── 038-grafana-ingressroute.yaml # Traefik IngressRoute
ansible/playbooks/
├── 034-setup-grafana.yml # Deployment automation
└── 034-remove-grafana.yml # Removal automation
provision-host/kubernetes/11-monitoring/not-in-use/
├── 05-setup-grafana.sh # Shell script wrapper
└── 05-remove-grafana.sh # Removal script
Storage:
└── PersistentVolumeClaim
└── grafana (10Gi) # Configuration and dashboards
🚀 Deployment
Automated Deployment
Via Monitoring Stack (Recommended):
# Deploy entire monitoring stack (includes Grafana)
docker exec -it provision-host bash
cd /mnt/urbalurbadisk/provision-host/kubernetes/11-monitoring/not-in-use
./00-setup-all-monitoring.sh rancher-desktop
Individual Deployment:
# Deploy Grafana only (requires Prometheus, Loki, Tempo already deployed)
docker exec -it provision-host bash
cd /mnt/urbalurbadisk/provision-host/kubernetes/11-monitoring/not-in-use
./05-setup-grafana.sh rancher-desktop
Manual Deployment
Prerequisites:
- Kubernetes cluster running (Rancher Desktop)
- monitoring namespace exists
- Datasources deployed first: Prometheus, Loki, Tempo
- Helm installed in provision-host container
- Manifest files: 034-grafana-config.yaml, 038-grafana-ingressroute.yaml
Deployment Steps:
# 1. Enter provision-host container
docker exec -it provision-host bash
# 2. Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# 3. Deploy Grafana
helm upgrade --install grafana grafana/grafana \
-f /mnt/urbalurbadisk/manifests/034-grafana-config.yaml \
--namespace monitoring \
--create-namespace \
--timeout 600s \
--kube-context rancher-desktop
# 4. Deploy IngressRoute for web UI access
kubectl apply -f /mnt/urbalurbadisk/manifests/038-grafana-ingressroute.yaml
# 5. Wait for pods to be ready
kubectl wait --for=condition=ready pod \
-l app.kubernetes.io/name=grafana \
-n monitoring --timeout=300s
Deployment Time: ~2-3 minutes
⚙️ Configuration
Grafana Configuration (manifests/034-grafana-config.yaml)
Admin Credentials:
adminUser: admin
adminPassword: SecretPassword1
Pre-Configured Datasources:
datasources:
datasources.yaml:
apiVersion: 1
datasources:
# Prometheus (default datasource)
- name: Prometheus
type: prometheus
uid: prometheus
url: http://prometheus-server.monitoring.svc.cluster.local:80
access: proxy
isDefault: true
editable: true
# Loki (logs)
- name: Loki
type: loki
uid: loki
url: http://loki-gateway.monitoring.svc.cluster.local:80
access: proxy
editable: true
# Tempo (traces)
- name: Tempo
type: tempo
uid: tempo
url: http://tempo.monitoring.svc.cluster.local:3200
access: proxy
editable: true
Official Datasource Docs:
- Prometheus: https://grafana.com/docs/grafana/v12.1/datasources/prometheus/
- Loki: https://grafana.com/docs/grafana/v12.1/datasources/loki/
- Tempo: https://grafana.com/docs/grafana/v12.1/datasources/tempo/
Dashboard Sidecar (Auto-Loading):
sidecar:
dashboards:
enabled: true
label: grafana_dashboard # Watch for ConfigMaps with this label
labelValue: "1"
folder: /tmp/dashboards
searchNamespace: monitoring # Only watch monitoring namespace
folderAnnotation: grafana_folder # Optional folder organization
provider:
foldersFromFilesStructure: true
Persistent Storage:
persistence:
enabled: true
size: 10Gi
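After deployment, you can confirm the claim was created and bound (the PVC name grafana follows the file-structure listing above):
kubectl get pvc -n monitoring grafana
# Expected: STATUS Bound, CAPACITY 10Gi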
External Access Configuration (manifests/038-grafana-ingressroute.yaml)
Traefik IngressRoute:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: grafana
namespace: monitoring
spec:
entryPoints:
- web
routes:
- match: HostRegexp(`grafana\..+`) # Matches grafana.localhost, grafana.urbalurba.no, etc.
kind: Rule
services:
- name: grafana
port: 80
Access URLs:
- Localhost: http://grafana.localhost
- Future External: http://grafana.urbalurba.no (requires DNS configuration)
Resource Configuration
Storage Requirements:
- Grafana PVC: 10Gi persistent volume
Service Endpoints:
- Web UI: grafana.monitoring.svc.cluster.local:80
- External UI: http://grafana.localhost (via Traefik)
Security Configuration
Authentication:
- Default Credentials: admin / SecretPassword1
- Configuration Source: Defined in manifests/034-grafana-config.yaml (lines 31-32):
  adminUser: admin
  adminPassword: SecretPassword1
- Not Hardcoded: The password is set via the Helm values file, not hardcoded in the Grafana chart
- Customization: Change the password by editing 034-grafana-config.yaml and running helm upgrade (see the sketch after this list)
- Production Recommendation: Change the default password for production deployments
- Future Enhancement: Authentik SSO integration (optional)
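A minimal sketch of the password change flow from inside the provision-host container (the new password value is illustrative; the helm command mirrors the deployment steps above):
# 1. Update the adminPassword value in the Helm values file
sed -i 's/adminPassword: SecretPassword1/adminPassword: MyNewPassword/' \
  /mnt/urbalurbadisk/manifests/034-grafana-config.yaml
# 2. Re-apply the chart with the updated values
helm upgrade grafana grafana/grafana \
  -f /mnt/urbalurbadisk/manifests/034-grafana-config.yaml \
  -n monitoring --kube-context rancher-desktop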
Network Access:
- Internal: ClusterIP service for internal cluster access
- External: Traefik IngressRoute at grafana.localhost (HTTP, port 80)
🔍 Monitoring & Verification
Health Checks
Check Pod Status:
# Grafana pods (main + sidecar containers)
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana
# Expected output:
NAME READY STATUS RESTARTS AGE
grafana-xxx 2/2 Running 0 5m
Check Service Endpoints:
# Verify service is accessible
kubectl get svc -n monitoring -l app.kubernetes.io/name=grafana
# Expected service:
grafana ClusterIP 10.43.x.x 80/TCP
Service Verification
Test Web UI Access:
# Via Traefik IngressRoute
curl -H "Host: grafana.localhost" http://127.0.0.1/
# Expected: HTML response with Grafana login page
Test Datasource Connectivity (from within pod):
# Test Prometheus datasource
kubectl exec -n monitoring deployment/grafana -- \
curl -s http://prometheus-server.monitoring.svc.cluster.local:80/api/v1/status/config
# Test Loki datasource
kubectl exec -n monitoring deployment/grafana -- \
curl -s http://loki-gateway.monitoring.svc.cluster.local:80/ready
# Test Tempo datasource
kubectl exec -n monitoring deployment/grafana -- \
curl -s http://tempo.monitoring.svc.cluster.local:3200/ready
Verify Datasources in UI
1. Open http://grafana.localhost
2. Login: admin / SecretPassword1
3. Navigate to Configuration → Data sources
4. Verify all three datasources are listed:
   - ✅ Prometheus (default)
   - ✅ Loki
   - ✅ Tempo
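The same check can be scripted against Grafana's HTTP API (a sketch using the standard /api/datasources endpoint and the default credentials above; the Host-header pattern matches the curl test earlier in this section):
curl -s -u admin:SecretPassword1 -H "Host: grafana.localhost" \
  http://127.0.0.1/api/datasources | jq -r '.[].name'
# Expected output: Prometheus, Loki, Tempo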
Automated Verification
The deployment playbook (034-setup-grafana.yml) performs automated tests:
- ✅ Web UI accessibility
- ✅ Datasource configuration verification
- ✅ Dashboard sidecar functionality
- ✅ Test data generation and visualization (Installation Test Suite)
Installation Test Suite Dashboards
Grafana automatically deploys 3 validation dashboards organized in the "Installation Test Suite" folder. These dashboards verify end-to-end functionality of the monitoring stack by displaying test telemetry generated during setup.
Purpose: Validate that logs, traces, and metrics flow correctly from OTLP Collector → Loki/Tempo/Prometheus → Grafana
Dashboards Deployed (manifests/035-grafana-test-dashboards.yaml):
1. Test Data - Logs
UID: test-data-logs
Query: {service_name="telemetrygen-logs"}
Expected Data: 100+ log entries from telemetrygen tool
What This Validates:
- ✅ OTLP Collector receives logs via HTTP
- ✅ Logs are exported from OTLP Collector to Loki
- ✅ Loki indexes logs by service_name label
- ✅ Grafana can query Loki datasource via LogQL
- ✅ Log panel displays structured log entries
How Test Data is Generated (during Grafana setup):
# Ansible playbook runs this command (step 23):
kubectl run telemetrygen-dashboard-logs \
--image=ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest \
--rm -i --restart=Never -n monitoring -- \
logs --otlp-endpoint=otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318 \
--otlp-insecure --otlp-http --duration=10s --logs=100 \
--service telemetrygen-logs \
--body "Test log entry for Installation Test Suite dashboard"
How to Use:
1. Open http://grafana.localhost
2. Login: admin / SecretPassword1
3. Navigate to Dashboards → Installation Test Suite → Test Data - Logs
4. Verify panel shows 100+ log entries
5. Expand log entries to see structured fields (timestamp, service_name, body)
Troubleshooting:
- No logs displayed: Check OTLP Collector logs for ingestion errors
- "No data" message: Query Loki directly:
kubectl exec -n monitoring loki-0 -c loki -- wget -q -O - 'http://localhost:3100/loki/api/v1/label/service_name/values' - Old data only: Generate fresh test data (see command above)
2. Test Data - Traces
UID: test-data-traces
Query: {resource.service.name="telemetrygen-traces"}
Expected Data: 20+ trace entries from telemetrygen tool
What This Validates:
- ✅ OTLP Collector receives traces via gRPC
- ✅ Traces are exported from OTLP Collector to Tempo
- ✅ Tempo stores trace data with resource attributes
- ✅ Grafana can query Tempo datasource via TraceQL
- ✅ Trace count stat panel shows total traces
- ✅ Trace table displays trace IDs for inspection
How Test Data is Generated (during Grafana setup):
# Ansible playbook runs this command (step 24):
kubectl run telemetrygen-dashboard-traces \
--image=ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest \
--rm -i --restart=Never -n monitoring -- \
traces --otlp-endpoint=otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317 \
--otlp-insecure --duration=5s --traces=20 \
--service telemetrygen-traces
How to Use:
1. Navigate to Dashboards → Installation Test Suite → Test Data - Traces
2. Verify Trace Count stat panel shows 20+ traces (green background = success)
3. View All Test Traces table with trace IDs
4. Click on a trace ID to open trace waterfall view (spans visualization)
5. Inspect trace spans, duration, and resource attributes
Troubleshooting:
- Trace count shows 0: Check Tempo ingestion: kubectl logs -n monitoring tempo-0 | grep telemetrygen
- Table empty: Query Tempo API directly: kubectl exec -n monitoring tempo-0 -- wget -q -O - 'http://localhost:3200/api/search?tags=service.name%3Dtelemetrygen-traces'
- Old traces only: Generate fresh test data (see command above)
3. Test Data - Metrics
UID: test-data-metrics
Query: up (Prometheus 'up' metric for all scraped targets)
Expected Data: Timeseries graph showing health of all monitored services
What This Validates:
- ✅ Prometheus scrapes metrics from all targets
- ✅ Prometheus stores time-series data
- ✅ Grafana can query Prometheus datasource via PromQL
- ✅ Timeseries panel displays multiple metrics with legend
- ✅ Monitoring stack services are healthy (value = 1)
How Test Data is Available:
- No generation needed: Prometheus automatically scrapes the up metric from all targets (Prometheus server, alertmanager, node-exporter, kube-state-metrics, pushgateway, OTLP Collector, Loki, Tempo, Grafana)
- Metric value: 1 = service is up and responding to scrapes, 0 = service is down
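You can spot-check the same up metric from the CLI before opening the dashboard (a sketch; the port-forward pattern matches the troubleshooting commands below):
kubectl port-forward -n monitoring svc/prometheus-server 9090:80 &
curl -s 'http://localhost:9090/api/v1/query?query=up' | \
  jq '.data.result[] | {job: .metric.job, up: .value[1]}'
# Every healthy target should report up = "1"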
How to Use:
1. Navigate to Dashboards → Installation Test Suite → Test Data - Metrics
2. View timeseries graph showing multiple services
3. Check legend on right: all services should show 1 (up) in the "Last" column
4. Hover over graph lines to see individual service metrics
5. Verify services like prometheus-server, loki, tempo are present
Troubleshooting:
- No metrics displayed: Check Prometheus targets: kubectl port-forward -n monitoring svc/prometheus-server 9090:80, then open http://localhost:9090/targets
- Services showing 0: Check pod health: kubectl get pods -n monitoring
- Missing services in legend: Verify Prometheus ServiceMonitor configuration
Access Installation Test Suite:
# Open Grafana
open http://grafana.localhost
# Navigate to folder
Dashboards → Browse → Installation Test Suite folder
Dashboard Files:
- ConfigMap Manifest: manifests/035-grafana-test-dashboards.yaml
- 3 ConfigMaps: grafana-dashboard-test-logs, grafana-dashboard-test-traces, grafana-dashboard-test-metrics
- Folder Annotation: grafana_folder: "Installation Test Suite"
Dashboard Auto-Loading:
- Dashboards are automatically loaded via Grafana sidecar (~30-60 seconds after deployment)
- No manual import required
- Changes to ConfigMaps automatically reload in Grafana
Regenerate Test Data (if dashboards show no data):
# Generate logs
kubectl run telemetrygen-logs-manual \
--image=ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest \
--rm -i --restart=Never -n monitoring -- \
logs --otlp-endpoint=otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318 \
--otlp-insecure --otlp-http --duration=10s --logs=100 \
--service telemetrygen-logs \
--body "Manual test log entry"
# Generate traces
kubectl run telemetrygen-traces-manual \
--image=ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest \
--rm -i --restart=Never -n monitoring -- \
traces --otlp-endpoint=otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4317 \
--otlp-insecure --duration=5s --traces=20 \
--service telemetrygen-traces
sovdev-logger Dashboards
Grafana includes two pre-built dashboards for monitoring applications using sovdev-logger, which provides zero-effort observability through automatic logs, metrics, and traces generation.
Dashboards Deployed:
- Fast Metrics Dashboard (manifests/037-grafana-sovdev-metrics.yaml)
- Verification Dashboard (manifests/036-grafana-sovdev-verification.yaml)
Fast Metrics Dashboard
UID: sovdev-metrics
Purpose: Real-time application monitoring using Prometheus metrics (sub-second query performance)
Data Source: Prometheus
Queries: Uses automatic metrics generated by sovdev-logger:
- sovdev_operations_total - Total operations counter
- sovdev_errors_total - Error counter (ERROR/FATAL levels)
- sovdev_operation_duration_milliseconds - Duration histogram
- sovdev_operations_active - Active operations gauge
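As a quick sanity check outside Grafana, you can query one of these metrics directly from the Prometheus API (a sketch; it assumes an application using sovdev-logger has already emitted metrics, and reuses the port-forward pattern from the troubleshooting sections):
kubectl port-forward -n monitoring svc/prometheus-server 9090:80 &
# Operations per second per service, from the sovdev_operations_total counter above
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(sovdev_operations_total[5m])) by (service_name)'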
What This Dashboard Shows:
- ✅ Operations Rate: Requests per second by service and log type
- ✅ Error Rate: Errors per second by service and log type
- ✅ Operation Duration: P50, P95, P99 latency percentiles
- ✅ Active Operations: Currently in-progress operations
- ✅ Service Dependency Graph: Automatically generated from traces (via Tempo metrics generator)
Dashboard Variables:
- service_name - Filter by specific service
- log_type - Filter by log type (API, DATABASE, BATCH, etc.)
- peer_service - Filter by downstream service
Benefits:
- Sub-second queries: Prometheus metrics enable fast dashboard load times
- Real-time monitoring: Track live application behavior
- No code changes: Metrics automatically generated from sovdevLog() calls
- Full dimensional filtering: service_name, peer_service, log_level, log_type
How to Use:
1. Open http://grafana.localhost
2. Navigate to Dashboards → sovdev-logger → Fast Metrics Dashboard
3. Select service from the service_name dropdown
4. View operation rates, errors, latencies, and service graphs
5. Click on panels to drill down into specific time ranges
Requirements:
- sovdev-logger in application
- OTEL_EXPORTER_OTLP_METRICS_ENDPOINT configured
- Tempo metrics generator enabled (for service graphs)
Verification Dashboard
UID: sovdev-verification
Purpose: Debug and verify complete observability correlation (logs + metrics + traces); verify traceId correlation and debug specific application executions
Data Sources: Loki (logs), Tempo (traces), Prometheus (metrics)
What This Dashboard Shows:
- ✅ Log Entries: Structured logs from Loki with all attributes
- ✅ Trace Correlation: Click traceId in logs to jump to trace waterfall
- ✅ Session Filtering: Filter by session.id to isolate specific runs
- ✅ Full Context: Input/response JSON, function names, log levels
- ✅ Error Details: Exception stack traces and error messages
Dashboard Variables:
- service_name - Filter by specific service
- session_id - Filter by specific execution (unique per run)
- log_level - Filter by log level (ERROR, WARN, INFO, DEBUG)
Benefits:
- Full correlation: Link logs → traces → metrics via traceId
- Session isolation: Debug specific runs without time-based filtering
- Complete context: See input/response data alongside logs and traces
- Error investigation: Jump from error log to full trace waterfall
How to Use:
1. Navigate to Dashboards → sovdev-logger → Verification Dashboard
2. Option A - Debug specific session:
   - Copy the session ID from application startup: 🔑 Session ID: abc123-def456-ghi789
   - Enter it in the session_id variable
   - View all logs/metrics/traces from that execution
3. Option B - Investigate errors:
   - Set log_level to "ERROR"
   - View error logs with stack traces
   - Click traceId to see the full request trace
4. Option C - Analyze specific service:
   - Select a service_name
   - View the chronological log stream
   - Expand log entries to see full JSON context
Requirements:
- sovdev-logger in application
- OTEL_EXPORTER_OTLP_LOGS_ENDPOINT configured
- OTEL_EXPORTER_OTLP_TRACES_ENDPOINT configured
- OTEL Collector session_id processing enabled
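For reference, a minimal sketch of the two endpoint variables for an application running in-cluster. The collector service name and OTLP/HTTP port 4318 are taken from the telemetrygen commands earlier; the /v1/logs and /v1/traces paths follow the standard OTLP/HTTP convention and are assumptions here:
# OTLP/HTTP endpoints for sovdev-logger (paths assumed per OTLP convention)
export OTEL_EXPORTER_OTLP_LOGS_ENDPOINT="http://otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318/v1/logs"
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318/v1/traces"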
Access sovdev-logger Dashboards:
# Open Grafana
open http://grafana.localhost
# Navigate to dashboards
Dashboards → Browse → sovdev-logger folder
Dashboard Files:
- Fast Metrics Dashboard: manifests/037-grafana-sovdev-metrics.yaml
- Verification Dashboard: manifests/036-grafana-sovdev-verification.yaml
Auto-Loading: Both dashboards are automatically loaded via Grafana sidecar (~30-60 seconds after deployment)
Official sovdev-logger Documentation: See docs/package-monitoring-sovdev-logger.md for library usage and features.
🛠️ Management Operations
Access Grafana UI
Open in Browser:
# Direct access (Mac host)
open http://grafana.localhost
# Or manually navigate to:
http://grafana.localhost
Login Credentials:
- Username: admin
- Password: SecretPassword1
Dashboard Management
Dashboards are managed as Kubernetes ConfigMaps with automatic loading via the Grafana sidecar container. This GitOps-style approach enables version-controlled dashboard definitions.
Add New Dashboard
Method 1: Design in Grafana UI, Export, Convert to ConfigMap
1. Design dashboard in Grafana UI:
open http://grafana.localhost
# Login: admin/SecretPassword1
# Create dashboard → Add panels → Configure queries → Save
2. Export dashboard JSON:
- Open dashboard → Settings (gear icon) → JSON Model
- Copy entire JSON content
3. Create ConfigMap manifest (manifests/0XX-grafana-my-dashboard.yaml):
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboard-my-service
namespace: monitoring
labels:
grafana_dashboard: "1" # Required for auto-loading
data:
my-service.json: |
{
"title": "My Service Metrics",
"uid": "my-service-metrics",
"panels": [
{
"type": "graph",
"title": "Request Rate",
"targets": [
{
"expr": "rate(http_requests_total{service=\"my-service\"}[5m])",
"refId": "A"
}
]
}
]
}
4. Apply ConfigMap:
kubectl apply -f manifests/0XX-grafana-my-dashboard.yaml -
Verify dashboard auto-loads (~30-60 seconds):
- Check sidecar logs:
kubectl logs -n monitoring deployment/grafana -c grafana-sc-dashboard - Grafana UI → Dashboards → Search for "My Service Metrics"
- Check sidecar logs:
Method 2: Write JSON Directly (for simple dashboards):
# Create ConfigMap with inline JSON
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboard-simple
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
simple.json: |
{
"title": "Simple Dashboard",
"uid": "simple-dashboard",
"panels": []
}
EOF
Update Existing Dashboard
1. Edit ConfigMap manifest (manifests/0XX-grafana-my-dashboard.yaml):
vim manifests/0XX-grafana-my-dashboard.yaml
# Modify JSON in data.my-service.json
2. Apply updated ConfigMap:
kubectl apply -f manifests/0XX-grafana-my-dashboard.yaml
3. Wait for automatic reload (~30-60 seconds) or force reload:
kubectl rollout restart deployment/grafana -n monitoring
4. Verify changes in Grafana UI (may need to refresh browser)
Alternative: Update via kubectl edit:
kubectl edit configmap -n monitoring grafana-dashboard-my-service
# Edit JSON directly in editor
# Save → Auto-reloads in ~60s
Delete Dashboard
Option 1: Remove ConfigMap (recommended for GitOps):
# Delete the resources defined in the manifest file
kubectl delete -f manifests/0XX-grafana-my-dashboard.yaml
# Or delete directly by name
kubectl delete configmap -n monitoring grafana-dashboard-my-service
Dashboard automatically disappears from Grafana UI within ~60 seconds.
Option 2: Delete via Grafana UI (not persistent):
- Grafana UI → Dashboards → Find dashboard → Settings → Delete
- ⚠️ Dashboard will reappear if ConfigMap still exists (sidecar will reload it)
Dashboard Organization
Folder Assignment (via annotation):
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-dashboard-app
namespace: monitoring
labels:
grafana_dashboard: "1"
annotations:
grafana_folder: "Application Monitoring" # Assigns to folder in UI
data:
app.json: |
{ ... }
Naming Convention:
- ConfigMap name: grafana-dashboard-<purpose>
- Dashboard JSON key: <descriptive-name>.json
- Manifest file: manifests/0XX-grafana-<purpose>.yaml (use numbering 035-039)
Examples
Existing Dashboards:
- manifests/035-grafana-test-dashboards.yaml - Installation Test Suite (3 dashboards):
  - Test Data - Logs: Validates OTLP → Loki → Grafana flow
  - Test Data - Traces: Validates OTLP → Tempo → Grafana flow
  - Test Data - Metrics: Validates Prometheus → Grafana flow
  - See "Installation Test Suite Dashboards" section above for details
- manifests/036-grafana-sovdev-verification.yaml - sovdev-logger Verification Dashboard:
  - Debug logs/traces/metrics correlation
  - Session filtering for specific executions
  - TraceId links to full trace waterfall
  - See "sovdev-logger Dashboards" section above for details
- manifests/037-grafana-sovdev-metrics.yaml - sovdev-logger Fast Metrics Dashboard:
  - Real-time Prometheus metrics from sovdev-logger
  - Operation rates, error rates, latencies
  - Service dependency graphs
  - See "sovdev-logger Dashboards" section above for details
Official Dashboard Docs: https://grafana.com/docs/grafana/v12.1/dashboards/
Troubleshooting Dashboard Management
Dashboard not appearing:
# 1. Verify ConfigMap exists with correct label
kubectl get configmap -n monitoring -l grafana_dashboard=1
# 2. Check sidecar logs for errors
kubectl logs -n monitoring deployment/grafana -c grafana-sc-dashboard --tail=50
# 3. Force reload
kubectl rollout restart deployment/grafana -n monitoring
Dashboard shows old version:
# Refresh sidecar (faster than full restart)
kubectl delete pod -n monitoring -l app.kubernetes.io/name=grafana
# Or clear browser cache and refresh
Explore Mode Usage
Query Logs in Loki:
- Navigate to Explore → Select Loki datasource
- Enter LogQL query: {service_name="sovdev-test-company-lookup-typescript"}
- Run query to view log stream
Query Metrics in Prometheus:
- Navigate to Explore → Select Prometheus datasource
- Enter PromQL query: rate(prometheus_http_requests_total[5m])
- Run query to view metrics graph
Query Traces in Tempo:
- Navigate to Explore → Select Tempo datasource
- Enter TraceQL query: {resource.service.name="my-app"}
- View trace waterfall/flamegraph
Official Explore Docs: https://grafana.com/docs/grafana/v12.1/explore/
Correlation Workflow
Metrics → Logs → Traces:
- Find metric spike in Prometheus dashboard
- Note timestamp and service name
- Switch to Loki, query logs for that time range
- Find trace_id in log entry
- Switch to Tempo, query by trace_id
- View complete request flow with logs and trace spans
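The same workflow can be scripted for a first pass (a sketch: the service name and the trace_id field name are illustrative and depend on your log format; the Loki gateway address and in-pod curl pattern come from the datasource tests above):
# Pull recent error logs for a service and extract trace IDs to look up in Tempo
kubectl exec -n monitoring deployment/grafana -- curl -sG \
  'http://loki-gateway.monitoring.svc.cluster.local:80/loki/api/v1/query_range' \
  --data-urlencode 'query={service_name="my-app"} |= "error"' | \
  grep -o '"trace_id":"[^"]*"'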
Service Removal
Automated Removal:
docker exec -it provision-host bash
cd /mnt/urbalurbadisk/provision-host/kubernetes/11-monitoring/not-in-use
./05-remove-grafana.sh rancher-desktop
Manual Removal:
# Remove Helm chart
helm uninstall grafana -n monitoring --kube-context rancher-desktop
# Remove IngressRoute
kubectl delete ingressroute -n monitoring grafana
# Remove PVC (optional - preserves data if omitted)
kubectl delete pvc -n monitoring -l app.kubernetes.io/name=grafana
🔧 Troubleshooting
Common Issues
Cannot Access Web UI:
# 1. Check IngressRoute exists
kubectl get ingressroute -n monitoring grafana
# 2. Test with Host header
curl -v -H "Host: grafana.localhost" http://127.0.0.1/
# 3. Check Traefik logs
kubectl logs -n traefik -l app.kubernetes.io/name=traefik | grep grafana
# 4. Verify Grafana pod is running
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana
Datasource Connection Errors:
# Test datasource connectivity from Grafana pod
kubectl exec -n monitoring deployment/grafana -- \
curl -v http://prometheus-server.monitoring.svc.cluster.local:80/api/v1/status/config
# Check if backend services are running
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
Dashboard Not Auto-Loading:
# 1. Verify ConfigMap has correct label
kubectl get configmap -n monitoring -l grafana_dashboard=1
# 2. Check sidecar logs
kubectl logs -n monitoring deployment/grafana -c grafana-sc-dashboard
# 3. Verify ConfigMap is in correct namespace
kubectl get configmap -n monitoring my-dashboard
# 4. Force reload by restarting Grafana
kubectl rollout restart deployment/grafana -n monitoring
Login Issues:
# Reset admin password (if forgotten)
kubectl exec -n monitoring deployment/grafana -- \
grafana-cli admin reset-admin-password NewPassword123
# Check Grafana logs for authentication errors
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana -c grafana
📋 Maintenance
Update Grafana:
# Update Helm chart to latest version
helm repo update
helm upgrade grafana grafana/grafana \
-f /mnt/urbalurbadisk/manifests/034-grafana-config.yaml \
-n monitoring \
--kube-context rancher-desktop
Backup Dashboards:
# Port-forward Grafana (run in background, or use a separate terminal)
kubectl port-forward -n monitoring svc/grafana 3000:80 &
# Export each dashboard via the Grafana API (from Mac host).
# Note: the per-UID redirect must run inside a subshell so {} is expanded per dashboard
curl -s -u admin:SecretPassword1 \
  'http://localhost:3000/api/search?type=dash-db' | \
  jq -r '.[].uid' | \
  xargs -I {} sh -c 'curl -s -u admin:SecretPassword1 \
    http://localhost:3000/api/dashboards/uid/{} > dashboard-{}.json'
Backup PVC Data:
# Export Grafana configuration
kubectl exec -n monitoring deployment/grafana -- \
tar czf /tmp/grafana-backup.tar.gz /var/lib/grafana
# Copy to local machine
kubectl cp monitoring/grafana-xxx:/tmp/grafana-backup.tar.gz \
./grafana-backup.tar.gz -c grafana
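The grafana-xxx placeholder is the actual pod name; look it up first with the same label selector used in the health checks (a small helper):
# Resolve the Grafana pod name for use in kubectl cp
kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana \
  -o jsonpath='{.items[0].metadata.name}'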
🚀 Use Cases
1. Create Custom Dashboard
Using Grafana UI:
1. Navigate to Dashboards → New → New Dashboard
2. Add panel with Prometheus query: rate(prometheus_http_requests_total[5m])
3. Save dashboard
4. Export JSON: Dashboard settings → JSON Model → Copy JSON
5. Create ConfigMap with exported JSON
6. Apply ConfigMap for auto-loading
2. Log Analysis Workflow
Find Errors in Logs:
1. Explore → Loki
2. Query: {service_name="my-app"} |= "error"
3. Filter time range to last 15 minutes
4. Expand log entries to view full context
5. Copy trace_id for correlation
3. Performance Monitoring
Dashboard for Service Health:
- Panel 1: Request rate (PromQL): rate(http_requests_total{service="my-app"}[5m])
- Panel 2: Error rate (PromQL): rate(http_requests_total{service="my-app",status=~"5.."}[5m])
- Panel 3: Recent logs (LogQL): {service_name="my-app"}
- Panel 4: Slow traces (TraceQL): {resource.service.name="my-app" && duration > 1s}
4. Alert Configuration
Create Alert Rule (in dashboard panel):
1. Edit panel → Alert tab
2. Define condition: WHEN avg() OF query(A, 5m, now) IS ABOVE 100
3. Set notification channel
4. Test alert
5. Save dashboard
Official Alerting Docs: https://grafana.com/docs/grafana/v12.1/alerting/
💡 Key Insight: Grafana serves as the unified interface for the entire observability stack, transforming raw telemetry into actionable insight. Its dashboard sidecar pattern enables GitOps-style dashboard management via ConfigMaps, while Explore mode provides ad-hoc investigation capabilities. By correlating metrics, logs, and traces from Prometheus, Loki, and Tempo in a single interface, Grafana delivers complete observability without context switching between tools.
🔗 Related Documentation
Monitoring Stack:
- Monitoring Overview - Complete observability stack
- Prometheus Metrics - Metrics datasource
- Loki Logs - Logs datasource
- Tempo Tracing - Traces datasource
- OTLP Collector - Telemetry ingestion
Configuration & Rules:
- Traefik IngressRoute - External access patterns
- Naming Conventions - Manifest numbering (034, 038)
- Development Workflow - Configuration management
- Secrets Management - Managing admin credentials
External Resources:
- Grafana Dashboards: https://grafana.com/docs/grafana/v12.1/dashboards/
- Grafana Explore: https://grafana.com/docs/grafana/v12.1/explore/
- Prometheus Datasource: https://grafana.com/docs/grafana/v12.1/datasources/prometheus/
- Loki Datasource: https://grafana.com/docs/grafana/v12.1/datasources/loki/
- Tempo Datasource: https://grafana.com/docs/grafana/v12.1/datasources/tempo/
- Alerting: https://grafana.com/docs/grafana/v12.1/alerting/