Advanced Monitoring and Observability for Consul Connect Service Mesh in Kubernetes


In our previous post on Secure Communication in Kubernetes with Consul Connect and Vault Agent Injector, we established a robust service mesh foundation. Now, let's take it to the next level by implementing comprehensive monitoring and observability for your Consul Connect service mesh in Kubernetes.

Monitoring a service mesh is crucial for understanding service performance, identifying bottlenecks, troubleshooting issues, and ensuring overall system health. In this blog post, we'll explore how to implement a complete observability stack including metrics collection with Prometheus, visualization with Grafana, and distributed tracing with Jaeger for your Consul Connect-enabled Kubernetes applications.

Prerequisites: This article builds upon our previous Consul Connect and Vault Agent Injector setup. Ensure you have completed the setup from Secure Communication in Kubernetes with Consul Connect and Vault Agent Injector before proceeding.

The Three Pillars of Observability in Service Mesh

1. Metrics (Prometheus + Grafana)

  • Service-level metrics: Request rates, response times, error rates
  • Infrastructure metrics: CPU, memory, network usage
  • Proxy metrics: Envoy sidecar performance and connection statistics
  • Consul metrics: Service mesh health and configuration changes

2. Logs (Centralized Logging)

  • Application logs: Business logic and application-specific events
  • Proxy logs: Access logs, error logs from Envoy sidecars
  • Consul Connect logs: Service registration, certificate rotation, intentions

3. Traces (Distributed Tracing)

  • Request flow: Track requests across multiple services
  • Performance bottlenecks: Identify slow services and operations
  • Error propagation: Understand how errors cascade through the mesh

Architecture Overview

Our observability stack integrates seamlessly with Consul Connect to provide comprehensive monitoring, metrics collection, and distributed tracing across your service mesh:

Consul Connect Observability Architecture

The architecture consists of five key components that work together to provide complete observability:

  • Applications with Envoy Sidecars: Your services run alongside Envoy proxies that handle service mesh communication and emit detailed metrics and traces
  • Consul Connect Mesh: Provides service discovery, certificate management, and coordinates the service mesh infrastructure
  • Prometheus: Collects and stores metrics from both applications and Envoy sidecars, with automatic service discovery through Consul
  • Jaeger: Captures distributed traces to track requests as they flow through multiple services
  • Grafana: Creates unified dashboards that combine metrics from Prometheus and traces from Jaeger for comprehensive visibility

Step 1: Configure Consul Connect for Observability

First, let's enhance our existing Consul Connect setup to enable comprehensive metrics collection.

Enhanced Consul Values Configuration

Update your consul-values.yaml to enable metrics and tracing:

# consul-values-observability.yaml
global:
  name: consul
  datacenter: dc1
  consulAPITimeout: 5s
  # Enable metrics collection
  metrics:
    enabled: true
    enableAgentMetrics: true
    agentMetricsRetentionTime: "1m"

server:
  enabled: false

client:
  enabled: false

connectInject:
  enabled: true
  default: false
  consulNode:
    meta:
      pod-name: ${HOSTNAME}
      node-name: ${NODE_NAME}
  k8sAllowNamespaces: ["*"]
  k8sDenyNamespaces: []

  # Enable metrics and tracing for Connect proxies
  metrics:
    defaultEnabled: true
    defaultEnableMerging: true
    enableGatewayMetrics: true

  # Configure Envoy proxy settings for observability
  envoyExtraArgs: |
    --log-level info
    --component-log-level upstream:debug,connection:info

  # Enable tracing
  centralConfig:
    enabled: true
    defaultProtocol: "http"
    # Note: the "jaeger-collector" cluster referenced below must also be made
    # known to Envoy (e.g. via envoy_extra_static_clusters_json in proxy defaults)
    proxyDefaults: |
      {
        "config": {
          "envoy_tracing_json": {
            "http": {
              "name": "envoy.tracers.zipkin",
              "typedConfig": {
                "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
                "collector_cluster": "jaeger-collector",
                "collector_endpoint_version": "HTTP_JSON",
                "collector_endpoint": "/api/v1/spans",
                "shared_span_context": false
              }
            }
          }
        }
      }

controller:
  enabled: true

ui:
  enabled: false

dns:
  enabled: false

externalServers:
  enabled: true
  hosts: ["consul.example.com"]
  httpsPort: 8501
  useSystemRoots: true

# Enable Prometheus metrics scraping
prometheus:
  enabled: true
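A stray comma in the embedded proxy-defaults payload only surfaces when the proxy bootstraps, so it is worth confirming that the JSON actually parses before upgrading. A minimal local sketch (the file path and the trimmed-down payload are just for illustration):

```shell
# Save a proxy-defaults payload and confirm it is valid JSON before handing it to Consul
cat > /tmp/proxy-defaults.json <<'EOF'
{
  "config": {
    "envoy_tracing_json": {
      "http": {
        "name": "envoy.tracers.zipkin"
      }
    }
  }
}
EOF
python3 -c 'import json; json.load(open("/tmp/proxy-defaults.json")); print("proxy defaults JSON is valid")'
```

The same check applies to the full payload from the values file above.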

Upgrade Consul Connect with Observability

# Upgrade Consul Connect with observability features
helm upgrade consul hashicorp/consul \
  --namespace consul \
  --values consul-values-observability.yaml

# Verify the upgrade
kubectl rollout status deployment/consul-connect-injector -n consul

Step 2: Deploy Prometheus Stack

Deploy Prometheus with the official kube-prometheus-stack for comprehensive Kubernetes and service mesh monitoring.

# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Create Prometheus values for Consul Connect integration
# (quote the heredoc delimiter so ${1} in the relabel rule is not expanded by the shell)
cat > prometheus-values.yaml <<'EOF'
# Prometheus configuration for Consul Connect monitoring
prometheus:
  prometheusSpec:
    # Enable service mesh scraping
    additionalScrapeConfigs:
    - job_name: 'consul-connect-envoy-sidecar'
      kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['demo', 'default']
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_name]
        regex: consul-connect-envoy-sidecar
        action: keep
      - source_labels: [__meta_kubernetes_pod_annotation_consul_hashicorp_com_connect_inject]
        regex: 'true'
        action: keep
      - source_labels: [__address__]
        regex: '([^:]+):.*'
        target_label: __address__
        replacement: '${1}:19000'  # Envoy admin port
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: instance
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      metrics_path: '/stats/prometheus'

    - job_name: 'consul-external-servers'
      static_configs:
      - targets: ['consul.example.com:8500']
      metrics_path: '/v1/agent/metrics'
      params:
        format: ['prometheus']

    # Storage configuration
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

    # Resource configuration
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: 1000m
        memory: 4Gi

# Grafana configuration
grafana:
  enabled: true
  adminPassword: "admin123"  # Change in production
  persistence:
    enabled: true
    size: 10Gi
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi

  # Pre-configure Consul Connect dashboards
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
      - name: 'consul-connect'
        orgId: 1
        folder: 'Consul Connect'
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/consul-connect

  dashboardsConfigMaps:
    consul-connect: consul-connect-dashboards

# Node exporter for infrastructure metrics
nodeExporter:
  enabled: true

# Alert manager for notifications
alertmanager:
  enabled: true
  alertmanagerSpec:
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi
EOF

# Install Prometheus stack
helm install prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

# Wait for deployment
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=prometheus -n monitoring --timeout=300s
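The `__address__` relabel rule above keeps the discovered pod IP and swaps in the Envoy admin port before scraping. Since the rewrite is plain regex substitution, it can be sanity-checked locally (the sample pod address is made up):

```shell
# Simulate the __address__ relabel rule: keep the pod IP, swap in admin port 19000
pod_address="10.42.0.17:20000"
rewritten=$(echo "$pod_address" | sed -E 's/^([^:]+):.*/\1:19000/')
echo "$rewritten"  # 10.42.0.17:19000
```

If a target shows up in Prometheus but stays down, a mismatch between this rewritten address and the port Envoy actually listens on is a common cause.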

Step 3: Deploy Jaeger for Distributed Tracing

Deploy Jaeger to capture and analyze distributed traces from your service mesh.

# Add Jaeger Helm repository
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

# Create Jaeger values for Consul Connect integration
cat > jaeger-values.yaml <<'EOF'
# Jaeger configuration for Consul Connect tracing
provisionDataStore:
  cassandra: false
  elasticsearch: true

storage:
  type: elasticsearch
  elasticsearch:
    host: jaeger-elasticsearch-master
    port: 9200

agent:
  enabled: true
  daemonset:
    useHostNetwork: true

collector:
  enabled: true
  replicaCount: 2
  service:
    type: ClusterIP
    # Expose Zipkin compatible endpoint for Envoy
    zipkin:
      port: 9411
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi

query:
  enabled: true
  replicaCount: 1
  service:
    type: ClusterIP
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi

# Configure ingress for Jaeger UI
ingress:
  enabled: true
  hosts:
  - jaeger.local
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /

# Elasticsearch for trace storage
elasticsearch:
  enabled: true
  replicas: 1
  minimumMasterNodes: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      cpu: 1000m
      memory: 4Gi
  volumeClaimTemplate:
    storageClassName: standard
    accessModes: [ "ReadWriteOnce" ]
    resources:
      requests:
        storage: 30Gi
EOF

# Install Jaeger
helm install jaeger jaegertracing/jaeger \
  --namespace monitoring \
  --values jaeger-values.yaml

# Wait for Jaeger components
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=jaeger -n monitoring --timeout=300s
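With the tracing config from Step 1, Envoy POSTs Zipkin v1 JSON spans to the collector's `/api/v1/spans` endpoint. If traces are not showing up, posting a hand-built span is a useful smoke test; the payload below is a minimal, hypothetical example (IDs and timestamps are made up) that you can validate locally before curling it at the collector:

```shell
# Minimal Zipkin v1 span (IDs and timestamps are made up); validate it parses as JSON
cat > /tmp/test-span.json <<'EOF'
[
  {
    "traceId": "463ac35c9f6413ad",
    "id": "463ac35c9f6413ad",
    "name": "smoke-test",
    "timestamp": 1700000000000000,
    "duration": 1000
  }
]
EOF
python3 -c 'import json; spans = json.load(open("/tmp/test-span.json")); print(spans[0]["name"])'
# Once validated, it could be posted to the collector, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#     --data @/tmp/test-span.json http://jaeger-collector.monitoring.svc:9411/api/v1/spans
```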

Step 4: Create Grafana Dashboards

Create comprehensive Grafana dashboards for Consul Connect monitoring.

# Create ConfigMap with Consul Connect dashboards
kubectl create configmap consul-connect-dashboards -n monitoring --from-literal=consul-connect-overview.json='
{
  "dashboard": {
    "id": null,
    "title": "Consul Connect Service Mesh Overview",
    "tags": ["consul", "service-mesh"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Service Mesh Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(envoy_http_downstream_rq_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "yAxes": [
          {
            "label": "Requests/sec"
          }
        ]
      },
      {
        "title": "Service Mesh Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(envoy_http_downstream_rq_xx{envoy_response_code_class=\"5\"}[5m])) by (service) / sum(rate(envoy_http_downstream_rq_total[5m])) by (service)",
            "legendFormat": "{{service}} 5xx errors"
          }
        ]
      },
      {
        "title": "Service Mesh Response Times",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.99, sum(rate(envoy_http_downstream_rq_time_bucket[5m])) by (service, le))",
            "legendFormat": "{{service}} p99"
          },
          {
            "expr": "histogram_quantile(0.95, sum(rate(envoy_http_downstream_rq_time_bucket[5m])) by (service, le))",
            "legendFormat": "{{service}} p95"
          }
        ]
      },
      {
        "title": "Active Connections",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(envoy_http_downstream_cx_active) by (service)",
            "legendFormat": "{{service}}"
          }
        ]
      },
      {
        "title": "Consul Service Health",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(consul_catalog_service_count)",
            "legendFormat": "Total Services"
          },
          {
            "expr": "sum(consul_health_node_failure)",
            "legendFormat": "Failed Nodes"
          }
        ]
      }
    ],
    "refresh": "30s",
    "time": {
      "from": "now-1h",
      "to": "now"
    }
  }
}'

# Create service-specific dashboard
# Note: the Grafana values above only provision the consul-connect-dashboards
# ConfigMap; reference this ConfigMap there as well so Grafana loads it
kubectl create configmap consul-connect-service-details -n monitoring --from-literal=service-details.json='
{
  "dashboard": {
    "title": "Consul Connect Service Details",
    "panels": [
      {
        "title": "Request Volume by Service",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(envoy_http_downstream_rq_total[5m])) by (envoy_http_conn_manager_prefix)",
            "legendFormat": "{{envoy_http_conn_manager_prefix}}"
          }
        ]
      },
      {
        "title": "Upstream Connection Status",
        "type": "graph",
        "targets": [
          {
            "expr": "envoy_cluster_upstream_cx_active",
            "legendFormat": "Active - {{envoy_cluster_name}}"
          },
          {
            "expr": "envoy_cluster_upstream_cx_connect_fail",
            "legendFormat": "Failed - {{envoy_cluster_name}}"
          }
        ]
      },
      {
        "title": "Certificate Expiry",
        "type": "graph",
        "targets": [
          {
            "expr": "envoy_server_days_until_first_cert_expiring",
            "legendFormat": "Days until cert expiry"
          }
        ]
      }
    ]
  }
}'
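Grafana silently skips a dashboard whose JSON does not parse, so checking the payload before baking it into a ConfigMap saves a debugging round trip. A small sketch that parses a trimmed-down overview dashboard and lists its panel titles (file path and reduced payload are illustrative only):

```shell
# Write a reduced dashboard JSON to a file, then parse it and print each panel title
cat > /tmp/overview-dashboard.json <<'EOF'
{
  "dashboard": {
    "title": "Consul Connect Service Mesh Overview",
    "panels": [
      {"title": "Service Mesh Request Rate"},
      {"title": "Service Mesh Error Rate"}
    ]
  }
}
EOF
python3 -c '
import json
dashboard = json.load(open("/tmp/overview-dashboard.json"))["dashboard"]
for panel in dashboard["panels"]:
    print(panel["title"])
'
```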

Step 5: Enhanced Application Configuration with Observability

Update your applications to fully utilize the observability stack.

Enhanced Dynamic App with Tracing and Metrics

# dynamic-app-observability.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dynamic-app
  namespace: demo
  labels:
    app: dynamic-app
    version: v1.0.0
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dynamic-app
  template:
    metadata:
      labels:
        app: dynamic-app
        version: v1.0.0
      annotations:
        # Consul Connect annotations
        "consul.hashicorp.com/connect-inject": "true"
        "consul.hashicorp.com/connect-service": "dynamic-app"
        "consul.hashicorp.com/connect-service-port": "8080"
        "consul.hashicorp.com/connect-service-upstreams": "mysql-server:3306"

        # Enable metrics scraping
        "consul.hashicorp.com/service-metrics-enabled": "true"
        "consul.hashicorp.com/service-metrics-path": "/metrics"
        "consul.hashicorp.com/service-metrics-port": "8080"

        # Prometheus annotations for application metrics
        "prometheus.io/scrape": "true"
        "prometheus.io/port": "8080"
        "prometheus.io/path": "/metrics"

        # Vault Agent Injector annotations
        "vault.hashicorp.com/agent-inject": "true"
        "vault.hashicorp.com/agent-inject-status": "update"
        "vault.hashicorp.com/agent-inject-vault-addr": "https://vault.example.com:8200"
        "vault.hashicorp.com/role": "dynamic-app-consul"
        "vault.hashicorp.com/agent-inject-secret-config.ini": "dynamic-app/db/creds/app"
        "vault.hashicorp.com/agent-inject-template-config.ini": |
          [DEFAULT]
          LogLevel = DEBUG
          Port = 8080

          [DATABASE]
          Address = 127.0.0.1
          Port = 3306
          Database = my_app
          {{- with secret "dynamic-app/db/creds/app" }}
          User = {{ .Data.username }}
          Password = {{ .Data.password }}
          {{- end }}

          [VAULT]
          Enabled = True
          InjectToken = True
          Namespace =
          Address = https://vault.example.com:8200
          KeyPath = dynamic-app/transit
          KeyName = app

          [OBSERVABILITY]
          MetricsEnabled = True
          TracingEnabled = True
          JaegerEndpoint = http://jaeger-collector.monitoring.svc.cluster.local:14268/api/traces
    spec:
      serviceAccountName: dynamic-app-consul
      containers:
      - name: dynamic-app
        image: ghcr.io/infralovers/nomad-vault-mysql:1.0.0
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: CONFIG_FILE
          value: "/vault/secrets/config.ini"
        - name: VAULT_ADDR
          value: "https://vault.example.com:8200"
        # Tracing configuration
        - name: JAEGER_ENDPOINT
          value: "http://jaeger-collector.monitoring.svc.cluster.local:14268/api/traces"
        - name: JAEGER_SERVICE_NAME
          value: "dynamic-app"
        - name: JAEGER_SAMPLER_TYPE
          value: "const"
        - name: JAEGER_SAMPLER_PARAM
          value: "1"
        resources:
          requests:
            cpu: 256m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
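`JAEGER_SAMPLER_TYPE=const` with param `1` traces every single request, which is handy for a demo but expensive at scale; a `probabilistic` sampler with a fractional param keeps only a share of requests. A rough back-of-the-envelope of the resulting span volume (the request rate is an assumed figure, not a measurement):

```shell
# Estimate spans/sec reaching the collector for a given request rate and sampling rate
request_rate=1000    # requests per second across the mesh (assumed)
sampling_rate=0.1    # probabilistic sampler param, i.e. 10% of requests traced
sampled=$(awk -v r="$request_rate" -v s="$sampling_rate" 'BEGIN { printf "%d", r * s }')
echo "${sampled} spans/sec"  # 100 spans/sec
```

At 100% sampling the same mesh would push all 1000 spans/sec into Elasticsearch, which quickly dominates the storage sizing from Step 3.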

Step 6: Configure Service Monitors for Prometheus

Create ServiceMonitor resources for automatic Prometheus scraping.

# consul-connect-service-monitors.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: consul-connect-envoy-sidecars
  namespace: monitoring
  labels:
    app: consul-connect
    component: envoy-sidecar
spec:
  selector:
    matchLabels:
      service: consul-connect-proxy
  endpoints:
  - port: envoy-metrics
    path: /stats/prometheus
    interval: 30s
    scrapeTimeout: 10s
  namespaceSelector:
    matchNames:
    - demo
    - default

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: consul-external-servers
  namespace: monitoring
  labels:
    app: consul
    component: server
spec:
  selector:
    matchLabels:
      app: consul-external
  endpoints:
  - port: http
    path: /v1/agent/metrics
    params:
      format: ["prometheus"]
    interval: 30s

---
apiVersion: v1
kind: Service
metadata:
  name: consul-external
  namespace: monitoring
  labels:
    app: consul-external
spec:
  type: ExternalName
  externalName: consul.example.com
  ports:
  - port: 8500
    name: http

Step 7: Deploy and Configure Alerting Rules

Create alerting rules for proactive monitoring of your service mesh.

# consul-connect-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: consul-connect-alerts
  namespace: monitoring
  labels:
    app: consul-connect
    component: alerting
spec:
  groups:
  - name: consul-connect.rules
    rules:
    - alert: ConsulConnectServiceDown
      expr: up{job="consul-connect-envoy-sidecar"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Consul Connect service is down"
        description: "Service {{ $labels.instance }} has been down for more than 5 minutes."

    - alert: ConsulConnectHighErrorRate
      expr: |
        (
          sum(rate(envoy_http_downstream_rq_xx{envoy_response_code_class="5"}[5m])) by (service) /
          sum(rate(envoy_http_downstream_rq_total[5m])) by (service)
        ) * 100 > 10
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High error rate in Consul Connect service"
        description: "Service {{ $labels.service }} has error rate of {{ $value }}% over the last 5 minutes."

    - alert: ConsulConnectHighLatency
      expr: |
        histogram_quantile(0.99,
          sum(rate(envoy_http_downstream_rq_time_bucket[5m])) by (service, le)
        ) > 2000
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "High latency in Consul Connect service"
        description: "Service {{ $labels.service }} has 99th percentile latency of {{ $value }}ms."

    - alert: ConsulServiceUnhealthy
      expr: consul_catalog_service_count{state="critical"} > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Consul services are unhealthy"
        description: "{{ $value }} Consul services are in critical state."

    - alert: ConsulConnectCertificateExpiringSoon
      expr: envoy_server_days_until_first_cert_expiring < 7
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "Consul Connect certificate expiring soon"
        description: "Certificate will expire in {{ $value }} days."
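The `ConsulConnectHighErrorRate` expression is just a ratio scaled to a percentage and compared against 10. The same arithmetic can be checked offline to build intuition for when the alert fires (the sample counter rates below are made up):

```shell
# Evaluate the error-rate alert condition offline: 5xx rate / total rate * 100 > 10
errors_5xx=12   # sample value for the 5xx request rate
total=100       # sample value for the total request rate
pct=$(awk -v e="$errors_5xx" -v t="$total" 'BEGIN { printf "%.1f", e / t * 100 }')
echo "error rate: ${pct}%"  # error rate: 12.0%
if awk -v p="$pct" 'BEGIN { exit !(p > 10) }'; then
  echo "alert would fire"
fi
```

With these numbers the rate is 12%, above the 10% threshold, so after the `for: 5m` hold period the alert would move to firing.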

Health Check Monitoring

Monitor Consul Connect health checks and service registration:

# Create health check monitoring script
kubectl create configmap health-check-monitor -n demo --from-literal=monitor.sh='
#!/bin/bash
while true; do
  echo "=== Consul Service Status ==="
  curl -s "https://consul.example.com:8501/v1/agent/services" | jq .

  echo "=== Connect Proxy Status ==="
  curl -s localhost:19000/clusters | grep -E "(health_flags|priority|max_requests)"

  echo "=== Certificate Information ==="
  curl -s localhost:19000/certs | jq .

  sleep 30
done'

# Run health monitoring from inside an injected pod: the Envoy admin interface
# listens on localhost:19000 only within that pod's network namespace
# (assumes curl is available in the application image)
kubectl exec -it deployment/dynamic-app -n demo -c dynamic-app -- sh -c '
  while true; do
    echo "=== Service Mesh Health Check ==="
    curl -s localhost:19000/ready && echo "Envoy Ready" || echo "Envoy Not Ready"
    curl -s localhost:19000/stats | grep server.state
    sleep 30
  done'

Monitoring Best Practices

1. Key Metrics to Monitor

Service Mesh Level:

  • Request rate: rate(envoy_http_downstream_rq_total[5m])
  • Error rate: rate(envoy_http_downstream_rq_xx[5m])
  • Response time: histogram_quantile(0.95, envoy_http_downstream_rq_time_bucket)
  • Connection count: envoy_http_downstream_cx_active
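The `histogram_quantile` queries above do not read an exact value; Prometheus interpolates linearly within the histogram bucket that contains the requested rank. The arithmetic can be sketched offline (bucket bounds and cumulative counts below are made-up sample data):

```shell
# Approximate histogram_quantile(0.95, ...) from cumulative bucket counts
awk 'BEGIN {
  # Upper bounds in ms (le) and cumulative counts, as a Prometheus histogram reports them
  n = 3
  le[1] = 100;  c[1] = 50
  le[2] = 500;  c[2] = 90
  le[3] = 1000; c[3] = 95
  total = 100          # total observations (the +Inf bucket)
  rank = 0.95 * total  # the observation rank we want
  lo_le = 0; lo_c = 0
  for (i = 1; i <= n; i++) {
    if (c[i] >= rank) {
      # linear interpolation inside the target bucket, as histogram_quantile does
      printf "p95 ~ %dms\n", lo_le + (le[i] - lo_le) * (rank - lo_c) / (c[i] - lo_c)
      exit
    }
    lo_le = le[i]; lo_c = c[i]
  }
}'
```

This is also why coarse bucket boundaries make reported p95/p99 values look quantized: the estimate can never be finer than the bucket layout.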

Infrastructure Level:

  • CPU and Memory usage of Envoy sidecars
  • Network throughput
  • Certificate expiry times
  • Consul cluster health

2. Alerting Thresholds

  • Error Rate: > 5% sustained for 5 minutes
  • Response Time: p95 > 1 second sustained for 10 minutes
  • Availability: Service down for > 2 minutes
  • Certificate Expiry: < 7 days remaining

3. Dashboard Organization

  • Overview Dashboard: High-level service mesh metrics
  • Service-Specific Dashboards: Detailed metrics per service
  • Infrastructure Dashboard: Kubernetes and infrastructure metrics
  • Security Dashboard: mTLS status, certificate health, intention violations

Conclusion

Implementing comprehensive observability for your Consul Connect service mesh in Kubernetes provides crucial insights into service performance, security, and health. With Prometheus metrics, Grafana dashboards, and Jaeger tracing, you can:

  • Monitor service mesh performance with detailed metrics and dashboards
  • Track distributed requests across multiple services with tracing
  • Proactively identify issues through intelligent alerting
  • Troubleshoot problems efficiently with correlated metrics, logs, and traces
  • Ensure security compliance by monitoring mTLS certificates and intentions

This observability stack builds upon our secure service mesh foundation, providing the visibility needed to operate Consul Connect reliably in production Kubernetes environments.

Key takeaways:

  • Three Pillars: Metrics, logs, and traces provide complete observability
  • Automated Collection: ServiceMonitor resources enable automatic metric discovery
  • Custom Dashboards: Tailored visualizations for service mesh specific metrics
  • Proactive Alerting: Early warning system for service mesh issues
  • Performance Monitoring: Track and optimize service mesh overhead
  • Security Visibility: Monitor certificate health and communication patterns

Start monitoring your Consul Connect service mesh today and gain the insights needed to maintain high availability, performance, and security in your Kubernetes environment.

For implementing multi-platform service mesh architectures that span Kubernetes, Nomad, bare metal, and VMs, see our comprehensive guide on Multi-Platform Service Mesh: Connecting Kubernetes, Nomad, Bare Metal, and VMs with Consul Connect.
