
FAQ - Connectivity Loss Handling

Question

What happens if the agent loses connectivity to a RabbitMQ cluster?

Answer

The agent detects the connectivity failure during the next monitoring cycle (default: 60 seconds), marks the cluster Resource as "Error" with a connectivity failure message, and triggers alerts. When connectivity is restored (for example, after network recovery or a RabbitMQ restart), the agent resumes normal monitoring automatically; no manual intervention is required.

Failure Detection

The agent detects connectivity issues through:

  • HTTP timeout (Management API doesn't respond within configured timeout)
  • Network errors (DNS resolution fails, TCP connection refused, SSL handshake failure)
  • Authentication failures (credentials expired, user deleted, permissions revoked)
  • Management Plugin disabled (API returns 404 or service unavailable)
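
The four detection categories above can be illustrated with a small classifier. This is only a sketch of how such a mapping might look in Python, not the agent's actual implementation: the function name `classify_failure` and the category strings are hypothetical, and it assumes the probe is made with the standard library's `urllib`, which wraps low-level socket and SSL errors in `URLError.reason`.

```python
import socket
import ssl
import urllib.error


def classify_failure(exc: BaseException) -> str:
    """Map an exception raised by an HTTP probe to a failure category.

    Hypothetical helper: the category names are illustrative only.
    """
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "authentication"
        if exc.code in (404, 503):
            return "management-plugin-unavailable"
        return "http-error"

    # urllib wraps the underlying socket/SSL exception in .reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason

    if isinstance(exc, ssl.SSLError):
        return "ssl"
    if isinstance(exc, socket.gaierror):
        return "dns"
    if isinstance(exc, ConnectionRefusedError):
        return "connection-refused"
    # socket.timeout is an alias of TimeoutError on Python 3.10+.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    return "network"
```

A caller would wrap its request in `try`/`except` and pass the caught exception to `classify_failure` to build the alert message.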

Alert Behavior

When connectivity is lost:

  1. The Resource status changes to "Error".
  2. An alert is triggered with an error message (e.g., "Cannot connect to RabbitMQ Management API at https://rabbitmq-prod:15671 - Connection refused").
  3. Monitoring is suspended until connectivity is restored.
  4. The agent retries the connection on each subsequent monitoring cycle (default: 60 seconds).

Automatic Recovery

When connectivity is restored:

  1. The agent reconnects on the next monitoring cycle.
  2. The Resource status changes back to "OK".
  3. Monitoring resumes immediately.
  4. A recovery alert (optional) notifies operations that the connection has been restored.

No manual restart or reconfiguration is required.
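
The failure and recovery steps above amount to a two-state cycle, which can be sketched as follows. This is an illustrative model rather than the agent's code: `ClusterResource`, `run_cycle`, and the alert strings are hypothetical names, and the sketch assumes an alert is raised only when the status changes, which the behavior described above does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ClusterResource:
    """Hypothetical stand-in for a monitored cluster Resource."""
    name: str
    status: str = "OK"
    alerts: List[str] = field(default_factory=list)


def run_cycle(
    resource: ClusterResource,
    probe: Callable[[], Optional[str]],
    recovery_alert: bool = True,
) -> None:
    """Run one monitoring cycle.

    `probe` returns None on success or an error message on failure.
    """
    error = probe()
    if error is not None:
        # Assumption: alert once on the OK -> Error transition only.
        if resource.status != "Error":
            resource.status = "Error"
            resource.alerts.append(f"ALERT {resource.name}: {error}")
        return  # Metrics collection is skipped; retry on the next cycle.

    if resource.status == "Error":
        resource.status = "OK"
        if recovery_alert:
            resource.alerts.append(f"RECOVERED {resource.name}: connection restored")
    # Normal metrics collection would continue here.
```

Calling `run_cycle` once per monitoring interval (60 seconds by default) reproduces the sequence: one alert when the connection drops, silent retries while it stays down, and an optional recovery alert once the probe succeeds again.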

Common Causes

| Cause | Detection | Resolution |
|---|---|---|
| RabbitMQ server restart | Connection refused during restart window | Wait for startup (30-60 seconds); the agent reconnects automatically |
| Network outage | DNS or TCP connection failure | Fix network connectivity; the agent reconnects when the network is restored |
| Firewall rule change | Connection timeout or refused | Verify that the firewall allows agent → RabbitMQ Management API traffic on port 15671/15672 |
| SSL certificate expired | SSL handshake failure | Renew the certificate; the agent reconnects after the certificate is replaced |
| Credentials changed | 401 Unauthorized | Update the agent configuration with the new credentials |

Best Practices

  • Configure alert throttling to avoid an alert storm during planned maintenance (e.g., suppress alerts for 30 minutes during a RabbitMQ upgrade)
  • Monitor agent health to ensure the agent itself is running (agent down = no alerts from RabbitMQ clusters)
  • Test connectivity after initial configuration to verify that the Management API is accessible from the agent server
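
For the connectivity test in the last item, one option is a short script run from the agent server. The sketch below is an assumption-laden example, not a tool shipped with the agent: the helper names, the `monitor` user, and the password are hypothetical, and it assumes HTTP basic authentication against the Management API's `/api/overview` endpoint.

```python
import base64
import urllib.request
from typing import Tuple


def build_probe_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a basic-auth GET request for the Management API overview endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = base_url.rstrip("/") + "/api/overview"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def check_connectivity(
    base_url: str, user: str, password: str, timeout: float = 10.0
) -> Tuple[bool, str]:
    """Return (reachable, detail) for a single probe of the Management API."""
    request = build_probe_request(base_url, user, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except OSError as exc:
        # HTTPError, URLError, timeouts, and SSL errors all derive from OSError.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Hypothetical host and credentials; replace with your own values.
    ok, detail = check_connectivity("https://rabbitmq-prod:15671", "monitor", "secret")
    print("reachable" if ok else "unreachable", "-", detail)
```

A result of `unreachable` together with the exception name (for example `HTTPError` with code 401, or `URLError` wrapping a refused connection) points to the matching row in the Common Causes table.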

Next Step

Configuration Guide
Prerequisites

RabbitMQ Agent Overview
Troubleshooting Overview