FAQ - Connectivity Loss Handling
Question
What happens if the agent loses connectivity to a RabbitMQ cluster?
Answer
The agent detects the connectivity failure during its next monitoring cycle (default: 60 seconds), marks the cluster Resource as "Error" with a connectivity failure message, and triggers an alert. When connectivity is restored (for example, after network recovery or a RabbitMQ restart), the agent resumes normal monitoring automatically. No manual intervention is required.
Failure Detection
The agent detects connectivity issues through:
- HTTP timeout (the Management API does not respond within the configured timeout)
- Network errors (DNS resolution fails, TCP connection is refused, SSL handshake fails)
- Authentication failures (credentials expired, user deleted, permissions revoked)
- Management Plugin disabled (the API returns 404 or service unavailable)
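The detection categories above can be illustrated with a minimal sketch using only the Python standard library. This is not the agent's actual implementation: the function names `classify_failure` and `check_api`, the category strings, and the mapping of 403 and 503 to categories are assumptions made for illustration.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify_failure(exc: Exception) -> str:
    """Map an exception from a Management API request to a failure category.

    The category names are hypothetical; they mirror the list above.
    """
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):  # 403 is an assumption (permissions revoked)
            return "authentication_failure"
        if exc.code in (404, 503):  # 503 assumed for "service unavailable"
            return "management_plugin_unavailable"
        return "http_error"
    # urllib wraps most low-level errors in URLError; unwrap the real cause.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "http_timeout"
    if isinstance(reason, ssl.SSLError):
        return "ssl_handshake_failure"
    if isinstance(reason, socket.gaierror):
        return "dns_resolution_failure"
    if isinstance(reason, ConnectionRefusedError):
        return "connection_refused"
    return "network_error"


def check_api(url: str, timeout: float = 10.0) -> str:
    """Return "ok" if the URL responds, otherwise a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return classify_failure(exc)
```

Classifying the failure before raising an alert lets the alert message point at the likely cause (for example, credentials versus network), which matches the error messages described in the next section.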
Alert Behavior
When connectivity is lost:
- The Resource status changes to "Error"
- An alert is triggered with an error message (e.g., "Cannot connect to RabbitMQ Management API at https://rabbitmq-prod:15671 - Connection refused")
- Monitoring is suspended until connectivity is restored
- Retry behavior: the agent retries the connection on each subsequent monitoring cycle (default: 60 seconds)
Automatic Recovery
When connectivity is restored:
- The agent reconnects on the next monitoring cycle
- The Resource status changes back to "OK"
- Monitoring resumes immediately
- A recovery alert (optional) notifies operations that the connection has been restored
No manual restart or reconfiguration is required.
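The failure and recovery transitions described above can be sketched as a small state machine. This is an illustration only: the class `ClusterMonitor`, its fields, and the choice to alert once per transition (rather than on every failed cycle) are assumptions, not documented agent behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ClusterMonitor:
    # Hypothetical probe: returns None on success, or an error message.
    check: Callable[[], Optional[str]]
    notify_recovery: bool = True  # the recovery alert is optional
    status: str = "OK"
    alerts: List[str] = field(default_factory=list)

    def run_cycle(self) -> None:
        """Run one monitoring cycle (the agent default is every 60 seconds)."""
        error = self.check()
        if error is not None:
            # Alert only on the OK -> Error transition (an assumption).
            if self.status != "Error":
                self.status = "Error"
                self.alerts.append(f"ALERT: {error}")
            return  # monitoring stays suspended; retry on the next cycle
        if self.status == "Error":
            self.status = "OK"
            if self.notify_recovery:
                self.alerts.append("RECOVERY: connection restored")
        # Normal metric collection would happen here.
```

Because the retry is simply the next scheduled cycle, recovery needs no separate reconnect logic: the first successful check flips the status back to "OK".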
Common Causes
Cause | Detection | Resolution |
---|---|---|
RabbitMQ server restart | Connection refused during restart window | Wait for startup (30-60 seconds); the agent reconnects automatically |
Network outage | DNS or TCP connection failure | Fix network connectivity; the agent reconnects when the network is restored |
Firewall rule change | Connection timeout or refused | Verify that the firewall allows traffic from the agent to the RabbitMQ Management API port (15671 for HTTPS, 15672 for HTTP) |
SSL certificate expired | SSL handshake failure | Renew the certificate; the agent reconnects after the certificate is replaced |
Credentials changed | 401 Unauthorized | Update the agent configuration with the new credentials |
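When a firewall change or network outage is suspected, a plain TCP check from the agent server can help separate network problems from API-level problems. The following is a minimal sketch; the helper name `port_reachable` is hypothetical, and the host in the usage note is taken from the example alert message above.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, connection refused, and connection timeout.
        return False
```

For example, `port_reachable("rabbitmq-prod", 15671)` returning False while the RabbitMQ node is known to be running points toward a network or firewall cause rather than credentials or certificates.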
Best Practices
- Configure alert throttling to avoid an alert storm during planned maintenance (e.g., suppress alerts for 30 minutes during a RabbitMQ upgrade)
- Monitor agent health to ensure the agent itself is running (if the agent is down, no alerts are raised for RabbitMQ clusters)
- Test connectivity after initial configuration to verify that the Management API is accessible from the agent server
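The alert suppression idea in the first practice can be sketched as a time-window check. How throttling is actually configured in the agent is not covered here; `MaintenanceWindow` and `should_send_alert` are hypothetical names used only to show the logic.

```python
from datetime import datetime, timedelta
from typing import Iterable


class MaintenanceWindow:
    """A planned period during which connectivity alerts are suppressed."""

    def __init__(self, start: datetime, duration: timedelta) -> None:
        self.start = start
        self.end = start + duration

    def suppresses(self, when: datetime) -> bool:
        # The window is half-open: start is included, end is excluded.
        return self.start <= when < self.end


def should_send_alert(when: datetime, windows: Iterable[MaintenanceWindow]) -> bool:
    """Return False if the alert time falls inside any maintenance window."""
    return not any(w.suppresses(when) for w in windows)


# Example: a 30-minute window for a RabbitMQ upgrade starting at 02:00.
upgrade = MaintenanceWindow(datetime(2024, 1, 15, 2, 0), timedelta(minutes=30))
during = should_send_alert(datetime(2024, 1, 15, 2, 10), [upgrade])
after = should_send_alert(datetime(2024, 1, 15, 2, 30), [upgrade])
```

In this sketch `during` is False (suppressed) and `after` is True, so a failure that outlasts the planned window still raises an alert.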
Next Step
Configuration Guide
Prerequisites