FAQ - Dead Letter Exchange Monitoring
Question
Can I monitor the dead letter exchange?
Answer
Yes. Treat the dead letter exchange and its bound queues like any other exchange and queue: monitor queue depth and message rates with strict thresholds (for example, alert if DLX queue depth is greater than 0). Any message in the DLX indicates a delivery failure that requires investigation; proactive DLX monitoring prevents silent message loss.
Why Monitor Dead Letter Exchanges
Dead letter exchanges capture messages that fail delivery due to:
- Message rejected by consumer (negative acknowledgment with requeue=false)
- Message TTL expired (message sat in queue longer than configured time-to-live)
- Queue length limit reached (max-length policy triggered, oldest messages moved to DLX)
- Message expired while in queue (per-message TTL)
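Three of these causes are controlled by arguments set when the source queue is declared. A minimal sketch of building those arguments, assuming the pika client and illustrative names (orders, dlx, dlx.orders) that are not from the source:

```python
# Sketch: x-arguments that route rejected/expired/overflowed messages
# to a dead letter exchange. Queue/exchange names are hypothetical.

def dead_letter_arguments(dlx_name, routing_key, ttl_ms=None, max_length=None):
    """Build the queue arguments that cause dead-lettering to dlx_name."""
    args = {
        "x-dead-letter-exchange": dlx_name,        # target DLX
        "x-dead-letter-routing-key": routing_key,  # e.g. "dlx.orders"
    }
    if ttl_ms is not None:
        args["x-message-ttl"] = ttl_ms             # per-queue TTL expiry
    if max_length is not None:
        args["x-max-length"] = max_length          # length-limit overflow
    return args

# Usage with pika (requires a running broker and an open channel):
# channel.queue_declare(
#     queue="orders",
#     arguments=dead_letter_arguments("dlx", "dlx.orders",
#                                     ttl_ms=60_000, max_length=10_000))
```

Consumer rejection (basic.nack or basic.reject with requeue=False) needs no queue argument; it dead-letters whenever a DLX is configured.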
Without monitoring: messages accumulate silently in the DLX, with no visibility into how many orders, payments, or events have been lost.
With monitoring: you are alerted as soon as the DLX receives a message and can investigate the root cause before the data loss has an impact.
Configuration Strategy
1. Identify Dead Letter Queues
Dead letter exchange typically routes to dedicated queues:
- dlx.orders (orders that failed delivery)
- dlx.payments (payment messages rejected)
- dlx.events (events that expired)
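If the dead letter queues follow a naming convention like the one above, they can be enumerated by listing all queue names (for example, from the management API's /api/queues endpoint) and filtering on the prefix. A sketch, assuming the dlx. prefix:

```python
def find_dead_letter_queues(queue_names, prefix="dlx."):
    """Return the queues that follow the dead-letter naming convention.
    queue_names would typically come from GET /api/queues on the
    RabbitMQ management HTTP API."""
    return sorted(n for n in queue_names if n.startswith(prefix))

queues = ["orders", "dlx.orders", "payments", "dlx.payments", "events"]
print(find_dead_letter_queues(queues))  # ['dlx.orders', 'dlx.payments']
```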
2. Set Strict Thresholds
Configure aggressive thresholds:
| Threshold | Value | Reasoning |
|---|---|---|
| Queue depth | >0 messages | Any message in the DLX indicates a failure requiring investigation |
| Warning | Depth >0 | Immediate notification |
| Critical | Depth >10 | Multiple failures, likely a systemic issue |
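The thresholds above reduce to a small classification function. A sketch using the warning/critical cutoffs from the table:

```python
def dlx_alert_level(depth, warning_at=1, critical_at=11):
    """Map a DLX queue depth to an alert level using the strict
    thresholds above: any message warns, more than 10 is critical."""
    if depth >= critical_at:
        return "critical"
    if depth >= warning_at:
        return "warning"
    return "ok"

print(dlx_alert_level(0))   # ok
print(dlx_alert_level(3))   # warning
print(dlx_alert_level(25))  # critical
```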
3. Configure Alerts
- Alert immediately when DLX queue depth increases
- Include message samples in alert (if possible) to aid diagnosis
- Set up escalation if DLX not cleared within timeframe (e.g., page on-call if >50 messages in DLX after 15 minutes)
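The escalation rule can be sketched the same way; the 50-message and 15-minute figures mirror the example above, and how depth and elapsed time are measured is left to the monitoring system:

```python
def should_page_on_call(depth, minutes_uncleared,
                        depth_threshold=50, minutes_threshold=15):
    """Escalate to on-call when the DLX has not been cleared: more
    than depth_threshold messages still present after
    minutes_threshold minutes."""
    return depth > depth_threshold and minutes_uncleared >= minutes_threshold

print(should_page_on_call(80, 20))  # True
print(should_page_on_call(80, 5))   # False
print(should_page_on_call(10, 30))  # False
```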
Investigation Workflow
When DLX alert triggers:
- Check the message count and rate: a single message vs. a flood
- View the message headers: the x-death header contains the rejection reason, original queue, and timestamp
- Identify the root cause:
- Consumer rejection → Consumer code bug or validation failure
- TTL expiration → Consumer too slow or offline
- Queue length limit → Backlog, need to scale consumers
- Fix root cause (deploy bug fix, scale consumers, adjust TTL)
- Decide message fate:
- Requeue to original queue if issue fixed
- Delete if messages invalid/expired
- Archive to long-term storage for audit
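The mapping from x-death reason to likely root cause can be sketched as a lookup. The reason values (rejected, expired, maxlen) are the ones RabbitMQ records in x-death entries; the dict shape here is simplified (a real entry also carries queue, exchange, count, and time fields):

```python
# Hypothetical diagnosis helper mapping x-death reasons to the
# root-cause hints listed in the workflow above.
ROOT_CAUSES = {
    "rejected": "consumer code bug or validation failure",
    "expired": "consumer too slow or offline (TTL elapsed)",
    "maxlen": "queue length limit hit: backlog, scale consumers",
}

def diagnose(x_death_entry):
    """Given one entry of the x-death header (a dict with at least a
    'reason' key), return a human-readable root-cause hint."""
    reason = x_death_entry.get("reason", "unknown")
    return ROOT_CAUSES.get(reason, f"unrecognized reason: {reason}")

print(diagnose({"reason": "expired", "queue": "orders"}))
```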
Example Configuration
Resource: RabbitMQ Production Cluster
Queue: dlx.orders
Alert: Queue Depth >0 messages (Warning)
Alert: Queue Depth >10 messages (Critical)
Notification: Email + Teams channel #orders-ops
Consumer Lag: Alert if lag >5 minutes (if the DLX queue has an archival consumer)
Next Step
Configuration Guide
Monitoring RabbitMQ Features