
FAQ - Dead Letter Exchange Monitoring

Question

Can I monitor the dead letter exchange?

Answer

Yes. Treat the dead letter exchange (DLX) and its bound queues like any other exchange or queue: monitor queue depth and message rates with strict thresholds (for example, alert if a DLX queue depth is greater than 0). Any message in the DLX indicates a delivery failure that requires investigation, and proactive DLX monitoring prevents silent message loss.

Why Monitor Dead Letter Exchanges

Dead letter exchanges capture messages that fail delivery for any of the following reasons (each cause appears in the sketch after this list):

  • Message rejected by consumer (negative acknowledgment with requeue=false)
  • Message TTL expired (queue-level time-to-live: the message sat in the queue longer than the configured TTL)
  • Queue length limit reached (max-length policy triggered, oldest messages moved to DLX)
  • Message expired while in the queue (per-message TTL set by the publisher)
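Each of these causes can be exercised with a few queue arguments and a consumer-side rejection. The sketch below uses pika; the queue and exchange names (orders, dlx), the TTL, and the length limit are illustrative assumptions rather than values from this article.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Work queue wired to a dead letter exchange. Messages move to "dlx" when:
#   - a consumer nacks/rejects them with requeue=False,
#   - they sit in the queue longer than x-message-ttl (milliseconds),
#   - they are pushed out by the x-max-length limit.
channel.queue_declare(
    queue="orders",
    durable=True,
    arguments={
        "x-dead-letter-exchange": "dlx",
        "x-message-ttl": 60000,   # queue-level TTL: 60 seconds
        "x-max-length": 10000,    # oldest messages are dead-lettered beyond this
    },
)

def on_message(ch, method, properties, body):
    # Illustrative handler: reject empty payloads without requeueing,
    # which routes them to the dead letter exchange.
    if not body:
        ch.basic_nack(method.delivery_tag, requeue=False)
    else:
        ch.basic_ack(method.delivery_tag)

channel.basic_consume(queue="orders", on_message_callback=on_message)
channel.start_consuming()
```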

Without monitoring: Messages accumulate in the DLX silently, with no visibility into how many orders, payments, or events have been lost.

With monitoring: You are alerted immediately when the DLX receives messages and can investigate the root cause before the loss has a wider impact.

Configuration Strategy

1. Identify Dead Letter Queues

A dead letter exchange typically routes to dedicated queues, for example (see the topology sketch after this list):

  • dlx.orders (orders that failed delivery)
  • dlx.payments (payment messages rejected)
  • dlx.events (events that expired)
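One way to build that topology is a direct exchange with one bound queue per message class, selected by routing key. The sketch below is a minimal pika example; the exchange name (dlx), the queue names, and the routing keys are assumptions.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# One dead letter exchange with a dedicated queue per message class.
channel.exchange_declare(exchange="dlx", exchange_type="direct", durable=True)
for key in ("orders", "payments", "events"):
    channel.queue_declare(queue=f"dlx.{key}", durable=True)
    channel.queue_bind(queue=f"dlx.{key}", exchange="dlx", routing_key=key)

# Each work queue forwards failures to "dlx" under its own routing key,
# so failed orders land in dlx.orders, failed payments in dlx.payments, etc.
channel.queue_declare(
    queue="orders",
    durable=True,
    arguments={
        "x-dead-letter-exchange": "dlx",
        "x-dead-letter-routing-key": "orders",
    },
)
```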

2. Set Strict Thresholds

Configure aggressive thresholds (a polling sketch follows the table):

Threshold    | Value       | Reasoning
Queue depth  | >0 messages | Any message in the DLX indicates a failure requiring investigation
Warning      | Depth >0    | Immediate notification
Critical     | Depth >10   | Multiple failures, likely a systemic issue
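Monitoring tools differ, but the thresholds above can also be checked directly against the RabbitMQ management HTTP API. The endpoint, credentials, and the dlx.* naming convention in this sketch are assumptions.

```python
import requests

MGMT_URL = "http://localhost:15672"   # RabbitMQ management plugin
AUTH = ("monitoring", "secret")       # assumed read-only monitoring user
WARNING_DEPTH = 0                     # any message in a DLX queue is a warning
CRITICAL_DEPTH = 10                   # sustained failures suggest a systemic issue

def check_dlx_queues():
    queues = requests.get(f"{MGMT_URL}/api/queues", auth=AUTH, timeout=10).json()
    for queue in queues:
        if not queue["name"].startswith("dlx."):
            continue
        depth = queue.get("messages", 0)
        if depth > CRITICAL_DEPTH:
            print(f"CRITICAL: {queue['name']} holds {depth} messages")
        elif depth > WARNING_DEPTH:
            print(f"WARNING: {queue['name']} holds {depth} messages")

if __name__ == "__main__":
    check_dlx_queues()
```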

3. Configure Alerts

  • Alert immediately when DLX queue depth increases
  • Include message samples in alert (if possible) to aid diagnosis
  • Set up escalation if the DLX is not cleared within a timeframe (e.g., page on-call if >50 messages remain in the DLX after 15 minutes; see the sketch below)
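The escalation rule from the last bullet can be expressed as a small severity function. The thresholds and timing logic below are assumptions that mirror the example figures above.

```python
from datetime import datetime, timedelta

ESCALATION_DEPTH = 50                      # messages still sitting in the DLX
ESCALATION_WINDOW = timedelta(minutes=15)  # time since the first alert fired

def severity(depth: int, first_alerted_at: datetime | None, now: datetime) -> str:
    """Classify a DLX depth reading: ok, alert, or page the on-call engineer."""
    if depth == 0:
        return "ok"
    if (
        first_alerted_at is not None
        and depth > ESCALATION_DEPTH
        and now - first_alerted_at > ESCALATION_WINDOW
    ):
        return "page-on-call"
    return "alert"
```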

Investigation Workflow

When a DLX alert triggers (an inspection sketch follows these steps):

  1. Check message count and rate—single message vs. flood
  2. View message headers: the x-death header contains the rejection reason, original queue, and timestamp
  3. Identify root cause:
    • Consumer rejection → Consumer code bug or validation failure
    • TTL expiration → Consumer too slow or offline
    • Queue length limit → Backlog, need to scale consumers
  4. Fix root cause (deploy bug fix, scale consumers, adjust TTL)
  5. Decide message fate:
    • Requeue to original queue if issue fixed
    • Delete if messages invalid/expired
    • Archive to long-term storage for audit
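Steps 2 and 5 can be combined in a small drain script that reads the x-death header of each message in a DLX queue and decides its fate. The sketch below uses pika; the queue name dlx.orders and the requeue/delete rules are illustrative assumptions, not a prescribed policy.

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

while True:
    method, properties, body = channel.basic_get(queue="dlx.orders", auto_ack=False)
    if method is None:
        break  # dead letter queue drained

    # x-death is a list of dead-lettering events, most recent first.
    death = (properties.headers or {}).get("x-death", [{}])[0]
    reason = death.get("reason")         # "rejected", "expired", or "maxlen"
    original_queue = death.get("queue")  # queue the message came from
    print(f"{reason} from {original_queue}: {body[:80]!r}")

    if reason == "expired" and original_queue:
        # Root cause fixed (e.g., consumers scaled up): send it back for a retry.
        channel.basic_publish(exchange="", routing_key=original_queue,
                              body=body, properties=properties)
        channel.basic_ack(method.delivery_tag)
    else:
        # Invalid or unrecoverable message: acknowledge to remove it
        # (archive it somewhere durable first if an audit trail is needed).
        channel.basic_ack(method.delivery_tag)

connection.close()
```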

Example Configuration

Resource: RabbitMQ Production Cluster
Queue: dlx.orders
  Alert: Queue Depth >0 messages (Warning)
  Alert: Queue Depth >10 messages (Critical)
  Notification: Email + Teams channel #orders-ops
  Consumer Lag: Alert if lag >5 minutes (if DLX has consumer for archival)

Next Step

Configuration Guide
Monitoring RabbitMQ Features

RabbitMQ Agent Overview
Troubleshooting Overview