UDP Message Queue

Butler SOS uses managed queues to handle incoming UDP messages from Qlik Sense. This ensures that a sudden burst of events doesn't overwhelm Butler SOS or the destinations it forwards events to.

Overview

Both the User Events and Log Events UDP servers use message queues with the following protections:

  • Controlled concurrency — messages are processed with a configurable limit on parallel operations
  • Optional rate limiting — prevent message flooding by limiting messages per minute
  • Message size validation — messages exceeding the maximum UDP datagram size are rejected
  • Backpressure detection — warnings when queue utilization exceeds a configurable threshold
  • Queue metrics — optional storage of queue health data in InfluxDB for monitoring and alerting

All messages flow through the queue — it cannot be disabled.

Message Flow
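
As a rough illustration of the flow, the sketch below applies the checks in the order this page lists them under Source IP Validation (source IP, then size, then rate limit, then queueing). It is not taken from the Butler SOS source code: the function name `admit`, its parameters, and the outcome labels are hypothetical, and the relative order of the size and rate-limit checks is an assumption based on the order in which they are listed.

```python
from typing import List, Optional, Set


def admit(
    sender_ip: str,
    payload: bytes,
    pending: List[bytes],
    *,
    allowed_ips: Optional[Set[str]] = None,  # None: source validation is off
    max_message_size: int = 65507,
    rate_limit_ok: bool = True,  # result of a separate rate limiter check
    max_size: int = 200,
) -> str:
    """Return a hypothetical outcome label for one incoming UDP message."""
    # 1. Source IP check comes first, before any other processing.
    if allowed_ips is not None and sender_ip not in allowed_ips:
        return "rejected_source"
    # 2. Size validation against maxMessageSize.
    if len(payload) > max_message_size:
        return "dropped_size"
    # 3. Optional rate limiting.
    if not rate_limit_ok:
        return "dropped_rate_limit"
    # 4. Bounded queue: drop when no pending slot is free.
    if len(pending) >= max_size:
        return "dropped_queue_full"
    pending.append(payload)
    return "queued"
```

Messages that reach the last step wait in the queue until one of the maxConcurrent processing slots becomes free.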

Components

  • Circular Buffer — tracks last 1000 processing times for percentile calculations
  • Rate Limiter — fixed-window counter that resets each minute
  • Metrics Collector — thread-safe counters and timing data
  • InfluxDB Writer — periodic metrics storage at configurable interval

Configuration

User Events Queue

yaml
Butler-SOS:
  userEvents:
    udpServerConfig:
      serverHost: <IP or FQDN> # Host/IP where user event server will listen for events from Sense
      portUserActivityEvents: 9997 # Port on which user event server will listen for events from Sense
      # Source IP validation for incoming UDP messages
      enableSourceValidation: false # Set to true to restrict sources to allowedSources list
      allowedSources: [] # List of allowed IPv4 addresses or hostnames
      # Message queue settings for handling incoming UDP messages
      messageQueue:
        maxConcurrent: 10 # Max number of messages being processed simultaneously (default: 10)
        maxSize: 200 # Max queue size before messages are dropped (default: 200)
        backpressureThreshold: 80 # Warn when queue utilization reaches this % (default: 80)
      # Rate limiting to prevent message flooding
      rateLimit:
        enable: false # Enable rate limiting (default: false)
        maxMessagesPerMinute: 600 # Max messages per minute, ~10/sec (default: 600)
      maxMessageSize: 65507 # Max UDP message size in bytes (default: 65507, UDP max)
      # Queue metrics storage in InfluxDB
      queueMetrics:
        influxdb:
          enable: false # Store queue metrics in InfluxDB (default: false)
          writeFrequency: 20000 # How often to write metrics, milliseconds (default: 20000)
          measurementName: user_events_queue # InfluxDB measurement name (default: user_events_queue)
          tags: # Optional tags added to queue metrics
            - name: qs_environment
              value: prod

Log Events Queue

yaml
Butler-SOS:
  logEvents:
    udpServerConfig:
      serverHost: <IP or FQDN> # Host/IP where log event server will listen for events from Sense
      portLogEvents: 9996 # Port on which log event server will listen for events from Sense
      # Source IP validation for incoming UDP messages
      enableSourceValidation: false # Set to true to restrict sources to allowedSources list
      allowedSources: [] # List of allowed IPv4 addresses or hostnames
      # Message queue settings for handling incoming UDP messages
      messageQueue:
        maxConcurrent: 10 # Max number of messages being processed simultaneously (default: 10)
        maxSize: 200 # Max queue size before messages are dropped (default: 200)
        backpressureThreshold: 80 # Warn when queue utilization reaches this % (default: 80)
      # Rate limiting to prevent message flooding
      rateLimit:
        enable: false # Enable rate limiting (default: false)
        maxMessagesPerMinute: 600 # Max messages per minute (default: 600)
      maxMessageSize: 65507 # Max UDP message size in bytes (default: 65507, UDP max)
      # Queue metrics storage in InfluxDB
      queueMetrics:
        influxdb:
          enable: false # Store queue metrics in InfluxDB (default: false)
          writeFrequency: 20000 # How often to write metrics, milliseconds (default: 20000)
          measurementName: log_events_queue # InfluxDB measurement name (default: log_events_queue)
          tags: # Optional tags added to queue metrics
            - name: qs_environment
              value: prod

Configuration Properties

messageQueue

| Property | Default | Description |
| --- | --- | --- |
| maxConcurrent | 10 | Number of messages processed simultaneously. Higher values = more throughput but more CPU/memory usage. Recommended: 5-20 depending on server capacity. |
| maxSize | 200 | Maximum queue size. When exceeded, new messages are rejected and dropped. Recommended: 100-500. Note that the queue only counts pending messages (not those currently processing), so total capacity is maxSize + maxConcurrent. |
| backpressureThreshold | 80 | Queue utilization percentage that triggers backpressure warnings. Recommended: 70-90%. |
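
To make the relationship between these three settings concrete, here is a small sketch. The function name `queue_status` and its return keys are hypothetical. It assumes utilization is computed from pending messages only (as the maxSize description suggests) and that the warning fires when utilization is at or above the threshold.

```python
def queue_status(
    pending: int,
    running: int,
    max_size: int = 200,
    max_concurrent: int = 10,
    backpressure_threshold: float = 80.0,
) -> dict:
    """Derive queue health figures from the current pending/running counts."""
    # Utilization is based on pending messages only (assumption, see lead-in).
    utilization_pct = 100.0 * pending / max_size
    return {
        "utilization_pct": utilization_pct,
        # Assumed comparison: warn once utilization reaches the threshold.
        "backpressure": utilization_pct >= backpressure_threshold,
        # New messages are dropped once every pending slot is taken.
        "accepts_new": pending < max_size,
        "in_flight_total": pending + running,
        # Pending slots plus processing slots.
        "total_capacity": max_size + max_concurrent,
    }


status = queue_status(pending=171, running=10)
print(status["utilization_pct"], status["backpressure"])  # 85.5 True
```

With the defaults, 171 pending messages give 85.5% utilization, which is above the 80% threshold, and up to 210 messages can be held in total (200 pending plus 10 being processed).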

rateLimit

| Property | Default | Description |
| --- | --- | --- |
| enable | false | Enable rate limiting to prevent message flooding. Rate limiting uses a fixed-window counter that resets each minute. |
| maxMessagesPerMinute | 600 | Maximum messages allowed per minute, across all source IPs. |
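
A fixed-window counter of the kind described above can be sketched as follows. This is a minimal illustration, not the Butler SOS implementation: the class name `FixedWindowRateLimiter` and the injectable `clock` parameter are assumptions made so the example is easy to test.

```python
import time


class FixedWindowRateLimiter:
    """One shared counter for all senders that resets every 60 seconds."""

    def __init__(self, max_per_minute: int = 600, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        # Start a new window once a full minute has passed.
        if now - self.window_start >= 60.0:
            self.window_start = now
            self.count = 0
        # Reject once the quota for the current window is used up.
        if self.count >= self.max_per_minute:
            return False
        self.count += 1
        return True
```

One property of fixed windows worth keeping in mind when choosing maxMessagesPerMinute: a burst that straddles a window boundary can pass close to twice the limit within a short period, because the counter starts from zero again in the new window.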

maxMessageSize

| Property | Default | Description |
| --- | --- | --- |
| maxMessageSize | 65507 | Maximum UDP message size in bytes. The default is the largest UDP payload that fits in a single IPv4 datagram. Messages exceeding this are rejected and counted in messages_dropped_size. |
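
The default value follows from the IPv4 and UDP header sizes. The short sketch below shows the arithmetic and a size check; the constant names and the helper `exceeds_limit` are hypothetical.

```python
IPV4_MAX_PACKET = 65535  # IPv4 total length is a 16-bit field
IPV4_HEADER = 20         # minimum IPv4 header, no options
UDP_HEADER = 8           # fixed UDP header size

# Largest UDP payload that fits in one IPv4 datagram.
MAX_UDP_PAYLOAD_IPV4 = IPV4_MAX_PACKET - IPV4_HEADER - UDP_HEADER


def exceeds_limit(payload: bytes, max_message_size: int = MAX_UDP_PAYLOAD_IPV4) -> bool:
    """True when a message is larger than the configured maxMessageSize."""
    return len(payload) > max_message_size


print(MAX_UDP_PAYLOAD_IPV4)  # 65507
```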

Source Validation Settings

| Property | Default | Description |
| --- | --- | --- |
| enableSourceValidation | false | Enable source IP validation for incoming UDP messages. When true, Butler SOS resolves allowedSources at startup and rejects non-matching sender IPs before queueing or payload parsing. If the allow-list is empty or nothing resolves, validation is disabled at startup. |
| allowedSources | [] | List of allowed IPv4 addresses or hostnames. Hostnames are resolved once at startup, and all resolved IPv4 addresses are added to the allow-list. |

queueMetrics.influxdb

| Property | Default | Description |
| --- | --- | --- |
| enable | false | Store queue metrics in InfluxDB for monitoring and alerting. |
| writeFrequency | 20000 | How often to write metrics in milliseconds. Lower values = more frequent updates but more InfluxDB writes. |
| measurementName | varies | InfluxDB measurement name. Defaults: user_events_queue or log_events_queue. |
| tags | [] | Optional static tags added to all queue metrics data points in InfluxDB. |

Source IP Validation

Butler SOS can restrict which Qlik Sense servers are permitted to send UDP messages. When enabled, Butler SOS checks the sender IP before any other UDP processing. Unauthorized packets are dropped immediately. The first rejection from a given source IP is logged at warn level, while repeated rejections within one minute are logged at debug level to avoid flooding the logs.

How it works:

  1. At startup, Butler SOS parses allowedSources. Literal IPv4 addresses are kept as-is and hostnames are resolved to IPv4 addresses in parallel.
  2. If a hostname resolves to multiple IPv4 addresses, all of them are added to the active allow-list.
  3. Entries that cannot be resolved are logged and skipped; successfully resolved entries remain active.
  4. If validation is enabled but allowedSources is empty, or none of the configured entries resolve, Butler SOS disables source validation at startup and accepts all sources.
  5. When a UDP message arrives, the sender IP is checked before size validation, rate limiting, queueing, or payload parsing.
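
The steps above can be sketched roughly as follows. This is an illustration only, not Butler SOS code: the function names `resolve_ipv4`, `build_allow_list` and `is_allowed` are hypothetical, and hostnames are resolved one after another here for brevity, whereas Butler SOS is described as resolving them in parallel.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List, Optional, Set


def resolve_ipv4(hostname: str) -> List[str]:
    """Return every IPv4 address the hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def build_allow_list(
    entries: Iterable[str],
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Optional[Set[str]]:
    """Build the active allow-list; None means validation is disabled."""
    allowed: Set[str] = set()
    for entry in entries:
        try:
            ipaddress.IPv4Address(entry)  # literal IPv4: keep as-is
            allowed.add(entry)
            continue
        except ValueError:
            pass  # not a literal address, treat it as a hostname
        try:
            allowed.update(resolver(entry))  # keep all resolved addresses
        except OSError:
            print(f"Could not resolve {entry!r}, skipping")
    # Fail open: an empty result disables validation.
    return allowed or None


def is_allowed(sender_ip: str, allow_list: Optional[Set[str]]) -> bool:
    """Check a sender before any other UDP processing takes place."""
    return allow_list is None or sender_ip in allow_list
```

Because the allow-list is built once, a Sense server whose DNS record changes after startup is not picked up until Butler SOS is restarted.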

Supported formats:

  • IPv4 addresses: Exact match (e.g., 192.168.1.100)
  • Hostnames: Resolved to IPv4 at startup (e.g., qlik-sense-1.company.internal)

IPv6 addresses are not supported — use IPv4 addresses or hostnames that resolve to IPv4.

Notes:

  • Disabled by default (enableSourceValidation: false) for backward compatibility
  • Hostnames are resolved once at startup, not on each message
  • Rejection warnings are throttled to once per source IP per minute; repeated drops in the same window are logged at debug
  • Should be used together with firewall rules for defense in depth
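
The throttling of rejection messages mentioned in the notes could look something like the sketch below. The class `RejectionLogger`, the logger name, and the log message text are all hypothetical; only the behavior (one warning per source IP per minute, debug for repeats) comes from this page.

```python
import logging
import time


class RejectionLogger:
    """Warn once per source IP per window, log repeats at debug level."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic, logger=None):
        self.window_seconds = window_seconds
        self.clock = clock
        self.logger = logger or logging.getLogger("udp-source-validation")
        self.last_warned = {}  # sender IP -> time of the last warning

    def level_for(self, sender_ip: str) -> int:
        now = self.clock()
        last = self.last_warned.get(sender_ip)
        if last is None or now - last >= self.window_seconds:
            self.last_warned[sender_ip] = now
            return logging.WARNING
        return logging.DEBUG

    def log_rejection(self, sender_ip: str) -> int:
        level = self.level_for(sender_ip)
        self.logger.log(level, "Rejected UDP message from %s", sender_ip)
        return level
```

In practice this means a misconfigured sender shows up in the logs at warn level about once a minute, and the full volume of rejected packets is only visible at debug level.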

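Security benefit: UDP has no built-in authentication, so source IP validation gives Butler SOS a way to discard messages from hosts that are not on the allow-list. This matters most when Butler SOS is reachable from a wider network in production. Because UDP source addresses can be spoofed, treat it as one layer alongside firewall rules rather than as authentication.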

Fail-open behavior: If enableSourceValidation: true but allowedSources is empty, or if all entries fail DNS resolution, validation is automatically disabled and all sources are accepted. This is a startup safeguard to avoid accidental self-lockout during incremental deployment.

Example: Allow specific Sense servers

yaml
Butler-SOS:
  userEvents:
    udpServerConfig:
      enableSourceValidation: true
      allowedSources:
        - 192.168.10.50
        - qlik-sense-1.company.internal
  logEvents:
    udpServerConfig:
      enableSourceValidation: true
      allowedSources:
        - 192.168.10.50
        - qlik-sense-1.company.internal

Example: Disable validation (default)

yaml
Butler-SOS:
  userEvents:
    udpServerConfig:
      enableSourceValidation: false
      allowedSources: []
  logEvents:
    udpServerConfig:
      enableSourceValidation: false
      allowedSources: []

Performance Tuning

Small Environment (< 50 users, < 10 apps)

yaml
messageQueue:
  maxConcurrent: 5
  maxSize: 100
rateLimit:
  enable: false

Medium Environment (50-200 users, 10-50 apps)

yaml
messageQueue:
  maxConcurrent: 10
  maxSize: 200
rateLimit:
  enable: false
  maxMessagesPerMinute: 600

Large Environment (200+ users, 50+ apps)

yaml
messageQueue:
  maxConcurrent: 20
  maxSize: 500
rateLimit:
  enable: true
  maxMessagesPerMinute: 1200

Tuning Based on Metrics

| Symptom | Likely Cause | Action |
| --- | --- | --- |
| High queue utilization (> 80%) | Messages arriving faster than they can be processed | Increase maxConcurrent and/or maxSize. Check if downstream systems (InfluxDB, MQTT) are a bottleneck. |
| Dropped messages (messages_dropped_queue_full > 0) | Queue capacity insufficient for message bursts | Increase maxSize and/or maxConcurrent. Consider rate limiting at the Qlik Sense side. |
| High processing times (p95 > 1000ms) | Resource contention or slow downstream systems | Decrease maxConcurrent to reduce contention. Check downstream system performance and network latency. |
| Rate limit violations (messages_dropped_rate_limit > 0) | Rate limit too restrictive or excessive Sense messages | Increase maxMessagesPerMinute if capacity allows. Investigate why Sense is sending excessive messages. |

Resource Considerations

Memory usage: Each queued message uses approximately 1-5 KB. At maxSize: 200, each queue uses about 200-1000 KB. Two queues (user + log events) use 400-2000 KB total.

CPU usage: Higher maxConcurrent values will use more CPU. Usually not a bottleneck unless processing time per message is high or the server is under heavy load. Monitor CPU usage when tuning concurrency.

InfluxDB load: Each queue writes metrics at the configured writeFrequency interval. At the default 20 seconds, each queue writes 3 times per minute (6 writes/minute total). Increase the interval if InfluxDB is under load.
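
The memory and InfluxDB figures above are simple arithmetic, which the following sketch reproduces so you can plug in your own settings. The function names are hypothetical, and the 1-5 KB per message range is the estimate given on this page, not a measured value.

```python
def queue_memory_kb(max_size: int, queues: int = 2, kb_per_message=(1, 5)):
    """Return (low, high) memory estimate in KB for full queues."""
    low, high = kb_per_message
    return (max_size * low * queues, max_size * high * queues)


def influx_writes_per_minute(write_frequency_ms: int, queues: int = 2) -> float:
    """Number of metric writes per minute across all queues."""
    return queues * 60_000 / write_frequency_ms


print(queue_memory_kb(200))             # (400, 2000)
print(influx_writes_per_minute(20000))  # 6.0
```

For example, the large-environment setting maxSize: 500 gives an estimate of 1000-5000 KB for both queues combined.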

Queue Metrics in InfluxDB

When queueMetrics.influxdb.enable is set to true, queue metrics are stored in InfluxDB as two separate measurements:

  • user_events_queue (configurable measurement name for user events)
  • log_events_queue (configurable measurement name for log events)

Tags

| Tag | Type | Description |
| --- | --- | --- |
| queue_type | string | Queue identifier: user_events or log_events |
| host | string | Butler SOS hostname |
| Custom tags | string | From config tags array |

Fields

Queue Status

| Field | Type | Description |
| --- | --- | --- |
| queue_size | integer | Current number of messages in queue |
| queue_max_size | integer | Maximum queue capacity |
| queue_utilization_pct | float | Queue utilization percentage (0-100) |
| queue_running | integer | Messages currently being processed |

Message Counters

| Field | Type | Description |
| --- | --- | --- |
| messages_received | integer | Total messages received (since last write) |
| messages_queued | integer | Messages added to queue |
| messages_processed | integer | Messages successfully processed |
| messages_failed | integer | Messages that failed processing |

Dropped Messages

| Field | Type | Description |
| --- | --- | --- |
| messages_dropped_total | integer | Total dropped messages |
| messages_dropped_rate_limit | integer | Dropped due to rate limit |
| messages_dropped_queue_full | integer | Dropped due to full queue |
| messages_dropped_size | integer | Dropped due to size validation |

Performance

| Field | Type | Description |
| --- | --- | --- |
| processing_time_avg_ms | float | Average processing time (milliseconds) |
| processing_time_p95_ms | float | 95th percentile processing time |
| processing_time_max_ms | float | Maximum processing time |
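
As an illustration of how these three values can be derived from a circular buffer of the last 1000 processing times, here is a minimal sketch. The class name `ProcessingTimes` is hypothetical, and the nearest-rank percentile method is an assumption; Butler SOS may compute percentiles differently.

```python
import math
from collections import deque


class ProcessingTimes:
    """Keeps only the most recent `capacity` processing times (in ms)."""

    def __init__(self, capacity: int = 1000):
        # A deque with maxlen discards the oldest sample when full.
        self.samples = deque(maxlen=capacity)

    def record(self, duration_ms: float) -> None:
        self.samples.append(duration_ms)

    def percentile(self, pct: int) -> float:
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        # Nearest-rank method: smallest value with at least pct% at or below it.
        rank = math.ceil(pct * len(ordered) / 100)
        return float(ordered[max(rank, 1) - 1])

    def summary(self) -> dict:
        if not self.samples:
            return {"processing_time_avg_ms": 0.0,
                    "processing_time_p95_ms": 0.0,
                    "processing_time_max_ms": 0.0}
        return {
            "processing_time_avg_ms": sum(self.samples) / len(self.samples),
            "processing_time_p95_ms": self.percentile(95),
            "processing_time_max_ms": float(max(self.samples)),
        }
```

Because only the most recent samples are kept, a slow period ages out of the p95 value once 1000 newer messages have been processed.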

Rate Limit & Backpressure

| Field | Type | Description |
| --- | --- | --- |
| rate_limit_current | integer | Current message rate (messages/minute) |
| backpressure_active | integer | Backpressure status (0=inactive, 1=active) |
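
To show how the tags and fields above fit together, the sketch below renders a single data point in InfluxDB line protocol. It is purely illustrative: Butler SOS may write points through a client library instead, the helper `to_line_protocol` is hypothetical, the host name and values are made up, and no escaping of special characters is done.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Render one point; integer fields get the 'i' suffix, floats do not."""

    def fmt(value):
        if isinstance(value, int):
            return f"{value}i"
        return repr(float(value))

    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={fmt(value)}" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "user_events_queue",
    {"queue_type": "user_events", "host": "sos-host-1", "qs_environment": "prod"},
    {"queue_size": 171, "queue_utilization_pct": 85.5, "backpressure_active": 1},
    1700000000000000000,
)
print(line)
```

The custom qs_environment tag from the configuration examples ends up next to the built-in queue_type and host tags, which is what makes it usable for filtering in Grafana.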

Example Grafana Queries

Queue utilization over time:

text
from(bucket: "butler-sos")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "user_events_queue" or r["_measurement"] == "log_events_queue")
  |> filter(fn: (r) => r["_field"] == "queue_utilization_pct")

Messages dropped by reason:

text
from(bucket: "butler-sos")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "user_events_queue")
  |> filter(fn: (r) => r["_field"] =~ /messages_dropped_/)
  |> aggregateWindow(every: 1m, fn: sum)

Processing time percentiles:

text
from(bucket: "butler-sos")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "log_events_queue")
  |> filter(fn: (r) => r["_field"] == "processing_time_p95_ms" or r["_field"] == "processing_time_avg_ms")

Backpressure events:

text
from(bucket: "butler-sos")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_field"] == "backpressure_active")
  |> filter(fn: (r) => r["_value"] == 1)

Troubleshooting

Backpressure Warnings

Symptom: Log messages like:

WARN: [UDP Queue] Backpressure detected for user_events: Queue utilization 85.5% (threshold: 80%)

Causes:

  • Message rate exceeds processing capacity
  • Downstream systems (InfluxDB/MQTT) are slow to respond
  • Insufficient maxConcurrent setting

Solutions:

  1. Monitor queue metrics to identify the pattern
  2. Increase maxConcurrent if CPU/memory is available
  3. Increase maxSize for more buffer capacity
  4. Check downstream system performance
  5. Enable rate limiting if messages are arriving too fast

Messages Being Dropped

Queue full drops (messages_dropped_queue_full):

  • Queue size too small for message bursts
  • Increase maxSize and/or maxConcurrent

Rate limit drops (messages_dropped_rate_limit):

  • Rate limit too restrictive
  • Increase maxMessagesPerMinute or disable rate limiting
  • Investigate why Sense is sending so many messages

Size validation drops (messages_dropped_size):

  • Messages exceed the configured maxMessageSize (by default the maximum UDP payload size)
  • Usually indicates malformed messages from Qlik Sense
  • Check your Qlik Sense log appender configuration

High Processing Times

Symptom: processing_time_p95_ms > 1000ms

Causes: Slow downstream systems (InfluxDB write latency, MQTT broker delays), network latency, or too many concurrent operations causing resource contention.

Solutions:

  1. Check InfluxDB write performance
  2. Check MQTT broker responsiveness
  3. Reduce maxConcurrent to decrease resource contention
  4. Review network latency between Butler SOS and destinations

Debug Logging

For verbose queue debugging, set the Butler SOS log level to verbose or debug:

yaml
Butler-SOS:
  logLevel: verbose

Look for log messages with these prefixes:

  • [UDP Queue] — Queue operations and status
  • UDP QUEUE METRICS INFLUXDB — Metrics storage operations
  • USER EVENT QUEUE METRICS / LOG EVENT QUEUE METRICS — Per-queue status

No Queue Metrics in InfluxDB

If queue metrics are not appearing in InfluxDB, check:

  1. queueMetrics.influxdb.enable is set to true
  2. Butler-SOS.influxdbConfig.enable is set to true
  3. The InfluxDB connection is working (check Butler SOS logs)
  4. The configured measurement name is correct
  5. Wait for the writeFrequency interval to elapse

Config Validation Errors

Symptom: Butler SOS fails to start with config validation errors.

Cause: The config file is missing required queue configuration sections under udpServerConfig.

Solution: Major Butler SOS releases can introduce breaking changes to the config file structure. When upgrading, back up your config, use production_template.yaml from the release ZIP as a fresh base, then re-apply your custom settings.

Monitoring Best Practices

Essential Alerts

| Condition | Suggested Threshold |
| --- | --- |
| Queue utilization too high | queue_utilization_pct > 90 for >5 minutes |
| Excessive dropped messages | messages_dropped_total > 100 per minute |
| Persistent backpressure | backpressure_active = 1 for >10 minutes |
| Processing degradation | processing_time_p95_ms > 2000 |

Suggested dashboard panels:

  1. Queue utilization percentage (line chart, both queues)
  2. Messages received vs processed (line chart)
  3. Dropped messages by reason (stacked area chart)
  4. Processing time percentiles (line chart: avg, p95, max)
  5. Backpressure status (state timeline)
  6. Current queue size (gauge)
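
Two of the alert conditions in the table above depend on a value staying bad for some time ("for >5 minutes", "for >10 minutes"). In most setups the alerting tool handles this, but the logic can be sketched as below. The function `sustained` and the sample data are hypothetical.

```python
from typing import Callable, List, Tuple


def sustained(
    samples: List[Tuple[float, float]],
    predicate: Callable[[float], bool],
    min_duration_s: float,
) -> bool:
    """True if predicate held for every sample in a run that ends at the
    latest sample and spans more than min_duration_s seconds.

    samples: (timestamp_seconds, value) pairs sorted by time.
    """
    run_start = None
    for timestamp, value in samples:
        if predicate(value):
            if run_start is None:
                run_start = timestamp
        else:
            run_start = None  # condition cleared, start over
    return run_start is not None and samples[-1][0] - run_start > min_duration_s


# One sample every 20 seconds (the default writeFrequency) for 6 minutes.
utilization = [(t, 95.0) for t in range(0, 361, 20)]
print(sustained(utilization, lambda v: v > 90, 300))  # True
```

A single sample below the threshold resets the run, which keeps short spikes during a burst from triggering the alert.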

Proactive Monitoring

  • Establish baseline processing times for your environment during normal operations
  • Review queue metrics weekly to catch trends before they become problems
  • Test queue behavior during peak usage periods (e.g., end-of-month reloads)
  • Adjust thresholds after observing actual patterns rather than relying solely on defaults

Released under the MIT License.