The whole SenseOps (Butler family of tools, together with support tools like InfluxDB, Prometheus, Grafana etc) stack is really a platform that can be configured and used in lots of different ways.
Be curious and bold - extend the given examples and use cases with your own!
And when doing so - please consider sharing your successes (and failures..) with others, as inspiration and insight.
Tip
You are strongly recommended to use the latest version of Grafana with the latest version of Butler SOS.
That said, earlier Butler SOS versions included some nice demo dashboards too. These are still found below, as a source of inspiration for what can be done.
1 - Visualising Butler SOS metrics in New Relic
New Relic is a complete SaaS product that offers both data storage and powerful, yet easy to set up and use visualisations.
In this dashboard most of the data comes from Butler SOS, but the failed reload pie chart and table at the top are created using data from the Butler tool.
New Relic does not give you nearly the same level of control as Grafana does, for example when it comes to fine-tuning visual details of charts, tables etc.
That can feel a bit limiting at first, but as New Relic’s query language is very powerful and is very well integrated in the chart editor, it’s not really a problem.
A benefit of New Relic’s dashboards is that their dashboard editor very effectively guides you through the creation of charts, using the various kinds of data (metrics, events, logs) you have sent - and are sending - to their database.
New Relic dashboards are thus definitely on the same level as Grafana’s. Which one to use is a matter of preference.
2 - Visualising Butler SOS metrics in Grafana
Grafana is a powerful tool for creating dashboards that visualise data from various sources.
This page shows examples of Grafana dashboards that visualise data from Butler SOS.
2.1 - Qlik Sense monitoring using Butler SOS v9.2 and Grafana v9.1
Grafana 9 adds some interesting features around alerting as well as several new and improved chart types.
The visual appearance is quite similar to previous versions, but line charts, tables etc have been replaced with the updated variants that arrived with Grafana 9.
2.2 - Qlik Sense monitoring using Butler SOS v7 and Grafana v8
With version 8 Grafana further establishes its position as the leading open source platform for observability and real-time dashboards.
Butler SOS takes advantage of this. A sample dashboard is shown below.
A concept that has proven useful many times is to use an overview dashboard to monitor high-level metrics for the entire Sense cluster. A separate, parameterised dashboard then drills into the details for each server.
Grafana variables make this easy to set up, scalable and very powerful.
Sample dashboards are available in the Git repository.
Before importing these to Grafana you should create a Grafana data source called “senseops”, and point it to your InfluxDB database. When you then import the dashboards they should find your database straight away.
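The data source can of course be created by hand in Grafana's UI. As a rough sketch of a scripted alternative, the function below builds a request body of the kind that could be posted to Grafana's data source HTTP API (POST /api/datasources). Only the data source name “senseops” comes from the text above. The InfluxDB URL and database name are assumptions, and the field names follow Grafana's data source model as commonly documented, so verify them against your Grafana version before use.

```python
import json


def build_datasource_payload(influx_url: str, database: str) -> dict:
    """Build a request body for creating an InfluxDB data source in Grafana.

    The name "senseops" is what the sample dashboards expect.
    influx_url and database are placeholders for your own InfluxDB setup.
    """
    return {
        "name": "senseops",
        "type": "influxdb",
        "access": "proxy",  # Grafana's backend talks to InfluxDB
        "url": influx_url,
        "database": database,
    }


# Assumed values, replace with your own InfluxDB URL and database name.
payload = build_datasource_payload("http://localhost:8086", "senseops")
print(json.dumps(payload, indent=2))
```

The sketch only builds and prints the payload. Sending it requires an authenticated HTTP call to your Grafana server, which is left out here.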
Dashboard installation
The dashboard file senseops_v7_0_dashboard.json was created using Butler SOS 7.0 and Grafana 8. It thus uses the new chart, data transformation and alerting features that were introduced in Grafana 8.
Overview metrics
The dashboard has a top section that’s always expanded.
A set of (by default) collapsed sections contain different kinds of metrics and log events.
Top level metrics
Low memory alerts can be set (using Grafana’s alert feature). Such alerts can be sent (using features built into Grafana) as notifications to Slack, Teams, PagerDuty, as email etc.
To keep the dashboard nice and clean it’s usually a good idea to put alert charts in their own section at the bottom, or in a separate dashboard dedicated to alerts.
Apps in memory
From a sysadmin perspective it’s often interesting to know what apps are loaded into memory on each Sense server.
For example, when a server is quickly losing RAM it’s extremely useful to be able to zoom in to the very minute when the RAM drop occurs, then look at what apps were present in memory. One of those apps is probably not well designed, or is at least using a lot of memory.
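The same idea can be applied outside Grafana, for example on data exported from InfluxDB. The following is a minimal sketch that finds the moment of the steepest fall in free RAM. The input format (a time-sorted list of timestamp and free-RAM pairs) and the sample values are made up for illustration.

```python
from datetime import datetime


def steepest_drop(samples):
    """Return (timestamp, drop) for the largest fall between consecutive samples.

    samples: list of (timestamp, free_ram_mb) tuples, sorted by time.
    Returns None if there are fewer than two samples or RAM never falls.
    """
    worst = None
    for (_, prev_value), (ts, value) in zip(samples, samples[1:]):
        drop = prev_value - value
        if drop > 0 and (worst is None or drop > worst[1]):
            worst = (ts, drop)
    return worst


# Made-up sample data: free RAM in MB, one sample per minute.
data = [
    (datetime(2024, 1, 1, 10, 0), 64000),
    (datetime(2024, 1, 1, 10, 1), 63500),
    (datetime(2024, 1, 1, 10, 2), 41000),
    (datetime(2024, 1, 1, 10, 3), 40800),
]
print(steepest_drop(data))
```

The returned timestamp is the point in time to zoom in on when checking which apps were in memory.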
The dashboard separates regular apps and session apps.
You can also use Grafana’s standard filtering features to narrow down on the server(s) of interest.
Apps loaded into memory
Users & sessions per server
If things really go wrong in a Qlik Sense Enterprise environment, connected users might be kicked out. It is therefore important to know at any given time how many users are connected, and to be able to detect sudden drops in user count.
Another use case could be for maintenance windows: You then want to know how many - and which - users are connected, so you can send them a message that maintenance is about to start.
Users and sessions per server
User events
If Butler SOS has been configured to handle user level events coming from Sense, these are shown here.
You get information about where the event took place and which user has
logged in (started a session)
logged out (stopped a session)
timed out (stopped a session)
opened a connection to a new app
closed a connection to an app (for example closed a browser tab)
… and more
User events
Warnings & Errors
In previous Butler SOS demo dashboards this information came from log db.
Starting with Butler SOS 7 the source of this data is instead the log events introduced in version 7.
Some of this information is also available in the standard Operations Monitor app in Qlik Sense Enterprise, but only in a retrospective way.
Having access to it in close to real time makes it possible to act on developing issues quicker.
Charts provide an overview, while tables give the actual messages as they appear in the log files.
Error and warning charts
Error and warning tables
It’s also possible to drill down into individual warnings and errors to get very detailed information about what happened:
Detailed view into errors and warnings
Butler SOS metrics
Butler SOS is very robust indeed, but it may still be of interest to track its memory use, to make sure there aren’t any memory leaks etc.
Butler SOS memory usage
2.3 - Qlik Sense monitoring using Grafana 7
Grafana 7 is a big update when it comes to visualisations. Grafana was excellent already in version 6, but with v7 things are taken to a new level.
A concept that has proven useful many times is to use an overview dashboard to monitor high-level metrics for the entire Sense cluster. A separate, parameterised dashboard then drills into the details for each server.
Sample dashboards are available in the Git repository.
Before importing these to Grafana you should create a Grafana data source called “SenseOps”, and point it to your InfluxDB database. When you then import the dashboards they should find your database straight away.
Overview metrics
This view gives high level insights into the 3 virtual proxies (/sales, /sourcing, /finance) in this particular Sense environment, as well as top-level numbers on users and sessions.
Top level metrics
Low memory alerts can be set (using Grafana’s alert feature). Such alerts can be sent (using features built into Grafana) as notifications to Slack, Teams, PagerDuty, as email etc.
To keep the dashboard nice and clean it’s usually a good idea to put alert charts in their own section (see below for an example).
Apps in memory
From a sysadmin perspective it’s often interesting to know what apps are loaded into memory on each Sense server.
Here you get the details broken down by regular apps and session apps.
You can also use Grafana’s standard filtering features to narrow down on the server(s) of interest.
Apps loaded into memory
Users & sessions per server
If things really go wrong in a Qlik Sense Enterprise environment, connected users might be kicked out. It is therefore important to know at any given time how many users are connected, and to be able to detect sudden drops in user count.
Another use case could be for maintenance windows: You then want to know how many - and which - users are connected, so you can send them a message that maintenance is about to start.
Users and sessions per server
Warnings & Errors
This information is available in the standard Operations Monitor app in Qlik Sense Enterprise, but only in a retrospective way.
Having access to it in close to real time makes it possible to act on developing issues quicker.
Charts provide an overview, while tables give the actual messages as they appear in the log files.
Error and warning charts
Error and warning tables
Butler SOS metrics
Butler SOS is very robust indeed, but it may still be of interest to track its memory use, to make sure there aren’t any memory leaks etc.
Butler SOS memory usage
Alerts
While it’s perfectly possible to include alerts in almost any Grafana chart, sometimes it’s nice to tuck the alert-enabled charts away, out of sight. They will do their job and alert when needed.
Alerts
2.4 - Qlik Sense monitoring using Grafana 6
Probably the most obvious and common use case for Butler SOS. View Qlik Sense and Windows operational metrics in great looking Grafana dashboards.
Grafana is an incredibly capable tool for showing time series data.
The dashboards shown here are thus just examples and inspiration - feel free to extend and adapt these to meet your particular needs. There are also plenty of sample Grafana dashboards out there to get inspiration from.
If you experience issues with the Grafana dashboards included in the Butler SOS release on Github, you might want to try upgrading to a later/latest Grafana version.
This is a top use case for Butler SOS.
These kinds of dashboards give you detailed insights into several important metrics for your Sense servers:
CPU load
Amount of free RAM memory
Number of sessions in total
Success rate of the Qlik engine’s cache
Number of loaded apps in the Qlik engine
A concept that has proven useful many times is to use an overview dashboard to monitor high-level metrics for the entire Sense cluster. A separate, parameterised dashboard then drills into the details for each server.
Sample dashboards are available in the Git repository.
Before importing these to Grafana you should create a Grafana data source called “SenseOps”, and point it to your InfluxDB database. When you then import the dashboards they should find your database straight away.
Overview dashboard
An overview dashboard could look something like this:
RAM usage for each server
Low memory alerts can be set (using Grafana’s alert feature). Such alerts can be sent (using features built into Grafana) as notifications to Slack, Teams, PagerDuty, as email etc.
General/high level user sessions info for both whole system and each server
You can also get very detailed sessions metrics, down to the level of individual sessions per server and virtual proxy.
One possible use case for this information is to see what users will be affected by a pending server reboot. You could even use this information to send a chat message to these users, informing them that their connection to Sense will be lost in x minutes. This feature is not available in Butler SOS out of the box, but is quite possible to implement if needed.
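As a hedged sketch of that idea: given session records exported from InfluxDB, the code below works out which users have sessions on a given server and prepares one message per user. The record keys ("host", "virtual_proxy", "user") are hypothetical names chosen for this example, and actually delivering the messages via a chat tool is left out.

```python
def users_on_host(sessions, host):
    """Return a sorted list of distinct users with at least one session on host.

    sessions: iterable of dicts with keys "host", "virtual_proxy" and "user".
    """
    return sorted({s["user"] for s in sessions if s["host"] == host})


def reboot_messages(sessions, host, minutes):
    """Build one notification text per affected user."""
    return {
        user: f"Your connection to Sense on {host} will be lost in {minutes} minutes."
        for user in users_on_host(sessions, host)
    }


# Made-up session records for two servers.
sessions = [
    {"host": "sense1", "virtual_proxy": "/sales", "user": "anna"},
    {"host": "sense1", "virtual_proxy": "/finance", "user": "anna"},
    {"host": "sense2", "virtual_proxy": "/sales", "user": "bo"},
]
print(reboot_messages(sessions, "sense1", 10))
```

A user with several sessions on the same server gets a single message, since users are de-duplicated before the messages are built.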
As of Butler SOS v5.0.0, detailed user session metrics are stored in a fairly comprehensive way in InfluxDB. The visualisation of these metrics is still kind of rough though. The charts below can serve as inspiration, but can surely be improved upon.
Detailed user sessions info per server and virtual proxy
When something breaks in a Qlik Sense environment the logs immediately fill up with warning and/or error messages. By keeping track of these it’s easy to quickly spot (and get notified of) issues when they first occur:
Detailed user sessions info per server and virtual proxy
Note how there are lots of INFO level messages generated (note the y axis scales in the diagram above!).
In a production setting it’s usually a good idea to turn off extraction of INFO level log messages into InfluxDB.
This example shows how to use Butler SOS to count user and log events received from one or more Qlik Sense servers.
Overview
Goal
The goal is to count how many user and log events are received from one or more Qlik Sense servers.
This can be useful for several reasons:
To get a general feeling if the Qlik Sense environment is healthy or not. A sudden increase in the number of warnings, errors or fatals can be an early warning sign that something is wrong. The event counters can be seen as a kind of heart beat of the Qlik Sense environment. Too fast - or too slow - can be a sign of trouble.
The counters count all log and user events sent from the Qlik Sense servers to Butler SOS. They can be used to confirm that the Qlik Sense servers are indeed sending events to Butler SOS at all. Other, more detailed event data can then be used to drill down into the details of what is happening in the Qlik Sense environment.
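The heart beat idea can be illustrated with a small sketch that compares the latest event count against the median of earlier counts. The tolerance factors (0.5 and 2.0) are arbitrary assumptions for this example, not values recommended by Butler SOS. Suitable limits depend on the environment.

```python
from statistics import median


def heartbeat_status(counts, latest, low=0.5, high=2.0):
    """Classify the latest event count against the median of earlier counts.

    counts: non-empty list of event counts from earlier intervals.
    latest: event count for the most recent interval.
    low/high: assumed tolerance factors relative to the median.
    """
    baseline = median(counts)
    if latest < baseline * low:
        return "too slow"
    if latest > baseline * high:
        return "too fast"
    return "normal"


# Made-up counts, one value per interval.
history = [100, 110, 90, 105]
print(heartbeat_status(history, 10))
print(heartbeat_status(history, 500))
```

In practice the same check is more conveniently expressed as a Grafana alert rule on the event count measurement.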
A Grafana dashboard is used to visualize the event counters:
Prerequisites
Butler SOS 11.0 or later. Downloads are available here.
Store the data collected by Butler SOS in an InfluxDB v1 or v2 database. Setup instructions here
XML appender files deployed on the Sense servers you want to monitor. The appender files tell Sense to send log events to Butler SOS via UDP messages. Setup instructions here.
A reasonably recent version of Grafana. At the time of writing, Grafana 11.2 is the latest version.
Data connections set up in Grafana to the InfluxDB database where Butler SOS stores its data.
Configure Butler SOS
Info
InfluxDB is the only supported database for this feature.
It is configured elsewhere in the YAML config file, more info here.
# Shared settings for user and log events (see below)
qlikSenseEvents:                      # Shared settings for user and log events (see below)
  influxdb:
    enable: true                      # Should summary (counter) of user/log events, and rejected events be stored in InfluxDB?
    writeFrequency: 20000             # How often (milliseconds) should event counts be written to InfluxDB?
......
# Log events are used to capture Sense warnings, errors and fatals in real time
# Shared settings for user and log events (see below)
qlikSenseEvents:                      # Shared settings for user and log events (see below)
  influxdb:
    enable: true                      # Should summary (counter) of user/log events, and rejected events be stored in InfluxDB?
    writeFrequency: 20000             # How often (milliseconds) should rejected event count be written to InfluxDB?
  eventCount:                         # Track how many log and user events are received from Sense.
                                      # Some events are valid, some are not. Of the valid events, some are rejected by Butler SOS
                                      # based on the configuration in this file.
    enable: true                      # Should event count be stored in InfluxDB?
    influxdb:
      measurementName: event_count    # Name of the InfluxDB measurement where event count is stored
      tags:                           # Tags are added to the data before it's stored in InfluxDB
        - name: qs_tag1
          value: somevalue1
        - name: qs_tag2
          value: somevalue2
Configure Grafana
Total count of user and log events received from two Qlik Sense servers
The upper left panel of the Grafana dashboard at the top of this page is defined as below.
Note the two-layered query.
It is needed to get the data from the InfluxDB database in a format that Grafana can use.
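The actual panel query is only shown as a screenshot, so the following is merely an illustration of what a two-layered (nested) InfluxQL query looks like in general, not the query used in the dashboard. The measurement name event_count comes from the config above. The field name "counter" and the tag "host" are assumptions. $timeFilter and $__interval are Grafana macros that are filled in by Grafana at query time.

```python
def two_layer_query(measurement="event_count", field="counter", group_tag="host"):
    """Compose a nested InfluxQL query as a string.

    The inner query sums the field per tag value and time bucket,
    the outer query then sums those per-series values into one total.
    field and group_tag are assumed names, adjust them to your data.
    """
    inner = (
        f'SELECT sum("{field}") AS "per_series" FROM "{measurement}" '
        f'WHERE $timeFilter GROUP BY time($__interval), "{group_tag}"'
    )
    return f'SELECT sum("per_series") FROM ({inner}) GROUP BY time($__interval)'


print(two_layer_query())
```

The printed string can be pasted into the raw query editor of a Grafana panel that uses an InfluxDB (InfluxQL) data source.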
There are also a couple of transformations applied to the data:
Count of user and log events per host
The center left panel of the Grafana dashboard at the top of this page is defined as below.
The query is very similar to the one above, but with a different grouping.
Transformations…
Count of user and log events per event type
The lower left panel of the Grafana dashboard at the top of this page is defined as below.
Transformations…
Table of received events
The table at the right of the Grafana dashboard at the top of this page is defined as below.
Transformations…
Next steps
4 - Monitor Qlik Sense engine performance in Grafana
The combination of Butler SOS and Grafana provides a powerful combo for monitoring the performance of your Qlik Sense engines.
Here are some recipes for how to set up Butler SOS to implement various specific monitoring scenarios.
4.1 - Track how long it takes to open Sense apps
This example shows how to use Butler SOS to track how long it takes to open apps on a Qlik Sense server.
Overview
Goal
The goal is to show how to use Butler SOS to track how long it takes to open apps on a Qlik Sense server.
This is useful for several reasons:
It provides a baseline for how long it usually takes to open apps on your server, and allows you to track changes over time.
It can help identify apps that are slow to open, and then investigate why.
Add Grafana based alerts to get notified when the time to open some app(s) exceeds a certain threshold.
If end users complain about slow app opening times, the concept in this tutorial can be used to verify if there actually is a problem, and if so, where and how severe it is.
A Grafana chart showing this data could look something like this:
Prerequisites
Butler SOS 11.0 or later. Downloads are available here.
Store the data collected by Butler SOS in an InfluxDB v1 or v2 database. Setup instructions here
XML appender files deployed on the Sense servers you want to monitor. The appender files tell Sense to send log events to Butler SOS via UDP messages. Setup instructions here.
A reasonably recent version of Grafana. At the time of writing, Grafana 11.2 is the latest version.
Data connections set up in Grafana to the InfluxDB database where Butler SOS stores its data.
Configure Butler SOS
Enable engine performance monitoring.
Enable tracking of rejected performance log events.
Butler SOS will track two things related to rejected events: Number of events (=counter), and the processing time for these events.
If detailed engine performance log events are enabled, Butler SOS will not create counters for these events. Instead, it will store the detailed data in InfluxDB.
This data can then be aggregated to create the same kind of counters as for rejected engine performance log events.
Note: “Rejected events” are engine performance log events that do not meet the criteria set up in Butler SOS for detailed engine performance monitoring.
Bottom line is that you can disable all detailed engine monitoring in Butler SOS, and only enable the tracking of rejected events.
This will give you a counter for the number of rejected events, and the processing time for these events, across a set of dimensions - of which engine “method” is one.
Or you can enable detailed engine performance monitoring for some apps, and only track rejected events for others.
Either way it will be possible to create Grafana charts that show the average time it takes to open apps on your Sense server(s).
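To illustrate how detailed events can be aggregated into the same kind of numbers as the rejected-event counters, here is a minimal sketch that turns a list of events into a count and a mean processing time per engine method. The dictionary keys "method" and "process_time" are hypothetical names for this example. In a real setup the aggregation would normally be done by the InfluxDB query behind a Grafana chart.

```python
from collections import defaultdict


def summarise_events(events):
    """Aggregate events into a counter and mean process time per method.

    events: iterable of dicts with keys "method" and "process_time" (seconds).
    Returns {method: {"count": n, "mean_process_time": t}}.
    """
    totals = defaultdict(lambda: [0, 0.0])  # method -> [count, summed time]
    for event in events:
        entry = totals[event["method"]]
        entry[0] += 1
        entry[1] += event["process_time"]
    return {
        method: {"count": count, "mean_process_time": total / count}
        for method, (count, total) in totals.items()
    }


# Made-up events for illustration.
events = [
    {"method": "Global::OpenApp", "process_time": 2.0},
    {"method": "Global::OpenApp", "process_time": 4.0},
]
print(summarise_events(events))
```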
Info
InfluxDB is the only supported database for this feature.
It is configured elsewhere in the YAML config file, more info here.
# Shared settings for user and log events (see below)
qlikSenseEvents:                      # Shared settings for user and log events (see below)
  influxdb:
    enable: true                      # Should summary (counter) of user/log events, and rejected events be stored in InfluxDB?
    writeFrequency: 20000             # How often (milliseconds) should event counts be written to InfluxDB?
......
  rejectedEventCount:                 # Rejected events are events that are received from Sense, that are correctly formatted,
                                      # but that are rejected by Butler SOS based on the configuration in this file.
                                      # An example of a rejected event is a performance log event that is filtered out by Butler SOS.
    enable: true                      # Should rejected events be counted and stored in InfluxDB?
    influxdb:
      measurementName: rejected_event_count   # Name of the InfluxDB measurement where rejected event count is stored
......
# Log events are used to capture Sense warnings, errors and fatals in real time
logEvents:
  udpServerConfig:
    serverHost: 0.0.0.0               # Host/IP where log event server will listen for events from Sense
    portLogEvents: 9996               # Port on which log event server will listen for events from Sense
  tags:
    - name: qs_tag1
      value: somevalue1
    - name: qs_tag2
      value: somevalue2
......
  enginePerformanceMonitor:           # Detailed app performance data extraction from log events
    enable: true                      # Should app performance data be extracted from log events?
    appNameLookup:                    # Should app names be looked up based on app IDs?
      enable: true
    trackRejectedEvents:
      enable: true                    # Should events that are rejected by the app performance monitor be tracked?
      tags:                           # Tags are added to the data before it's stored in InfluxDB
        - name: qs_tag1
          value: somevalue1
        - name: qs_tag2
          value: somevalue2
Configure Grafana
Create a Grafana chart (in an existing dashboard or a new one) that shows the average time it takes to open apps on your Sense server(s).
You do this by only showing data in the chart for the engine method Global::OpenApp.
The definition of such a chart in Grafana could look something like this (slightly different set of apps selected, compared to screenshot above):
Note how the tag method is set to Global::OpenApp. This is the engine method that is called when an app is opened in Qlik Sense, and it covers the entire time it takes to open an app.
For apps currently not in memory, this will include the time it takes to load the app from disk (which can be significant for large apps).
For apps already in memory, the time for this method will be much shorter.
Also note how the app_name tag is used to filter out only the apps you are interested in, by matching it to the app_name_unmonitored variable.
See below for details on how to set up this variable.
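Since the chart definition is only available as a screenshot, here is a hedged sketch of an InfluxQL query with the properties described above: it filters on the method tag, matches app_name against the app_name_unmonitored Grafana variable, and averages per app. The measurement name rejected_event_count is taken from the config above, while the field name "process_time" is an assumption. $timeFilter and $__interval are Grafana macros.

```python
def open_app_query(measurement="rejected_event_count", field="process_time"):
    """Compose an InfluxQL query for the mean time of Global::OpenApp per app.

    field is an assumed field name, adjust it to what is stored in your
    InfluxDB database.
    """
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        "WHERE \"method\" = 'Global::OpenApp' "
        'AND "app_name" =~ /^$app_name_unmonitored$/ '
        "AND $timeFilter "
        'GROUP BY time($__interval), "app_name"'
    )


print(open_app_query())
```

The regular expression match lets the variable hold one or several app names, so the same chart works for any selection made in the dashboard's variable filter.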
Grafana variables
By using Grafana variables, you can easily change which apps you want to show data for in the chart.
Here is how you can set up a variable in Grafana that allows you to select which apps to show data for (see above too, for how to use this variable in a chart):
Next steps
Using the concept in this tutorial, but with a slightly different chart definition in Grafana, it is possible to visualise the mean time for each method that Butler SOS tracks.
Combine with an app-selection filter variable in Grafana, and you can easily switch between different apps to see how long the various engine operations take in each of them:
4.2 - Find charts that are slow to update
This example shows how to use Butler SOS to monitor which parts of a Qlik Sense app are slow to update.
Overview
Goal
The goal is to monitor how long each app object in a Sense app takes to open, and also identify which objects are slow to update.
This is useful for several reasons:
Identify misbehaving charts, tables etc in important apps. This can be especially important for apps that are used by many users, or that are in some way business critical.
To get a baseline for how long it takes for each app object (chart, table, …) to open and update. This can then be used to set thresholds for alerts, or to identify when performance degrades over time.
An end user may experience slow performance when interacting with an app. By monitoring the time it takes for each object to both open and update when selections are done, you can identify which objects are slow to update.
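One possible way to turn such a baseline into alert thresholds is sketched below: for each app object, the threshold is the mean of the observed times plus k standard deviations. This is just one common heuristic, chosen here for illustration. The input format and the default k=3.0 are assumptions.

```python
from statistics import mean, stdev


def alert_thresholds(timings, k=3.0):
    """Suggest an alert threshold per app object as mean + k * standard deviation.

    timings: dict mapping object id -> list of observed times in seconds.
    Objects with fewer than two observations are skipped, since no spread
    can be estimated for them.
    """
    return {
        object_id: mean(values) + k * stdev(values)
        for object_id, values in timings.items()
        if len(values) >= 2
    }


# Made-up timings (seconds) for two hypothetical object ids.
observed = {"chart-1": [0.8, 1.1, 0.9, 1.0], "table-1": [2.5]}
print(alert_thresholds(observed))
```

The resulting values can serve as starting points for thresholds in Grafana alert rules, to be tuned once real data has been collected for a while.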
A Grafana chart showing this data for the app “Training - Field indexing” could look something like this:
Prerequisites
Butler SOS 11.0 or later. Downloads are available here.
Store the data collected by Butler SOS in an InfluxDB v1 or v2 database. Setup instructions here
XML appender files deployed on the Sense servers you want to monitor. The appender files tell Sense to send log events to Butler SOS via UDP messages. Setup instructions here.
A reasonably recent version of Grafana. At the time of writing, Grafana 11.2 is the latest version.
Data connections set up in Grafana to the InfluxDB database where Butler SOS stores its data.
A way to get the ID of each app object in the Sense apps you want to monitor. This can be done in several ways.
The app being monitored
The “Training - Field indexing” app is used as an example in this tutorial.
It’s a basic app that is used to explain to Sense developers how in-app data indexing works in Qlik Sense.
The app consists of a single table with two fields, with 10 million rows of random data:
The “LongStrings” field contains random strings of 32 characters each.
The idea of the app is to stress the Sense engine by having it index random data - something that can be done, but also something that will be slower and slower as the number of rows or the length of the “LongStrings” field increases.
There are two things that may require a lot of time to update in this app:
Opening the app when it is not already in memory. This will trigger an indexing of the data, which takes time given the randomness of the data and the number of rows.
Making a selection in the app once it is open. This will trigger a re-indexing of the selected data, which can also take some time (depending on what/how much data is selected).
A word of caution
Enabling detailed performance monitoring can generate a lot of data, especially if it is enabled for many apps.
This can lead to a large amount of data being stored in InfluxDB, which in turn can lead to performance issues in InfluxDB and Grafana.
It is therefore recommended to only enable detailed performance monitoring for a limited number of apps, and possibly only for a limited time period.
For example, it can be useful to start off by monitoring all app objects in an app, and then limit the monitoring to only a few objects that are of special interest.
It’s also possible to limit the monitoring to only certain app object types, such as charts and tables, and not other object types.
Configure Butler SOS
Info
InfluxDB is the only supported database for this feature.
It is configured elsewhere in the YAML config file, more info here.
# Shared settings for user and log events (see below)
qlikSenseEvents:                      # Shared settings for user and log events (see below)
  influxdb:
    enable: true                      # Should summary (counter) of user/log events, and rejected events be stored in InfluxDB?
    writeFrequency: 20000             # How often (milliseconds) should event counts be written to InfluxDB?
......
# Log events are used to capture Sense warnings, errors and fatals in real time
logEvents:
  udpServerConfig:
    serverHost: 0.0.0.0               # Host/IP where log event server will listen for events from Sense
    portLogEvents: 9996               # Port on which log event server will listen for events from Sense
  tags:
    - name: env
      value: DEV
    - name: foo
      value: bar
......
  enginePerformanceMonitor:           # Detailed app performance data extraction from log events
    enable: true                      # Should app performance data be extracted from log events?
    appNameLookup:                    # Should app names be looked up based on app IDs?
      enable: true
......
    monitorFilter:                    # What objects should be monitored? Entire apps or just specific object(s) within some specific app(s)?
      # Two kinds of monitoring can be done:
      # 1) Monitor all apps, except those listed for exclusion. This is defined in the allApps section.
      # 2) Monitor only specific apps. This is defined in the appSpecific section.
      # An event will be accepted if it matches any of the rules in the allApps section OR any of the rules in the appSpecific section.
      allApps:
        enable: false                 # Should all apps be monitored?
        appExclude:                   # What apps should be excluded from monitoring?
                                      # If both appId and appName are specified, both must match the event's data for it to be considered a match.
        objectType:
          allObjectTypes: false       # Should all object types be monitored?
          allObjectTypesExclude:      # If allObjectTypes is set to true, the object types in this array are excluded from monitoring.
                                      # someObjectTypesInclude (below) is ignored in that case.
          someObjectTypesInclude:     # What object types should be included in monitoring?
                                      # Only applicable if allObjectTypes is set to false.
        method:
          allMethods: false           # Should all methods be monitored?
          allMethodsExclude:          # If allMethods is set to true, the methods in this array are excluded from monitoring.
                                      # someMethodsInclude (below) is ignored in that case.
          someMethodsInclude:         # What methods should be included in monitoring?
                                      # Only applicable if allMethods is set to false.
      appSpecific:
        enable: true                  # Should app specific monitoring be done?
        app:
          - include:                  # What apps should be monitored?
                                      # If both appId and appName are specified, both must match the event's data for it to be considered a match.
              - appName: Training - Field indexing
            objectType:
              allObjectTypes: true    # Should all object types be monitored?
              allObjectTypesExclude:  # If allObjectTypes is set to true, the object types in this array are excluded from monitoring.
                                      # someObjectTypesInclude (below) is ignored in that case.
              someObjectTypesInclude: # What object types should be included in monitoring?
                                      # Only applicable if allObjectTypes is set to false.
            appObject:
              allAppObjects: true     # Should all app objects be monitored?
              allAppObjectsExclude:   # If allAppObjects is set to true, the app objects in this array are excluded from monitoring.
                                      # someAppObjectsInclude (below) is ignored in that case.
              someAppObjectsInclude:  # What app objects should be included in monitoring?
                                      # Only applicable if allAppObjects is set to false.
            method:
              allMethods: true        # Should all methods be monitored?
              allMethodsExclude:      # If allMethods is set to true, the methods in this array are excluded from monitoring.
                                      # someMethodsInclude (below) is ignored in that case.
              someMethodsInclude:     # What methods should be included in monitoring?
                                      # Only applicable if allMethods is set to false.
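The matching rules described in the config comments can be sketched in a few lines of code. This is a simplified illustration of the app-level logic only (allApps OR appSpecific, and "both appId and appName must match" when both are given). It is not Butler SOS's actual implementation, and it ignores the object type, app object and method filters.

```python
def app_matches(rule, event):
    """A rule matches when every field it specifies (appId, appName) equals the event's."""
    keys = [k for k in ("appId", "appName") if k in rule]
    return bool(keys) and all(rule[k] == event.get(k) for k in keys)


def event_accepted(event, all_apps, app_specific):
    """Accept the event if the allApps rules OR the appSpecific rules match."""
    if all_apps.get("enable"):
        excluded = any(app_matches(r, event) for r in all_apps.get("appExclude") or [])
        if not excluded:
            return True
    if app_specific.get("enable"):
        for entry in app_specific.get("app") or []:
            if any(app_matches(r, event) for r in entry.get("include") or []):
                return True
    return False


# Mirrors the app-level parts of the config above, written as Python dicts.
all_apps = {"enable": False, "appExclude": None}
app_specific = {
    "enable": True,
    "app": [{"include": [{"appName": "Training - Field indexing"}]}],
}
event = {"appId": "some-app-id", "appName": "Training - Field indexing"}
print(event_accepted(event, all_apps, app_specific))
```

With the config shown above, only events from the “Training - Field indexing” app pass this check, since allApps is disabled and that app is the only one listed under appSpecific.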
Configure Grafana
With the above Butler SOS configuration in place (remember to restart Butler SOS after making changes to its configuration file), you can now set up Grafana to visualize the data.
Here is a chart that shows the average time it takes for each app object in the “Training - Field indexing” app to open and update (the chart shown at the top of this page):
Grafana variables
The chart repeats itself over the “app_name_monitored” Grafana variable, which means one chart is shown for each selected app in that variable filter (at the top of the Grafana dashboard).
The variable is set up like this:
Next steps
Given the detailed information captured about all app objects in the “Training - Field indexing” app, it is possible to slice and dice the data in many ways.
For example, it could be interesting to see what kinds of app objects use most time, and which ones are faster.
Something like this: