This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Concepts

Deep dive into the various components that make up Butler SOS.

1 - Sessions & Connections

What are user sessions and why are they important?

Who are using the Sense environment right now?

This is a very basic question a sysadmin will ask over and over again.
This question is answered in great detail by the excellent Operations Monitor app that comes with Qlik Sense out of the box.
But that app gives a retrospective view of the data - it does not provide real-time insights.

Butler SOS focus on the opposite: Give as close to real-time insights as possible into what’s happening in the Sense environment.

User behaviour is then an important metric to track, more on this below.

Proxy sessions

A session, or more precicely a proxy session starts when a user logs into Sense and ends when the user logs out or the session timeout is reached.
There may be some additional corner-case variants, but the above is the basic, high-level definition of a proxy session.

Tip

Some useful lingo: The events that can happen for sessions are that they can start and stop.

They can also timeout if the user is inactive long enough (the exact time depends on settings in the QMC).

As long as a user uses the same web browser and doesn’t use incognito mode or similar, the user will reuse the same session for all access to Sense.
On the other hand: If a user uses different browsers, incognito mode etc, multiple sessions will be registered for the user in question. There is a limit to how many sessions a user can have at any given time.

Mashups and similar scenarios where Sense objects are embedded into web apps may result in multiple sessions being created. Whether or not this happens largely depends on how the mashup was created.

A good overview of what constitutes a session is found here.

Proxy session vs engine sessions

The proxy service is responsible for handling users connecting to Qlik Sense.

When a user connects a proxy session is created - assuming the user has permissions to access Sense.

When interacting with individual charts in a Sense app, engine sessions are used to interact with the engine service.

A user typically thus have one (or a few) proxy session and then a larger number of transient engine sessions.

More info here.

Connections

A user may open more than one connection within a session.

A connection is opened when the user opens an app in a browser tab, or when a user opens a mashup which in turn triggers a connection to a Sense app to be set up.
Closing the browser tab will close the connection.

Tip

Some useful lingo: The events that can happen for connections are that they can be opened and closed.

User events

Butler SOS offers a way to monitor individual session and connection events, as they happen.

This is done by Butler SOS hooking into the logging framework of Qlik Sense, which will notify Butler SOS when users start/stop sessions or connections are being open/closed.

A blacklist in the main config file provides a way to exlcude some users (e.g. system accounts) from the user event monitoring.

Configuration of user events monitoring is done in the main config file’s Butler-SOS.userEvents section.
A step-by-step instruction for setting up user event monitoring is available in the Getting started section. More info about the config file is available here.

On an aggregated level this information is useful to understand how often users log in/out, when peak ours are etc. On a detailed level this information is extremely useful when trying to understand which users had active sessions when some error occured.
Think investigations such as “who caused that 250 GB drop in RAM we were just alerted about?”.

User events in Grafana dashboard

2 - Apps

“Apps”, “Applications” or “Documents” - in Qlik lingo they all refer to the same thing.
It’s where data and logic is kept in the Qlik Sense system.

Tip

The use cases page contains related information that may be of interest.

App metrics

When it comes to apps we basically want to track which apps are loaded into the monitored Sense server(s).

In-memory vs loaded vs active apps

Given the caching nature of Qlik Sense, some apps will be actively used by users and some will just be sitting there without any current connections.

Sense group apps into three categories, which are all available in Butler SOS:

  • Active apps. An app is active when a user is currently performing some action on it.
  • Loaded apps. An app that is currently loaded into memory and that have open sessions or connections.
  • In-memory apps. Loaded into memory, but without any open sessions or connections.

Butler SOS keeps track of both app IDs and names of the apps in each of the three categories.

More info on Qlik help pages.

App names are tricky

Qlik Sense provides app health info based on app IDs.
I.e. we get a list of what app IDs are active, loaded and in-memory.

That’s all good, except those IDs don’t tell us humans a lot.
We need app names.

We can get those app names by simply asking Sense for a list of app app IDs and names, but there are a couple of challenges here:

  1. This query to Sense may not be complex or expensive if done once, but if we do it several times per minute it will create an unwanted load on the Sense server(s).
  2. There is the risk of an app being deleted between we get health data for it and the following appID-name query. Most apps are reasonably long-lived, but the risk is still there.

Right or wrong, Butler SOS takes this approach to the above:

  1. App name queries are done on a separate schedule. In the main config file you can configure both if app names should even be looked up, and how often that lookup should happen. This puts you in control of how up-to-date app names you need in your Grafana dashboards.
  2. The risk of a temporary mismatch between app IDs and app names in Butler SOS remains, and will even get worse if you query app names infrequently. Still, this is typically a very small problem (if even a problem at all).

3 - Errors & Warnings

Warnings lead to errors.
Errors lead to unhappy users.
Unhappy users call you.

Wouldn’t you rather get the warnings yourself, before anyone else?

Sense log events

Sense creates a large number of log files for the various parts of the larger Qlik Sense Enterprise environment.
Those logs track more or less everything - from extensions being installed, users logging in, to incorrect structure of Active Directory user directories.

Butler SOS monitors select parts of these logs and provides a way to get an aggregated, close to real-time view into warnings and errors.

The log events page has more info on this.

Grafana dashboards

Butler SOS stores the log events in InfluxDB together with all other metrics.
It is therefore possible to create Grafana dashboards that combine both operational metrics (apps, sessions, server RAM/CPU etc) with warning and error related charts and tables.

Showing warnings and errors visually is often an effective way to quickly identify developing or recurring issues.

Example 1: Circular Active Directory references

Here is an example where bursts of warnings tell us visually that something is not quite right.
50-60 warnings arriving in bursts every few minutes - something is not right.
There is also a single error during this time period - what’s going on there?

We can then look at the actual warnings (=log events) to see what’s going on.

In this case it turns out to be circular references in the Active Directory data used in Qlik Sense. Not a problem with Sense per se, but still something worth looking into:

Warnings and errors from Qlik Sense in Grafana dashboard

Warnings from Qlik Sense in Grafana table

Looking at that single error then, it turns out it’s caused by a task already running when it was scheduled to start:

Errors from Qlik Sense in Grafana table

Example 2: Session bursts causing proxy overload

This example is would have been hard or impossible to investigate without Butler SOS.

The problem was that the Sense environment from time to time became very slow or even unresponsive. A restart solved the problem.

Grafana was set up to send alerts when session count increase very rapidly. Thanks to this actions could be taken within minutes from the incident occuring.

Looking in the Grafana dashboard could then show something like below.
New sessions started every few seconds, all coming from a single client. The effect for this Sense user was that Sense was unresponsive.

The log events section of the Grafana dashboard showed that this user had been given 5 proxy sessions, after which further access to Sense internal resources (engine etc) was denied.

Sessions increasing very quickly

In this case the problem was caused by a mashup using iframes that when recovering from session timeouts caused race conditions, trying to re-establish a session to Sense over and over again.

Alerts

Both New Relic and Grafana includes a set of built-in alert features that can be used to set up alerts as well as forward them to destinations such as Slack, MS Teams, email, Discord, Kafka, Webhooks, PagerDuty and others.

More info about Grafana alerts here.
New Relic alerts described here.