Documentation

Tip

Starting with version 9.3, the GitHub releases page is the place where release notes are found.

Release notes for older versions are found below.

What’s new in version 9.2

Features

  • Add support for storing Sense engine warning/error log messages in InfluxDB. #435
  • Specify zero or more New Relic credentials via command line options. #429
  • Support for sending metrics and events to multiple New Relic accounts. #417
  • Write info on startup about execution type. #430

Bug Fixes

  • Add missing XML log appender file for QS engine service. #433
  • Log events now correctly sent to New Relic, including engine log events. #432

Miscellaneous

  • Remove unnecessary handling of engine performance log messages. #434
  • Updated dependencies

What’s new in version 9.1

Features

  • Better logging when warnings and errors occur. #404

Bug Fixes

  • Send correct tags to Prometheus endpoint. #422

Miscellaneous

  • Apply consistent formatting to all source and doc files. #419
  • Upgrade Prometheus metrics lib to latest version.
  • Update dependencies to latest versions.

What’s new in version 9.0

Focus of this release is twofold:

  1. Improve the usability of the stand-alone Butler SOS binaries.
  2. Send metrics and log events to the New Relic monitoring service.

⚠ BREAKING CHANGES

  • Add external memory to uptime data in InfluxDB.
    This is a minor change, but it does also require a small change in the config file.

Features

  • Add command line options to Butler SOS. #387
  • Add external memory to uptime data in InfluxDB.
  • Add New Relic as destination for SenseOps metrics.
  • Add optional scrambling of user id for user events sent to New Relic. #398
  • Send engine, proxy and session metrics to New Relic.

Bug Fixes

  • Compress stand-alone binaries.
  • Include New Relic status in telemetry data (23c292c)

Miscellaneous

  • Make proxy-related log entries easier to understand. #392
  • Make user event log messages easier to understand. #396
  • More relevant log prefixes for proxy session logging.
  • Update dependencies to latest versions.

What’s new in version 8.1

  • Scanning for security risks and vulnerabilities is now done as part of the release process. This is in addition to the daily scans that are also done (and have been done for a long time).
  • Clean up the Docker images (available on Docker Hub) in order to keep them as lean as possible.

What’s new in version 8.0

A bumped major version number would normally indicate that this release contains breaking changes.

Well… When working on the release process for Butler SOS we happened to bump the version number one step too far. Oops.
There are no breaking changes in this version.

There’s a major new feature though:
Pre-compiled, standalone binaries for Windows, Linux and macOS!

This has been planned for ages and we finally got around to implementing it.
It may look like a minor change but it does make it easier to get started with Butler SOS:

  • You no longer have to first install Node.js, then install all the Butler SOS files.
  • All you need now is a single binary and a valid YAML config file (which hasn’t changed since the previous 7.x versions).
  • If you want to keep using a separate Node.js engine for running Butler SOS that’s perfectly fine too.

What’s new in version 7.0

This is a major release including features that have been on the roadmap for years, but never really graduated from the concept phase.
Until now, that is.

The big thing in v7.0 is the addition of a generic way to handle QSEoW warning and error events.
These used to be written to both log files and the Sense log database, but with log db gone those events only exist in the log files.

Butler SOS can now handle these events and store them in InfluxDB or re-publish them as MQTT messages.
Grafana dashboards attached to InfluxDB give you a close to real-time view into Sense errors and warnings.

Or simply:
Get the most important messages from the Sense log files sent in real-time to Butler SOS - and from there out into the world as needed.

⚠️ NOTE: Because of the significant nature of the changes below, this version includes some breaking changes.

  • ⚠️ Added support for dealing with individual QSEoW log events.
    Initially warning and error events from the proxy, scheduler and repository services are sent to Butler SOS.
  • MQTT topics matching the QSEoW subsystem where log events originated.
    When sending log events to MQTT, there’s an option to send each log event to a MQTT subtopic corresponding to the QSEoW subsystem the event originated from. This is useful for 3rd party systems that want to detect (and probably take action based on) very specific QSEoW log events.
    For example, a log event from the Service.Scheduler.Scheduler.Master.Task.TaskSession subsystem in QSEoW would be posted to the topic <some root topic>/service/scheduler/scheduler/master/task/tasksession.
  • ⚠️ Improved handling of user activity events.
    These can now be sent to zero or more different MQTT topics, each covering a specific kind of user activity (start/stop session, open/close connection etc).
  • ⚠️ Take the first step towards removing support for QSEoW log db in Butler SOS.
    There is no date set yet when log db support will be removed, but at some point that’s likely to happen.
    • In this release the previously existing InfluxDB measurement log_event has been renamed to log_event_logdb.
    • The new log events introduced in v7.0 will be stored in the log_event measurement going forward.
  • The later library is no longer maintained and has been replaced by @breejs/later.
  • All libraries used by Butler SOS updated to latest versions.
  • Documentation site updated with respect to v7.0.

What’s new in version 6.0

First, while the switch to 6.0 indicates there are breaking changes, that’s not entirely true.

There are however significant changes to many parts of the code base.

The code is now better, more modern, scalable and in general better structured.
Plenty of testing has been done, but in order to highlight that lots of changes were done, we decided to move to 6.0 rather than 5.7.

  • Added Prometheus support.
    Most Butler SOS metrics are now exposed on a Prometheus-friendly endpoint. They can thus easily be scraped by and ingested into Prometheus.
    Once in Prometheus the Qlik Sense metrics can be visualised using Grafana (just as before when using InfluxDB), but also used in very capable alerting scenarios. Prometheus has great integrations with many incident management tools, for example.
  • The Prometheus endpoint also includes general Node.js metrics that can be used to monitor Butler SOS itself.
  • Switched process for doing releases of Butler SOS.
    Things are now more automated which should result in more predictable and consistent version numbers. The changelog file will be auto-generated going forward (it will start from version 5.6.0, older versions are still available in changelog_old.md).

What’s new in version 5.6

  • Added user event monitoring. Up until now this has been a feature of Butler, but as this feature is very much within the domain covered by Butler SOS, it’s moving here instead.
    The events monitored are session start/stop (typically users logging in/out/timeout) and connection open/close (typically an app being opened/closed in a browser tab).
  • Added a blacklist for user sessions. If a user is added to the blacklist, the detailed session data for that user will not be saved to InfluxDB.
    The user will still be included in the session summary metrics and count towards the total number of sessions at any given time.
    Note: The blacklist only applies to storing detailed session data in InfluxDB, MQTT (if enabled) is not affected by the blacklist.
  • Anonymous telemetry added. Same set of data as is included in other Butler tools, i.e. only information about what the execution environment of Butler SOS looks like and which features are enabled. The rationale for adding telemetry is to give Butler SOS developers a better understanding of what kinds of servers the software is used on. This insight will make it easier to develop future Butler SOS versions.
  • Continuing the journey towards using the same formatting principles in the config files for all the various Butler tools.
    In Butler SOS’ config file there’s been a mix of “enabled”, “enable”, “enableMqtt” etc to tell whether a certain feature should be enabled or not. Confusing. We’re moving towards only using “enable” for this. This release changes this for most config file entries (see the sketch right after this list). Rest assured though, the old format will still work - but you are strongly recommended to adopt the current config file format as it includes settings for other new (as of version 5.6) features too.
  • Major refactoring of the documentation site butler-sos.ptarmiganlabs.com. This site is now more aligned with other Butler sites, for example butler.ptarmiganlabs.com.
  • Various bug fixes, performance improvements and fixed typos.
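As a rough illustration of the new convention, feature sections in the YAML config file now use a plain enable key. The feature names below are placeholders only - check the config file reference for the actual keys:

Butler-SOS:
  someFeature:              # placeholder feature name
    enable: true            # previously "enabled", "enableSomeFeature" or similar
  anotherFeature:           # placeholder feature name
    enable: false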

What’s new in version 5.5

  • Docker images for various Arm architectures are now created as part of the standard release process.

What’s new in version 5.4

This video gives an idea of what Butler SOS is capable of.



Highlights of version 5.4 include:

  • Sample dashboards are now built using the brand new, shiny and altogether awesome Grafana 7. Did we mention that Grafana 7 is awesome? Awesome.
  • Ever wondered how long Butler SOS has been running or how much memory it uses? The new uptime messages have you covered.
  • You are properly impressed with the uptime messages - good. Why not store them in InfluxDB, so you can also visualize Butler SOS’ own memory use? It’s just a couple of changes in the config file away.
  • Don’t want to use the Docker healthchecks? No reason to if you don’t use Docker. You can now turn them off in the config file.
  • Ah, you are a serious Sense user and have separate DEV and PROD environments? Good - now Butler SOS tags its own memory use so you can monitor each Butler SOS instance separately.
  • Who will monitor the monitor? Butler SOS can now send heartbeats to customizable URLs at desired intervals. Perfect if you want to monitor Butler SOS using for example healthchecks.io. Very, very cool actually.
  • Bugs, bugs and bugs. The known ones have been fixed. Keep reporting new ones!
  • Update all dependencies to latest versions, to ensure security concerns are addressed.

Releases are available on GitHub.

1 - About

Information about the Butler SOS software, community, docs and more.

Are you stuck on something while setting up Butler SOS? Got ideas for new features?
Don’t hesitate to post your thoughts in the Butler SOS forums.

1.1 - Butler SOS

An introduction to Butler SOS.

The Butler SOS project is about adding best-in-class monitoring to the client-managed Windows version of Qlik Sense Enterprise, also known as QSEoW (Qlik Sense Enterprise on Windows).

The goal is to provide a close to real-time view into what’s happening in a Qlik Sense environment.

At different times different metrics will be of interest.
For that reason Butler SOS stores all metrics from Sense in a time-series database (InfluxDB and Prometheus both supported), from which dashboards, reports etc can then be created using tools such as Grafana.
Grafana is an open source, world-class visualisation tool for time series data. It also has great alerting features and integrates with all kinds of alerting solutions and IM tools.

If you don’t fancy InfluxDB or Prometheus, Qlik Sense metrics and events can also be sent to New Relic for storage and visualisation.
They offer a free tier that will go a long way towards testing out a cloud-based visualisation solution for Butler SOS.

Metrics are a major component of operational monitoring, but it’s also important to keep on top of what errors and warnings occur in the system.
As of late 2021 the log database is no longer part of QSEoW, leaving the log files as the only place where you can find those errors and warnings.

Butler SOS addresses this by having log events sent in real-time from QSEoW to Butler SOS, which then stores them in InfluxDB, and/or sends them to New Relic, and/or re-publishes them as MQTT messages.
This basically means you will very rarely have to plow through endless log files to find information about warnings and errors that have occurred.

There is also a clear goal that Butler SOS should be very configurable.
In practice this means that features can be turned on/off as needed, improving security and lowering memory usage.

Butler SOS is written in Node.js and runs on most modern operating systems.
Stand-alone binary files are created for Windows, Linux and macOS.

You can run Butler SOS on the same server as Qlik Sense, in a Docker container on a Linux server, in Kubernetes, on Mac OS, or on a Raspberry Pi (not a good idea… but possible and proven to work).

Butler SOS is a member of a group of tools collectively referred to as the “Butler family”, more info is available here.

1.2 - The Butler family

Please meet the Butlers. They’re a nice, wild bunch!

Butler started out addressing a very specific need: starting Sense reloads from outside systems.
Over the years a few projects (for example Butler SOS, which simplifies day 2 operations ([1], [2])) have spun off from the original Butler project, and still other projects have been created from scratch to solve specific challenges around developing Sense apps and running Qlik Sense server environments.

All members of the Butler family are available on Ptarmigan Labs’ GitHub page.

Projects with production grade release status are (as of this writing):

Butler

The original Butler. Offers various utilities that make it easier to develop Sense apps, as well as simplifying day 2 operations.

butler.ptarmiganlabs.com.

Butler SOS

Real-time operational metrics for Qlik Sense.
A must-have if you are responsible for a Sense environment with more than a dozen or so users.

Butler SOS makes it possible to detect and alert on issues as they happen, rather than in retrospect much later.

butler-sos.ptarmiganlabs.com. (This site!)

Butler Sheet Icons

Automates the creation of sheet icons for Qlik Sense Enterprise on Windows (QSEoW) applications.

It’s a cross platform command line tool which given the correct Sense credentials will take screen shots of all sheets in a Sense app (or all apps on a Sense server!), then create thumbnail versions of those screenshots.
Finally those thumbnails will be set as sheet icons.

No more manual screenshot taking, resizing images, navigating hundreds of sheets in dozens of apps.
Start Butler Sheet Icons instead and go get a nice fika.

The tool can be used stand-alone or as part of an automated release process.

https://github.com/ptarmiganlabs/butler-sheet-icons

Ctrl-Q

Given the name of this tool it doesn’t sound like a member of the Butler family.
Let’s say Ctrl-Q is a sibling of the Butler bunch.

While the Butler tools are (usually) intended to solve and simplify rather specific use cases, Ctrl-Q is aimed at being the lazy Qlik developer’s best friend.

Let’s say there is some manual, tedious, time consuming and error prone activity that a Qlik Sense developer is faced with.
For example importing dozens of apps from QVF files and creating a hundred associated reload tasks.
Ctrl-Q lets you do this with a single command, using definitions in an Excel file. Instead of spending a day on this the actual execution takes a minute or so.

In other words: Ctrl-Q focuses on high-value use cases that are difficult or impossible to solve using other tools.

github.com/ptarmiganlabs/ctrl-q

Butler CW

Butler Cache Warmer. Cache warming is the process of proactively forcing Sense apps to be loaded into RAM, so they are readily available when users open them.
Using Butler CW is an easy way to make your end users’ experience of Sense a little better.

github.com/ptarmiganlabs/butler-cw

Butler App Duplicator

No matter if you are a single developer creating Sense apps, or have lots of developers doing this, having app templates is a good idea:

  • Lowered barrier of entry for new Sense developers.
  • Productivity boost when developing Sense apps.
  • Encouraging a common coding standard across all apps.

github.com/ptarmiganlabs/butler-app-duplicator

Butler Spyglass

This tool is mainly of interest if you have lots of QVDs and apps, but when that’s the case it’s of paramount importance to understand what apps use which QVDs. In other words what data lineage looks like.

Butler Spyglass also extracts full load scripts for all Sense apps, creating a historical record of all load scripts for all Sense apps.

github.com/ptarmiganlabs/butler-spyglass

Butler Notifier

This tool makes it easy to tap into the Qlik Sense notification API. From there you can get all kinds of notifications, including task reload failures and changes in session state (user login/logout etc).

github.com/ptarmiganlabs/butler-notifier

Butler Icon Uploader

Visual appearance is important when it comes to analytics, and this holds true also for Sense apps.

The Butler Icon Uploader makes it easy to upload icon libraries (for example Font Awesome) to Qlik Sense Enterprise. With such icons available it is then easy for app developers to use professional quality sheet and app icons in their Sense apps.

github.com/ptarmiganlabs/butler-icon-upload

1.3 - Use cases

How can Butler SOS be used?

A complete list of metrics is available in the reference section.

Monitor Qlik Sense metrics

This is the main use case for Butler SOS, with a large number of monitored metrics.
Butler SOS can be configured to store these metrics in InfluxDB/Prometheus and/or send them as MQTT messages.
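Conceptually, each destination is toggled independently in the YAML config file. A rough sketch with illustrative key names only (the exact keys in the shipped config file may differ):

Butler-SOS:
  influxdbConfig:           # illustrative key name
    enable: true            # store metrics in InfluxDB
  prometheus:               # illustrative key name
    enable: false           # expose metrics on a Prometheus endpoint
  mqttConfig:               # illustrative key name
    enable: true            # publish metrics as MQTT messages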

Session count

A session (or more specifically, a “proxy session”) is created when a user logs into Sense.
The session is typically reused when a user opens additional Sense apps from the same browser.
Knowing how many users are logged in at any given time gives a Sense admin an understanding of when peak hours are, when service windows should be planned, whether the server(s) are too small or too big, etc.

The session metrics are arguably among the most important ones provided by Butler SOS.

Applications

Butler SOS tracks how many and which applications (IDs and app names) are loaded into the monitored Qlik engine(s). In-memory and active applications are also tracked in the same way.

These metrics provide useful insights into the degree to which loaded apps are actually used and to what degree caching is in effect.

A couple of bonus metrics are also included: Number of calls made to the Qlik associative engine and number of selections done by users in the engine. Not really useful as such, but they do serve as a good relative measurement of how active users are between days/weeks/months.

Cache status

The caching of applications in RAM is one of Qlik’s classic selling points. But is it really working?

Butler SOS provides a set of metrics (bytes added to cache, lookups, cache hits/misses etc) that give hard numbers to the question of whether the cache is working or not.

While this is probably not as interesting from an operational perspective as user session counts, RAM usage and errors/warnings from the Sense logs, the cache metrics should definitely be monitored over medium timespans.

For example, if the cache hit ratio goes down over weeks or months, that could mean a poorer user experience over that same time period.

Monitor server metrics

Some basic server metrics (free RAM and CPU load) are monitored by Butler SOS. You may have other, dedicated server monitoring tools too - Butler SOS does not replace these.
It’s however often convenient to have both server and Sense metrics side by side, thus Butler SOS includes some of the more important server metrics in addition to the Sense ones.

These metrics are only stored in InfluxDB/Prometheus, i.e. not sent as MQTT messages.

Available memory/RAM

As Qlik Sense is an in-memory analytics tool you really want to ensure that there is always available memory for users’ apps.
If your Sense server runs out of memory it’s basically game over. Now, Sense usually does a very good job reclaiming unused memory, but it’s still critically important to monitor memory usage.

One error scenario that’s hard or impossible to catch without Butler SOS style monitoring is that of apps with cartesian products in them.
They can easily consume tens or even hundreds of GBytes of RAM within seconds, bringing a Sense server to a halt.
Butler SOS has more than once proven its value when debugging this specific issue.

CPU load

If a server is heavily loaded it will eventually be seen as slow(er) by end users, with associated badwill accumulating.

Log events: Qlik Sense errors & warnings

The Sense logs are always available on the Sense servers, the problem is that they are hard to reach there - at least in real time.
Retrospective analysis is also cumbersome; you basically have to manually dig up the specific log files of interest and then search them for the information of interest.

Butler SOS simplifies this greatly by having select log events (warnings, errors and fatals by default) sent from the Sense servers to Butler SOS.
Once such a log event message arrives, Butler SOS will store it in its database (for example InfluxDB or New Relic), from where the log event can be visualised using Grafana or within New Relic.

Log events are also re-published as MQTT messages. This makes it possible for 3rd party systems to trigger actions when certain log events occur in Qlik Sense.
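If Mosquitto is used as MQTT broker, a quick way to peek at these messages is to subscribe to the log event topic tree from any machine with the Mosquitto clients installed. The broker host and topic root below are placeholders - use the values from your Butler SOS config file:

mosquitto_sub -h mqtt-broker.company.com -t 'qliksense/logevent/#' -v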

Log events from several Qlik Sense services can selectively be forwarded to Butler SOS:

  • Engine
  • Proxy
  • Repository
  • Scheduler

A sample use case of log events:

  1. Create a Grafana or New Relic real-time chart showing number of warnings/errors per 5-minute window.
  2. Set up an alert in those tools to notify you when the number of events during the past 5 minutes goes above some limit.

This is trivial to set up, but gives you a very capable, close to real-time error/warning monitoring solution.
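As an illustration, the InfluxDB 1.x query behind such a panel could look roughly like the one below. The log_event measurement is the one used by Butler SOS from v7.0 onwards, while the counted field and the level tag are assumptions - adjust them to the actual schema in your InfluxDB:

SELECT count("message") FROM "log_event" WHERE ("level" = 'WARN' OR "level" = 'ERROR') AND $timeFilter GROUP BY time(5m) fill(0)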

The log database

The log database in Qlik Sense Enterprise on Windows is deprecated as of late 2021.
Butler SOS still maintains support for older QSEoW versions. At some point in the future this support is however likely to be removed.

This feature relies on Butler SOS querying log db at certain intervals (typically every few minutes).
A list of recent log events is returned to Butler SOS. The events are de-duplicated before being stored in InfluxDB.

This essentially achieves the same thing as the more modern log event feature of Butler SOS - but with longer delays. The log event model is almost instantaneous, whereas log db polling will by its very nature result in non real-time data.

User activity events

Detailed events are available for all users:

  • Session start
  • Session stop
  • Connection open
  • Connection close

These events are both stored in InfluxDB and re-published as MQTT messages.

Usually it’s enough to track how many users are currently using the Qlik Sense system.
Exactly which users they are is usually of less interest.

At times you may want more detailed insights though. Then these events are incredibly useful.

For example:

  • Sometimes network issues cause some users’ browsers to start many new sessions instead of re-using existing sessions.
    This can result in the proxy service becoming overloaded, making access to Sense slow for all users.
    Butler SOS makes it easy to detect this. Just create a Grafana chart that shows the number of session start events over time, and attach an alert that goes off if the number of new sessions per minute is too high.
  • Let’s say a specific user has trouble using Sense apps as intended, or a user is suspected of causing excessive RAM/CPU usage.
    Subscribe to the MQTT user activity messages coming from Butler SOS, filtering out just the user(s) of interest (a minimal sketch of such a subscription is shown right after this list).
    Get a notification the very moment the user connects to Sense, after which you can follow in real time what happens in the system. Does CPU go up? Does free RAM go down? Which apps are loaded?
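Here is a minimal Node.js sketch of that second scenario, using the mqtt library. The broker URL, topic root and payload handling are assumptions - adjust them to your Butler SOS config and the actual user event messages you see:

const mqtt = require('mqtt');

// Connect to the same MQTT broker that Butler SOS publishes to (placeholder URL)
const client = mqtt.connect('mqtt://mqtt-broker.company.com');

const watchedUser = 'ANNA_ANALYST'; // placeholder user to follow

client.on('connect', () => {
    // Topic root is whatever is set in the Butler SOS config file (placeholder below)
    client.subscribe('qliksense/userevent/#');
});

client.on('message', (topic, message) => {
    const payload = message.toString();

    // The user event messages contain the user id as text - do a simple match on it
    if (payload.includes(watchedUser)) {
        console.log(`${new Date().toISOString()} | ${topic} | ${payload}`);
    }
});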

You can also use the MQTT message to create your personal disco light, controlled by your Sense users connecting and dropping off the Sense server…

Green = User opening a connection to Sense
Red = User closing connection to Sense

From YouTube.

1.4 - Versions

Features are added, bugs fixed. How are Butler SOS versions set?

In the spirit of not copying information to several places, the version history is kept as annotations of each release on the GitHub release page.

Version numbers include up to 3 levels, for example version 4.6.2 (which is a fictitious version):

4 is the major version. It is increased when Butler SOS has added major new features, or has changed in other major ways.
Following this principle, breaking changes should always result in a bumped major version.

6 is the minor version. This indicates a smaller update, when one or a few minor features have been added.

2 is the patch level. When individual bugs are fixed, these are released with an increased patch level.

Note 1: Major and minor updates usually include bug fixes too.
Note 2: If a version of 5.2 is mentioned, this implicitly means 5.2.0.

1.5 - Contribution guidelines

How to contribute to Butler SOS.

Butler SOS is an open source project, using the MIT license.

This means that all source code, documentation etc is available as-is, at no cost.

It however also means that anyone interested can - and is encouraged to - contribute to the project!

Butler SOS is developed in Node.js, with support from various NPM modules.

We use Hugo to format and generate this documentation site, and the Docsy theme for styling and site structure.
Hugo is an open-source static site generator that provides us with templates, content organisation in a standard directory structure, and a website generation engine. You write the pages in Markdown (or HTML if you want), and Hugo wraps them up into a website.

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.

Creating an issue

If you’ve found a problem - or have a feature suggestion - with Butler SOS itself or the documentation, but you’re not sure how to fix it yourself, please create an issue in the Butler SOS repo. You can also create an issue about a specific doc page by clicking the Create Issue button in the top right hand corner of the page.

Development concepts

Some of the main tools/processes used during development of Butler SOS are:

Visual Studio Code (=VSC)

Any IDE supporting Node.js can be used, but VSC works really well. It’s open source and has a huge ecosystem of extensions.

GitHub

Used to store source code, track issues, change requests etc.

GitHub Actions is used to build Docker images.

Release Please

Release Please is used to create release notes.
It also enforces consistent versioning when new (sometimes breaking) features are added, bugs fixed etc.

This means that as of Butler SOS 6.0 you can have more trust in the semantic versioning of Butler SOS releases.

ESLint + Prettier

Used to enforce a uniform source code format that also follows best practices defined in ESLint.

ESLint shows code issues within Visual Studio Code, but standalone reports can also be created:

➜ npx eslint . --format table

/Users/goran/code/butler-sos/src/butler-sos.js

║ Type     │ Message                                                           │ Rule ID               ║
╟──────────┼───────────────────────────────────────────────────────────────────┼───────────────────────╢
║ error    │ Unexpected require().                                             │ global-require        ║

/Users/goran/code/butler-sos/src/lib/appnamesextract.js

║ Type     │ Message                                                           │ Rule ID               ║
╟──────────┼───────────────────────────────────────────────────────────────────┼───────────────────────╢
║ error    │ A constructor name should not start with a lowercase letter.     │ new-cap               ║

/Users/goran/code/butler-sos/src/lib/heartbeat.js

║ Type     │ Message                                                           │ Rule ID               ║
╟──────────┼───────────────────────────────────────────────────────────────────┼───────────────────────╢
║ error    │ Unexpected var, use let or const instead.                         │ no-var                ║
║ error    │ Unexpected var, use let or const instead.                         │ no-var                ║
║ warning  │ Unexpected unnamed function.                                      │ func-names            ║
║ error    │ Unexpected function expression.                                   │ prefer-arrow-callback ║
║ warning  │ Unexpected unnamed function.                                      │ func-names            ║
║ error    │ 'response' is defined but never used.                             │ no-unused-vars        ║
║ error    │ Unexpected function expression.                                   │ prefer-arrow-callback ║
║ warning  │ Unexpected unnamed function.                                      │ func-names            ║
║ error    │ All 'var' declarations must be at the top of the function scope. │ vars-on-top           ║
║ error    │ Unexpected var, use let or const instead.                         │ no-var                ║
║ error    │ All 'var' declarations must be at the top of the function scope. │ vars-on-top           ║
║ error    │ Unexpected var, use let or const instead.                         │ no-var                ║
║ error    │ 't' is assigned a value but never used.                           │ no-unused-vars        ║
║ error    │ Unexpected function expression.                                   │ prefer-arrow-callback ║
║ warning  │ Unexpected unnamed function.                                      │ func-names            ║

╔════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ 13 Errors                                                                                            ║
╟──────────────────────────────────────────────────────────────────────────────────────────────────────╢
║ 4 Warnings                                                                                           ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════╝

To format all source code files:

➜  npm run format:prettier

> butler-sos@5.6.0 format:prettier
> npx prettier --config .prettierrc.yaml "./**/*.{ts,css,less,js}" --write

butler-sos.js 97ms
docker-healthcheck.js 13ms
globals.js 90ms
lib/appnamesextract.js 35ms
lib/healthmetrics.js 27ms
lib/heartbeat.js 8ms
lib/logdb.js 34ms
lib/post-to-influxdb.js 85ms
lib/post-to-mqtt.js 27ms
lib/prom-client.js 45ms
lib/servertags.js 5ms
lib/service_uptime.js 14ms
lib/sessionmetrics.js 29ms
lib/telemetry.js 18ms
lib/udp_handlers.js 18ms

1.6 - Telemetry

What’s telemetry and why is it important?

Sharing telemetry data from Butler SOS is optional.
You can use all Butler SOS features without sharing telemetry data.

That said, if you find Butler SOS useful you are strongly encouraged to leave the telemetry feature turned on.
Having access to this data greatly helps the Butler SOS developers when they design new features, fix bugs etc.

The Butler SOS developers care about you - sharing telemetry data is your way of showing you care about them.

Sharing is caring!

What’s telemetry

From Wikipedia:

Telemetry is the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment (telecommunication) for monitoring.

In the context of software tools (including Butler SOS) telemetry is often used to describe the process of sending information about the tool itself to some monitoring system.

Why telemetry in Butler SOS?

This is a good question.

For several years there was no telemetry at all in Butler SOS.

Development of new features was driven mainly by what features were needed at the time.
Or by the fact that Qlik released some new feature in Sense and Butler SOS was a way to test that new feature from the perspective of the Sense APIs.

That’s all good, but today Butler SOS is a rather significant tool with features spanning monitoring, alerting and more.

This multitude of features is also one of the core reasons for adding telemetry to Butler SOS:

  • Which Butler SOS features are actually used out there?
  • Which operating systems, Node.js versions and hardware platforms is Butler SOS running on?

Without this information the Butler SOS developers will keep working in the dark, not really knowing where to focus their efforts.

On the other hand - with access to telemetry data a lot of possibilities open up for the Butler SOS developers:

  • If telemetry shows that no one uses a particular feature, maybe that feature should be scheduled for deprecation?
  • The opposite of the previous: If lots of users use a specific Butler SOS feature, then that feature is a candidate for future focus and development.
  • Telemetry will show if lots of users run Butler SOS on old Node.js versions. Knowing this it’s possible to set a migration schedule for which Node.js versions are supported - avoiding hard errors when some old Node.js version is no longer supported by Butler SOS.
  • Same thing for understanding what operating systems Butler SOS runs (and should be supported) on.

Configuring Butler SOS’ telemetry

Instructions here.

The details

Where is telemetry data sent

Butler SOS uses PostHog to collect telemetry data.

PostHog is an open source telemetry platform that is used by many open source projects. The data is stored in the EU.

Deleting telemetry data

Even though no-one (not even Ptarmigan Labs who runs the telemetry database!) has any way of ever connecting the data sent by your Butler SOS instance to you (it’s all anonymized, remember?), there can be cases where telemetry data must be deleted.

The legal page has more information about this.

Field level description of telemetry data

A telemetry message from Butler SOS contains the information below.

{
  "ts": "2021-04-16T13:43:02.467Z",
  "data": {
    "service": "butler-sos",
    "serviceVersion": "5.6.0",
    "system": {
      "id": "8e315a90cac0e447360697123002f23b2775cc61f93d6bc7a9138d94af057e5e",
      "arch": "x64",
      "platform": "darwin",
      "release": "11.2.2",
      "distro": "macOS",
      "codename": "macOS Big Sur",
      "virtual": false,
      "nodeVersion": "v14.15.4"
    },
    "enabledFeatures": {
      "feature": {
        "heartbeat": true,
        "dockerHealthCheck": false,
        "uptimeMonitor": true,
        "uptimeMonitor_storeInInfluxdb": true,
        "udpServer": true,
        "logdb": true,
        "mqtt": true,
        "influxdb": true,
        "prometheus": true,
        "appNames": true,
        "userSessions": true
      }
    }
  }
}

The anonymous ID field

The id field deserves a bit more explanation.

Its purpose is to uniquely identify the Butler SOS instance - nothing else.
If Butler SOS is stopped and started again, the same ID should be generated.

Some sensitive information is used to create the ID, but as the ID is anonymized before being sent as part of the telemetry data, no sensitive information leaves your servers.

The ID field is created as follows:

  1. Combine the following information to a single string

    1. MAC address of default network interface
    2. IPv4 address of default network interface
    3. IP or FQDN of Sense server where repository service is running
    4. System unique ID as reported by the OS. Not all OSs support this though, which is why fields 1-3 above are also needed to get a unique ID.
  2. Run the created string through a one-way hashing/message digest function. Butler SOS uses Node.js’ own Crypto library to create a SHA-256 hash, using the default network interface’s MAC address as salt (a minimal code sketch of this step is shown right after this list).
    Security is increased due to the fact that the salt never leaves the server where Butler SOS is running.

    The bottom line is that it’s impossible to reverse the process and get the IP, host name etc used in step 1 above.
    Then again - this is cryptography and things change.
    But if you trust the certificates securing Sense itself, then the ID anonymization in Butler SOS should be ok too. Both are built on the same concepts of one-way cryptographic functions.

  3. The result is a string that uniquely identifies the Butler SOS instance at hand, without giving away any sensitive data about the system where Butler SOS is running.
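A minimal sketch of the hashing step in item 2, using Node.js’ built-in crypto module. The input values are placeholders - in Butler SOS they are read from the OS and the config file:

const crypto = require('crypto');

// Placeholder input values
const macAddress = '00:11:22:33:44:55';       // default network interface MAC (also used as salt)
const ipAddress = '10.11.12.13';              // default network interface IPv4 address
const senseHost = 'sense-repo.company.com';   // host running the Sense repository service
const systemUniqueId = 'os-provided-id';      // unique ID reported by the OS, if available

// 1. Combine the pieces into a single string
const combined = macAddress + ipAddress + senseHost + systemUniqueId;

// 2. One-way hash it, salted with the MAC address. The salt never leaves the server.
const anonymousId = crypto.createHash('sha256').update(macAddress).update(combined).digest('hex');

console.log(anonymousId); // 64 hex characters, similar to the id field in the telemetry example above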

See above for an example of what the id field looks like.
The id field is always shown during Butler SOS startup.

Telemetry FAQ

  1. What data is included in the telemetry messages?
    See above.
    The telemetry includes information about which Butler SOS features are enabled vs disabled. A unique, anonymized ID is included too; it’s unique to each Butler SOS instance and is used solely to distinguish between different Butler SOS instances.
    Finally some information about Butler SOS’s execution environment is included. Things like operating system, Node.js version used etc.

  2. Can my Sense environment be identified via telemetry data?
    Short answer: No.
    Longer answer: No information about your Sense environment is sent as part of telemetry. No IP addresses or server names, no IDs of Sense apps/tasks/etc, no information about what actual data passed through Butler SOS, or any other data that can be linked to your Sense environment is included in the telemetry data.

2 - Getting started

Taking your first steps.

Butler SOS is written in Node.js, which is a cross-platform programming environment. This means most kinds of computers and servers can be used to run Butler SOS, including Windows, Linux and Mac OS.

Setting up Butler SOS is pretty straightforward, but you do need a working understanding of Qlik Sense admin tasks.
For example, you need to export certificates from the QMC, as well as install Butler SOS itself.

2.1 - Overview

SenseOps monitoring - what’s that?

This page provides the general steps to get started with Butler SOS.
It also explains how Butler SOS relates to other tools and services that collectively make up the SenseOps concept.

Butler SOS

Qlik Sense + DevOps = SenseOps

Butler SenseOps Stats (“Butler SOS”) is a monitoring tool for Qlik Sense, built with DevOps workflows in mind.

It publishes operational, close to real-time Qlik Sense Enterprise metrics to InfluxDB, Prometheus, New Relic and MQTT. From there it can be visualised using tools like Grafana, New Relic or acted on by downstream systems that listen to the MQTT topics used by Butler SOS.

Butler SOS gathers operational metrics from several sources, including the Sense healthcheck API and Session API.
It also pulls log events from Sense’s Postgres logging database, and forwards these to InfluxDB and MQTT.

Do I really need a tool like this?

Let’s say you are somehow involved in (or maybe even responsible for) your company’s client-managed Qlik Sense Enterprise on Windows (QSEoW) environment.

Let’s also assume you have more than 5-10 users in your Sense environment. Maybe you even have business critical data in your Sense apps.

Given the above, the answer is almost certainly “yes”: you can simplify your workday and provide a better analytics experience to your end users by using a tool like Butler SOS.

Why a separate tool for this?

Good question.

While Qlik Sense ships with a great Operations Monitor application, it is not useful or intended for real-time operational monitoring.
The Ops Monitor app is great for retrospective analysis of what happened in a Qlik Sense environment, but for a real-time understanding of what’s going on in a Sense environment something else is needed - enter Butler SOS.

The most common way of using Butler SOS is for creating real-time dashboards based on the data in the InfluxDB or Prometheus database, showing operational metrics for a Qlik Sense Enterprise environment.

Sample screen shots of Grafana dashboards created using data extracted by Butler SOS:

Grafana dashboard

Grafana dashboard

Grafana dashboard

As mentioned above, Butler SOS can also send data to MQTT for use in any MQTT enabled tool or system.

Known limitations & improvement ideas

Things can always be improved, of course. Here are some ideas on things for future versions:

  • The MQTT messages are kind of basic, at least when it comes to data from the Sense logs and for detailed user sessions. In both those cases a single text string is sent to MQTT. That’s fine, but assumes the downstream consumer of the MQTT message can parse the string and extract the information of interest.
    A better approach would be to send more detailed MQTT messages. Those would be easier to consume and act upon for downstream systems, but it would on the other hand mean lots more MQTT messages being sent.
  • Send data as Kafka messages. Same basic idea as for MQTT messages, but having the Sense operational data in Kafka would make it easier to process/use it in (big) data pipelines.

If you have ideas or suggestions on new features, please feel free to add them in the Butler SOS GitHub project.

Where should I go next?

Ready to move on?

Great! Here are some good starting points

  • Examples: Check out some Grafana dashboards to get inspiration on what can be done!
  • Installation & setup: Learn how to install Butler SOS, then set it up according to your needs.

I have a question or want to report an issue

Feel free to reach out via GitHub discussions for general questions, GitHub issues for bugs, or by email to info@ptarmiganlabs.com.

Security / Disclosure

If you discover any important bug with Butler SOS that may pose a security problem, please disclose it confidentially to security@ptarmiganlabs.com first, so that it can be assessed and hopefully fixed prior to being exploited. Please do not raise GitHub issues for security-related doubts or problems.

Who’s behind Butler SOS?

Butler SOS is an open source project sponsored by Ptarmigan Labs, an IT consulting company in Stockholm, Sweden. The main contributor to the tool is (so far) Göran Sander from the same company.

Please refer to the Contribution guidelines page for details on how to contribute, suggest features etc to the tool.

2.2 - Installing Butler SOS

The steps needed for installing and configuring vary slightly depending on what platform you use. The details are found here.

Warning

Butler SOS was developed with InfluxDB version 1.x in mind.

InfluxDB is currently available in version 2.x and while this version brings lots of new goodies, it’s not out-of-the-box compatible with Butler SOS.
For that reason you should use the latest 1.x version of InfluxDB, which at the time of this writing is 1.8.4.

In due time Butler SOS will be updated to support InfluxDB 2.x too.

Tip

There is a Tips & Tricks to get started with Butler SOS document on the Butler SOS forums.

It contains descriptions of issues people have faced when installing Butler SOS, as well as solutions to them.

If in doubt on how to install Butler SOS, please consider Docker as the first alternative.
Why? Several reasons:

  • Very quick to get started. Usually it takes just a few minutes to set up a Butler SOS instance in Docker.
  • Using Docker is a great way to test new tools without having to install the tool on one of your actual servers. If you decide the tool in question is not for you - just delete the Docker container. Your servers remain 100% the same as before the test.
  • The previous point is true not only for Butler SOS, but also for its companion tools InfluxDB, Prometheus, Grafana and MQTT (via for example the Mosquitto MQTT broker). You can run all of these tools in their own Docker containers, and not install a single new, native application during your evaluation of Butler SOS.
  • No need to install Node.js on your server(s). Fewer security, performance and maintenance concerns.
  • Make use of your existing Docker runtime environments, or use those offered by Amazon, Google, Microsoft etc.
  • Benefit from the extremely comprehensive tools ecosystem (monitoring, deployment etc) that is available for Docker.
  • Updating Butler SOS to the latest version (assuming no config file changes are needed for that particular upgrade) is as easy as stopping the container, doing a “docker pull ptarmiganlabs/butler-sos:latest”, and finally starting the container again.
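With a docker-compose based setup, the upgrade described in the last bullet could for example look like this, run from the directory holding the docker-compose file:

➜  docker-compose pull          # fetch the latest Butler SOS image
➜  docker-compose up -d         # recreate the container using the new image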

If Docker is not an option, the pre-built, stand-alone binaries for Windows, Linux and macOS are good options.
They offer a download-configure-execute approach to running Butler SOS. Also a very good option.

But even with the above recommendations, Butler SOS can be deployed in lots of different configurations.
It is therefore difficult to give precise instructions that will work everywhere, for everyone. Especially the fact that Butler SOS uses certificates to authenticate with Sense is a complicating factor. Certificates are (when correctly used) great for securing systems, but they can also cause headaches.

First we must recognize that Sense uses self-signed certificates. This is fine, and as long as you work on a server where Sense Enterprise is installed, that server will have the Sense-provided certificates and Certificate Authority (CA) installed.

This means that the easiest option for getting Butler SOS up and running is usually to install it on one of your Sense servers.

That said, it is probably better system design to run Butler SOS (and maybe other members of the Butler family) on their own server, maybe using some flavour of Linux (lower cost compared to Windows). Windows servers work equally well though.

In this case you might want to consider exporting the Sense CA certificate from one of your Sense servers, and then install it on the Linux server. This should technically not be needed for Butler SOS to work correctly - as long as you specify the correct root.pem file in the Butler SOS config file, you should be ok.

If you specify an incorrect root CA certificate file in the clientCertCA config option, you will get an error like this:

2018-05-23T20:36:44.393Z - error: Error: Error: unable to verify the first certificate
    at TLSSocket.<anonymous> (_tls_wrap.js:1105:38)
    at emitNone (events.js:106:13)
    at TLSSocket.emit (events.js:208:7)
    at TLSSocket._finishInit (_tls_wrap.js:639:8)
    at TLSWrap.ssl.onhandshakedone (_tls_wrap.js:469:38)
2018-05-23T20:36:49.164Z - verbose: Event started: Query log db
2018-05-23T20:36:49.180Z - verbose: Event started: Statistics collection

A general note on host names is also relevant.
If you specify a server name of “myserver.company.com” while exporting certificates from the QMC, you should use that same server name in the Butler SOS config file. Failing to do so will (most likely) result in an error:

2018-05-23T19:51:03.087Z - error: Error: Error: Hostname/IP doesn't match certificate's altnames: "Host: serveralias.company.net. is not in the cert's altnames: DNS:myserver.company.com"
    at Object.checkServerIdentity (tls.js:223:17)
    at TLSSocket.<anonymous> (_tls_wrap.js:1111:29)
    at emitNone (events.js:106:13)
    at TLSSocket.emit (events.js:208:7)
    at TLSSocket._finishInit (_tls_wrap.js:639:8)
    at TLSWrap.ssl.onhandshakedone (_tls_wrap.js:469:38)
2018-05-23T19:51:07.701Z - verbose: Event started: Statistics collection

2.2.1 - Choosing a platform - what are the options?

You can run Butler SOS on several platforms, each with their own pros and cons. This section should help you decide which platform is right for you.

As Butler SOS is written in Node.js, the tool in theory runs on all platforms where Node.js is available. It is also available as a Docker image.

Docker is by far the preferred way of running Butler SOS, mainly because it gives you a very nice, production grade (stable, scalable, monitorable etc) execution environment. If you are really serious about scalability and stability you could even run Butler SOS in Kubernetes.

Other platforms can be used too, of course - let’s look at the pros and cons of some of the more commonly used platforms:

Docker

  Pros:
  • Easy to set up Butler SOS in Docker
  • Easy to test new versions of Butler SOS
  • Use existing Docker infrastructure
  • Monitoring, restarts etc built into Docker
  • Runs on low cost hardware and OSs

  Cons:
  • A Docker environment is needed (if not already available). Setting up and running Docker is not hard, but does require somewhat different skills than those needed to run a Sense environment

Kubernetes

  Pros:
  • Enterprise grade
  • Fault tolerant
  • Deployed alongside other enterprise applications

  Cons:
  • More difficult to set up than Docker

Windows server

  Pros:
  • Butler SOS can run on the same server as Qlik Sense, saving hardware/server costs
  • Pre-built, standalone binaries available

  Cons:
  • Running Butler SOS natively on the Sense server is a potential risk (it’s usually a good idea to isolate systems/services to their own servers/environments whenever possible)
  • More difficult (compared to Docker) to achieve a production grade setup (auto restarts etc)

Linux

  Pros:
  • No cost for the operating system (at least not for most Linux versions)
  • Runs on low cost hardware
  • Pre-built, standalone binaries available

  Cons:
  • More difficult (compared to Docker) to achieve a production grade setup (auto restarts etc)

Mac OS

  Pros:
  • For development, if you want to extend or modify Butler SOS
  • Signed, pre-built, standalone binaries available

  Cons:
  • Not a server grade operating system, i.e. not for production use

Windows (desktop)

  Pros:
  • For development, if you want to extend or modify Butler SOS

  Cons:
  • Not a server grade operating system, i.e. not for production use

2.2.2 - Native app

How to install Butler SOS as a Node.js application.

Selecting an OS

While Qlik Sense Enterprise is a Windows only system, Butler SOS should be able to run on any OS where Node.js is available.
Butler SOS has been successfully used as a native Node.js app - during development and production - on Windows, Linux (Debian and Ubuntu tested) and macOS.

Prerequisites

  • Qlik Sense Enterprise on Windows: Mandatory. Butler SOS is developed with Qlik Sense Enterprise on Windows (QSEoW) in mind.
    Butler SOS is simply not intended to work with Sense Desktop or Sense cloud.
  • Node.js: Mandatory. Butler SOS is written in Node.js - which is thus a firm requirement.
  • MQTT broker: Optional. MQTT is used for outbound pub-sub messaging. Butler SOS assumes a working MQTT broker is available, the IP of which is defined in the Butler SOS config file. Mosquitto is a great open source broker. It requires very little hardware to run; even the smallest (usually free) Amazon/Google/Microsoft/… instance is enough if you want a dedicated MQTT server. If you don’t care about the pub-sub features of Butler SOS, you don’t need an MQTT broker. In this case you can disable the MQTT features in the config YAML file.
  • InfluxDB: Use at least one of InfluxDB and Prometheus. An open source database for realtime information, used among other things to store metrics around Butler SOS’ own memory usage over time (if that feature is enabled).
    At this point more metrics and events are sent to InfluxDB, compared to Prometheus.
  • Prometheus: Use at least one of InfluxDB and Prometheus. The de-facto standard open source tool for metrics gathering in large-scale systems, including Kubernetes. A bit more complex to set up and configure compared to InfluxDB, but also more focused on providing observability features.
  • Grafana: Optional. The de-facto open source standard for showing real-time metrics. In order to visualise Sense realtime metrics in Grafana you must enable at least one of InfluxDB or Prometheus.

2.2.2.1 - Windows

Running Butler SOS in Windows.

Installation

There are two options: Run Butler SOS as a standalone binary or as a Node.js app. The first is by far easier to set up and maintain and thus recommended.

Using the pre-built, standalone app

The pre-built binaries are available from the releases page.

  1. Download the Windows binary
  2. “Unblock” the downloaded zip file
    1. Right-click the zip file
    2. Select “properties”
    3. Mark the “Unblock” check box in the lower right part of the properties window
    4. Click the “Apply” button, then “Ok” to close the properties window
  3. Unzip the zip file
  4. Move the extracted butler-sos.exe file to the desired location, for example d:\tools\butler-sos
  5. Use NSSM or a similar tool to install Butler SOS as a Windows service (see the example below)

Unblocking the Butler SOS zip file on Windows Server
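A minimal NSSM sketch for step 5 could look like the commands below, run in an elevated command prompt. The service name and paths are examples only - adjust them to your environment:

nssm install "Butler SOS" d:\tools\butler-sos\butler-sos.exe
nssm set "Butler SOS" AppDirectory d:\tools\butler-sos
nssm start "Butler SOS"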

Using Node.js

In this scenario you will use the Butler SOS source code together with the standard Node.js runtime libraries.

The result is the same as with the stand-alone binaries, you just have to do more of the work yourself.
This is usually not preferred, but if you want to add new features to (or modify existing ones in) Butler SOS, this option is for you.

1. Install Node.js

The latest LTS version is usually a good choice.

2. Select a directory from which Butler SOS will be run

This can be pretty much anywhere, in this example d:\tools\butler-sos will be used.

3. Get Butler SOS

Get the desired Butler SOS version and extract it into the directory above.

Get the latest available version unless you have a really good reason to use an older version.
New features are added, bugs fixed and security updates are applied in each version - it’s simply a good idea to use the latest version.

Do not just clone the Butler SOS repository as that will give you the latest development version, which may not yet be fully tested and packaged.
The exception is of course if you want to contribute to Butler SOS development - then forking and cloning the repository is the right thing to do.

4. Install Node.js dependencies

From d:\tools\butler-sos\src, run npm i to install the various Node.js modules used by Butler SOS. Depending on your server configuration you may get some warnings about (for example) Python not being installed; these can usually be ignored.

Configuration

The configuration file is used the same way as when Butler SOS runs on Docker, with one exception:

The path to the certificates used to authenticate with Sense must be specified in the config file. With Docker the certificate path is always the same, but with Windows you need to specify where the certificate files are located.

For example, if the certificate files exported from Sense are stored in d:\secrets\sensecert, the config file would look like this when used on Windows:

  ...
  # Certificates to use when querying Sense for healthcheck data. Get these from the Certificate Export in QMC.
  cert:
    clientCert: d:\secrets\sensecert\client.pem
    clientCertKey: d:\secrets\sensecert\client_key.pem
    clientCertCA: d:\secrets\sensecert\root.pem

Stayin’ alive

A tool like Butler SOS should of course start automatically when the server it runs on is restarted. This can be achieved in at least a couple of ways:

  1. By far the best option is to turn Butler SOS into a Windows service. That way it will be started on server boot, restarted if it fails etc. There are various tools for doing this, with NSSM being a very, very good one. Butler SOS has been installed in lots of Sense clusters this way.

  2. You can also use a Node process monitor such as PM2 to monitor the Butler SOS process, and restart it if it for some reason crashes. PM2 is not entirely easy to use on Windows though.

2.2.2.2 - Linux and Mac OS

Running Butler SOS in Linux and Mac OS. Installation and configuration.

Installation

There are two options: Run Butler SOS as a standalone binary or as a Node.js app. The first is by far easier to set up and maintain and thus recommended.

Using the pre-built, standalone app

The pre-built binaries are available from the releases page.

  1. Download the Linux/macOS binary
  2. Move the extracted butler-sos file to the desired location.
  3. Use the process monitor of choice (see below) to make sure Butler SOS is always running

Using Node.js

This scenario is identical to the Windows scenario; please refer to that page for details. Keep in mind that the format of file system paths differs between Windows and Linux/Mac OS.

Configuration

Once again, same thing as on Windows.

Stayin’ alive

A Node process monitor can be used on Linux or Mac OS too.
Tools like PM2 in fact usually work better on Linux/Mac OS than on Windows.

You can probably also use Linux's standard service layer (typically systemd) to start Butler SOS; this has not been tested though.
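
An untested sketch of what that could look like with systemd follows below. The unit name, install path, user and config file name are all assumptions - adapt before use.

# Untested sketch: run the standalone Butler SOS binary as a systemd service.
sudo tee /etc/systemd/system/butler-sos.service > /dev/null <<'EOF'
[Unit]
Description=Butler SOS
After=network.target

[Service]
ExecStart=/opt/butler-sos/butler-sos --configfile /opt/butler-sos/production.yaml
WorkingDirectory=/opt/butler-sos
Restart=always
User=butlersos

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now butler-sos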

2.2.3 - Docker

Running Butler SOS in Docker. Installation and configuration.

Tip

Butler SOS Docker images are automatically built for several architectures:

  • amd64: This is by far the most common platform - your typical Intel based server uses amd64.
  • arm64: Arm servers are now available from most cloud providers and offer very competitive price/performance. Apple’s new M1 CPUs also use arm64, as do the newer Raspberry Pi models.
  • arm/v7: An older Arm architecture found in previous-gen Raspberry Pis, for example.

All images are available on Docker Hub.

Docker is great in that it runs on many different platforms.
This means that as long as the Docker runtime environment is installed, you can run Butler SOS on your Mac laptop, on a Linux server or on a Windows server.
Or in a Kubernetes cluster to get enterprise grade process monitoring of Butler SOS.

Installation

Docker runtime

Installing Docker is beyond the scope of this document, but there are plenty of online tutorials covering this.

Butler SOS installation and configuration

When using Docker there is no installation in the traditional sense.
Instead we (in this case) use a docker-compose file to define how Butler SOS should be executed within a Docker container. There are also other ways to start Docker containers, but docker-compose is usually a good and robust starting point.

Configuration of Butler SOS specific settings is then done using Butler SOS’ own YAML config file.

Install & configure - walkthrough

Create a directory for Butler SOS. Config files and logs will be stored here.
This example uses macOS but the commands will be very similar on Linux.
Docker on Windows is another story - it’s there and works great, but not always identical to Linux/macOS.

➜  butler-sos-demo mkdir -p butler-sos-docker/config/certificate
➜  butler-sos-demo mkdir -p butler-sos-docker/log
➜  butler-sos-demo cd butler-sos-docker
➜  butler-sos-docker
  1. Copy docker-compose.yml from the GitHub repository to the Butler SOS directory that was created above. The directory where the docker-compose file is stored is the ‘root’ directory of Butler SOS - everything else is relative to this directory.

  2. Adapt the docker-compose file as needed (usually no changes are needed if you want to run the latest version of Butler SOS).

  3. Copy the YAML config file from the GitHub repository into the ./config directory, rename it to production.yaml (or something else, as long as it matches the NODE_ENV environment variable set in the docker-compose.yml file) and edit it as needed. Note that for the Docker setup the path to certificates (as specified in the YAML config file) should be /nodeapp/config/certificate/ (this is the Docker container’s internal path to the certificate directory).

  4. Edit the config file above as needed, with respect to your local Sense environment, folder paths etc. The provided template file has reasonable default settings where possible, but there are also a number of paths, passwords etc that must be configured.

  5. Export certificates from the QMC in Qlik Sense Enterprise, place them in the ./config/certificate directory (i.e. in a subdirectory to the directory where the docker-compose file is stored). The certificates can in theory be placed anywhere, as long as they are made available to the Docker container via a volume mount in the docker-compose.yaml file (e.g. - "./config:/nodeapp/config").

Let’s do this one step at a time.
Here we will bring up a single container with Butler SOS in it.
The Butler SOS config file is called production.yaml.

First, what files are there?

➜  butler-sos-docker ls -la
total 8
drwxr-xr-x  5 goran  staff   160 Aug 21 19:08 .
drwxr-xr-x  3 goran  staff    96 Aug 21 18:49 ..
drwxr-xr-x  4 goran  staff   128 Aug 21 19:08 config
-rw-r--r--  1 goran  staff  1505 Aug 21 19:01 docker-compose.yml
drwxr-xr-x  2 goran  staff    64 Aug 21 18:49 log
➜  butler-sos-docker
➜  butler-sos-docker ls -la config
total 48
drwxr-xr-x  4 goran  staff    128 Aug 21 19:08 .
drwxr-xr-x  5 goran  staff    160 Aug 21 19:08 ..
drwxr-xr-x  5 goran  staff    160 Aug 21 19:08 certificate
-rw-r--r--  1 goran  staff  21903 Aug 21 19:08 production.yaml
➜  butler-sos-docker
➜  butler-sos-docker ls -la config/certificate
total 24
drwxr-xr-x  5 goran  staff   160 Aug 21 19:08 .
drwxr-xr-x  4 goran  staff   128 Aug 21 19:08 ..
-rw-r--r--@ 1 goran  staff  1170 Aug 21 19:06 client.pem
-rw-r--r--@ 1 goran  staff  1704 Aug 21 19:06 client_key.pem
-rw-r--r--@ 1 goran  staff  1224 Aug 21 19:06 root.pem
➜  butler-sos-docker

What does the docker-compose.yml file look like?

➜  butler-sos-docker cat docker-compose.yml
# docker-compose.yml
version: "3.3"
services:
    butler-sos:
        image: ptarmiganlabs/butler-sos:latest
        container_name: butler-sos
        restart: always
        volumes:
            # Make config file and log files accessible outside of container
            - "./config:/nodeapp/config"
            - "./log:/nodeapp/log"
        environment:
            - "NODE_ENV=production" # Means that Butler SOS will read config data from production.yaml
        logging:
            driver: "json-file"
            options:
                max-file: "5"
                max-size: "5m"
        networks:
            - senseops

networks:
    senseops:
        driver: bridge

➜  butler-sos-docker

Ok, all good. Let’s start the container using docker-compose (the exact output will depend on what version of Butler SOS you are using and what features you have enabled in its YAML config file).

➜  butler-sos-docker docker-compose up
Creating network "butler-sos-docker_senseops" with driver "bridge"
Creating butler-sos ... done
Attaching to butler-sos
butler-sos    | 2022-08-23T03:45:28.754Z info: CONFIG: Influxdb enabled: true
butler-sos    | 2022-08-23T03:45:28.757Z info: CONFIG: Influxdb host IP: 192.168.100.20
butler-sos    | 2022-08-23T03:45:28.757Z info: CONFIG: Influxdb host port: 8086
butler-sos    | 2022-08-23T03:45:28.758Z info: CONFIG: Influxdb db name: senseops
butler-sos    | 2022-08-23T03:45:29.003Z info: CONFIG: Found InfluxDB database: senseops
butler-sos    | 2022-08-23T03:45:29.219Z info: --------------------------------------
butler-sos    | 2022-08-23T03:45:29.220Z info: Starting Butler SOS
butler-sos    | 2022-08-23T03:45:29.220Z info: Log level: verbose
butler-sos    | 2022-08-23T03:45:29.221Z info: App version: 9.2.0
butler-sos    | 2022-08-23T03:45:29.221Z info: Instance ID    : 87b978019ae........
butler-sos    | 2022-08-23T03:45:29.222Z info:
butler-sos    | 2022-08-23T03:45:29.223Z info: Node version   : v18.7.0
butler-sos    | 2022-08-23T03:45:29.223Z info: Architecture   : x64
butler-sos    | 2022-08-23T03:45:29.224Z info: Platform       : linux
butler-sos    | 2022-08-23T03:45:29.224Z info: Release        : 11
butler-sos    | 2022-08-23T03:45:29.224Z info: Distro         : Debian GNU/Linux
butler-sos    | 2022-08-23T03:45:29.224Z info: Codename       : bullseye
butler-sos    | 2022-08-23T03:45:29.224Z info: Virtual        : false
butler-sos    | 2022-08-23T03:45:29.225Z info: Processors     : 4
butler-sos    | 2022-08-23T03:45:29.225Z info: Physical cores : 4
butler-sos    | 2022-08-23T03:45:29.225Z info: Cores          : 4
butler-sos    | 2022-08-23T03:45:29.226Z info: Docker arch.   : undefined
butler-sos    | 2022-08-23T03:45:29.226Z info: Total memory   : 6233055232
butler-sos    | 2022-08-23T03:45:29.226Z info: Standalone app : false
butler-sos    | 2022-08-23T03:45:29.226Z info: --------------------------------------
butler-sos    | 2022-08-23T03:45:29.226Z info: Client cert: /nodeapp/config/certificate/client.pem
butler-sos    | 2022-08-23T03:45:29.227Z info: Client cert key: /nodeapp/config/certificate/client_key.pem
butler-sos    | 2022-08-23T03:45:29.227Z info: CA cert: /nodeapp/config/certificate/root.pem
butler-sos    | 2022-08-23T03:45:29.250Z verbose: MAIN: Anonymous telemetry reporting has been set up.
butler-sos    | 2022-08-23T03:45:29.252Z verbose: MAIN: Starting Docker healthcheck server...
butler-sos    | 2022-08-23T03:45:29.257Z info: USER EVENT: UDP server listening on 0.0.0.0:9997
butler-sos    | 2022-08-23T03:45:29.257Z info: LOG EVENT: UDP server listening on 0.0.0.0:9996
butler-sos    | 2022-08-23T03:45:29.290Z info: MAIN: Started Docker healthcheck server on port 12398.
butler-sos    | 2022-08-23T03:45:29.290Z info: MAIN: Starting Prometheus Butler SOS endpoint on 0.0.0.0:9842.
butler-sos    | 2022-08-23T03:45:29.291Z verbose: PROM: Setting up Prometheus client for server: sense1
butler-sos    | 2022-08-23T03:45:29.292Z verbose: PROM: Setting up Prometheus client for server: sense2
butler-sos    | 2022-08-23T03:45:29.310Z info: PROM: Prometheus Butler SOS metrics server now listening on port 9842
butler-sos    | 2022-08-23T03:45:29.311Z info: PROM: Prometheus Node.js metrics server now listening on port 0.0.0.0:9001
butler-sos    | 2022-08-23T03:45:30.911Z verbose: --------------------------------
butler-sos    | 2022-08-23T03:45:30.911Z verbose: Iteration # 1, Uptime: 0 months, 0 days, 0 hours, 0 minutes, 2.005 seconds, Heap used 33.26 MB of total heap 58.39 MB. External (off-heap): 3.57 MB. Memory allocated to process: 92.45 MB.
butler-sos    | 2022-08-23T03:45:31.051Z verbose: UPTIME NEW RELIC: Sent Butler SOS memory usage data to New Relic account 123456789 ("Ptarmigan Labs NR account")
butler-sos    | 2022-08-23T03:45:31.269Z verbose: MEMORY USAGE INFLUXDB: Sent Butler SOS memory usage data to InfluxDB
...

Once everything looks good you can stop the containers (ctrl-C), then start them again in daemon mode (i.e. running unattended in the background):

➜  butler-sos-docker docker-compose up -d
Starting butler-sos ... done
➜  butler-sos-docker

Setting the log level to info in the config file will reduce log output.

The Docker container implements Docker healthchecks, which means you can run docker ps to see whether the container is healthy or not (assuming Docker healthchecks are enabled in the config file, of course):

➜  butler-sos-docker docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED              STATUS                    PORTS     NAMES
9d2253511a24   ptarmiganlabs/butler-sos:latest   "docker-entrypoint.s…"   About a minute ago   Up 17 seconds (healthy)             butler-sos
➜  butler-sos-docker

2.2.4 - InfluxDB & Grafana

How to use Butler SOS with InfluxDB and Grafana using Docker.

Warning

Butler SOS was developed with InfluxDB version 1.x in mind.

InfluxDB is currently available in version 2.x and while this version brings lots of new goodies, it’s not out-of-the-box compatible with Butler SOS.
For that reason you should use the latest 1.x version of InfluxDB, which at the time of this writing is 1.8.4.

In due time Butler SOS will be updated to support InfluxDB 2.x too.

If you already have InfluxDB and/or Grafana running you can skip this section.

Running in Docker using docker-compose

The easiest way to get started is to run these tools in Docker containers, controlled by docker-compose files.
Running them under Kubernetes will give you a whole other level of fault tolerance, scalability etc - but this also requires much more when it comes to Kubernetes skills. Use the setup that’s relevant to your use case.

You can use a single docker-compose file for Butler SOS, InfluxDB and Grafana - or separate docker-compose files for each tool.

The advantage of using a single docker-compose file is that the entire stack of tools will be launched in unison. You can create dependencies between the tools if needed etc - very convenient. On the other hand, having separate docker-compose files makes it easier to restart (or upgrade or in other ways change) a single service without affecting other services.

Full stack docker-compose file

Let’s start Butler SOS, InfluxDB and Grafana from a single docker-compose_fullstack_influxdb.yml file:

➜  butler-sos-docker cat docker-compose_fullstack_influxdb.yml
# docker-compose_fullstack_influxdb.yml
version: "3.3"
services:
    butler-sos:
        image: ptarmiganlabs/butler-sos:latest
        container_name: butler-sos
        restart: always
        volumes:
            # Make config file and log files accessible outside of container
            - "./config:/nodeapp/config"
            - "./log:/nodeapp/log"
        environment:
            - "NODE_ENV=production_influxdb" # Means that Butler SOS will read config data from production_influxdb.yaml
        logging:
            driver: "json-file"
            options:
                max-file: "5"
                max-size: "5m"
        networks:
            - senseops

    influxdb:
        image: influxdb:1.8.10
        container_name: influxdb
        restart: always
        volumes:
            - ./influxdb/data:/var/lib/influxdb # Mount for influxdb data directory
            - ./influxdb/config/:/etc/influxdb/ # Mount for influxdb configuration
        ports:
            # The API for InfluxDB is served on port 8086
            - "8086:8086"
            - "8082:8082"
        environment:
            # Disable usage reporting
            - "INFLUXDB_REPORTING_DISABLED=true"
        networks:
            - senseops

    grafana:
        image: grafana/grafana:latest
        container_name: grafana
        restart: always
        ports:
            - "3000:3000"
        volumes:
            - ./grafana/data:/var/lib/grafana
        networks:
            - senseops

networks:
    senseops:
        driver: bridge

➜  butler-sos-docker

Assuming you’ve already completed the setup of Butler SOS, the result of running the docker-compose_fullstack_influxdb.yml file above is something like this:

➜  butler-sos-docker docker-compose -f docker-compose_fullstack_influxdb.yml up
Creating network "butler-sos-docker_senseops" with driver "bridge"
Creating influxdb   ... done
Creating butler-sos ... done
Creating grafana    ... done
Attaching to butler-sos, grafana, influxdb
...
...
grafana       | logger=grafanaStorageLogger t=2022-08-21T18:13:42.76538465Z level=info msg="storage starting"
grafana       | logger=ngalert t=2022-08-21T18:13:42.780004463Z level=info msg="warming cache for startup"
grafana       | logger=http.server t=2022-08-21T18:13:42.796364325Z level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
grafana       | logger=ngalert.multiorg.alertmanager t=2022-08-21T18:13:42.807894344Z level=info msg="starting MultiOrg Alertmanager"
butler-sos    | 2022-08-21T18:13:42.908Z info: CONFIG: Influxdb enabled: true
butler-sos    | 2022-08-21T18:13:42.911Z info: CONFIG: Influxdb host IP: influxdb
butler-sos    | 2022-08-21T18:13:42.912Z info: CONFIG: Influxdb host port: 8086
butler-sos    | 2022-08-21T18:13:42.912Z info: CONFIG: Influxdb db name: senseops
influxdb      | ts=2022-08-21T18:13:43.139047Z lvl=info msg="Executing query" log_id=0cSPbmJG000 service=query query="SHOW DATABASES"
influxdb      | [httpd] 172.24.0.2 - - [21/Aug/2022:18:13:43 +0000] "GET /query?p=&q=show+databases&u= HTTP/1.1" 200 84 "-" "-" fd854ac5-217c-11ed-8001-0242ac180003 1084
influxdb      | ts=2022-08-21T18:13:43.169398Z lvl=info msg="Executing query" log_id=0cSPbmJG000 service=query query="CREATE DATABASE senseops"
influxdb      | [httpd] 172.24.0.2 - - [21/Aug/2022:18:13:43 +0000] "POST /query?p=&q=create+database+%22senseops%22&u= HTTP/1.1 " 200 33 "-" "-" fd89e529-217c-11ed-8002-0242ac180003 2940
butler-sos    | 2022-08-21T18:13:43.177Z info: CONFIG: Created new InfluxDB database: senseops
influxdb      | ts=2022-08-21T18:13:43.219945Z lvl=info msg="Executing query" log_id=0cSPbmJG000 service=query query="CREATE RETENTION POLICY \"10d\" ON senseops DURATION 10d REPLICATION 1 DEFAULT"
influxdb      | [httpd] 172.24.0.2 - - [21/Aug/2022:18:13:43 +0000] "POST /query?p=&q=create+retention+policy+%2210d%22+on+%22senseops%22+duration+10d+replication+1+default&u= HTTP/1.1 " 200 33 "-" "-" fd91ac84-217c-11ed-8003-0242ac180003 2299
butler-sos    | 2022-08-21T18:13:43.242Z info: CONFIG: Created new InfluxDB retention policy: 10d
butler-sos    | 2022-08-21T18:13:43.391Z info: --------------------------------------
butler-sos    | 2022-08-21T18:13:43.391Z info: Starting Butler SOS
butler-sos    | 2022-08-21T18:13:43.392Z info: Log level: verbose
butler-sos    | 2022-08-21T18:13:43.393Z info: App version: 9.2.0
butler-sos    | 2022-08-21T18:13:43.394Z info: Instance ID    : 964cbd0a36bc....
butler-sos    | 2022-08-21T18:13:43.394Z info:
butler-sos    | 2022-08-21T18:13:43.395Z info: Node version   : v18.7.0
butler-sos    | 2022-08-21T18:13:43.396Z info: Architecture   : x64
butler-sos    | 2022-08-21T18:13:43.396Z info: Platform       : linux
butler-sos    | 2022-08-21T18:13:43.396Z info: Release        : 11
butler-sos    | 2022-08-21T18:13:43.397Z info: Distro         : Debian GNU/Linux
butler-sos    | 2022-08-21T18:13:43.397Z info: Codename       : bullseye
butler-sos    | 2022-08-21T18:13:43.398Z info: Virtual        : false
butler-sos    | 2022-08-21T18:13:43.398Z info: Processors     : 4
butler-sos    | 2022-08-21T18:13:43.399Z info: Physical cores : 4
butler-sos    | 2022-08-21T18:13:43.399Z info: Cores          : 4
butler-sos    | 2022-08-21T18:13:43.400Z info: Docker arch.   : undefined
butler-sos    | 2022-08-21T18:13:43.400Z info: Total memory   : 6233055232
butler-sos    | 2022-08-21T18:13:43.401Z info: Standalone app : false
butler-sos    | 2022-08-21T18:13:43.401Z info: --------------------------------------
butler-sos    | 2022-08-21T18:13:43.402Z info: Client cert: /nodeapp/config/certificate/client.pem
butler-sos    | 2022-08-21T18:13:43.402Z info: Client cert key: /nodeapp/config/certificate/client_key.pem
butler-sos    | 2022-08-21T18:13:43.402Z info: CA cert: /nodeapp/config/certificate/root.pem
butler-sos    | 2022-08-21T18:13:43.421Z verbose: MAIN: Anonymous telemetry reporting has been set up.
butler-sos    | 2022-08-21T18:13:43.423Z verbose: MAIN: Starting Docker healthcheck server...
butler-sos    | 2022-08-21T18:13:43.428Z info: USER EVENT: UDP server listening on 0.0.0.0:9997
butler-sos    | 2022-08-21T18:13:43.429Z info: LOG EVENT: UDP server listening on 0.0.0.0:9996
butler-sos    | 2022-08-21T18:13:43.461Z info: MAIN: Started Docker healthcheck server on port 12398.
butler-sos    | 2022-08-21T18:13:43.462Z info: MAIN: Starting Prometheus Butler SOS endpoint on 0.0.0.0:9842.
butler-sos    | 2022-08-21T18:13:43.464Z verbose: PROM: Setting up Prometheus client for server: sense1
butler-sos    | 2022-08-21T18:13:43.465Z verbose: PROM: Setting up Prometheus client for server: sense2
butler-sos    | 2022-08-21T18:13:43.482Z info: PROM: Prometheus Butler SOS metrics server now listening on port 9842
butler-sos    | 2022-08-21T18:13:43.483Z info: PROM: Prometheus Node.js metrics server now listening on port 0.0.0.0:9001
butler-sos    | 2022-08-21T18:13:45.080Z verbose: --------------------------------
butler-sos    | 2022-08-21T18:13:45.081Z verbose: Iteration # 1, Uptime: 0 months, 0 days, 0 hours, 0 minutes, 2.007 seconds, Heap used 31.56 MB of total heap 60.81 MB. External (off-heap): 2.98 MB. Memory allocated to process: 102.28 MB.
influxdb      | [httpd] 172.24.0.2 - - [21/Aug/2022:18:13:45 +0000] "POST /write?db=senseops&p=&precision=n&rp=&u= HTTP/1.1 " 204 0 "-" "-" feaf181f-217c-11ed-8004-0242ac180003 44267
butler-sos    | 2022-08-21T18:13:45.137Z verbose: MEMORY USAGE INFLUXDB: Sent Butler SOS memory usage data to InfluxDB
butler-sos    | 2022-08-21T18:13:45.198Z verbose: UPTIME NEW RELIC: Sent Butler SOS memory usage data to New Relic account 123456789 ("Ptarmigan Labs NR account")
...
...

From a separate shell we can then ensure that the expected Docker containers are running:

➜  ~ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED              STATUS                        PORTS                                            NAMES
2311d17d1285   ptarmiganlabs/butler-sos:latest   "docker-entrypoint.s…"   About a minute ago   Up About a minute (healthy)                                                    butler-sos
a22307d12263   influxdb:1.8.10                   "/entrypoint.sh infl…"   About a minute ago   Up About a minute             0.0.0.0:8082->8082/tcp, 0.0.0.0:8086->8086/tcp   influxdb
81df665545d0   grafana/grafana:latest            "/run.sh"                About a minute ago   Up About a minute             0.0.0.0:3000->3000/tcp                           grafana
➜  ~

That’s great, we now have a single command (docker-compose -f docker-compose_fullstack_influxdb.yml up -d for background/daemon mode) to bring up all the tools needed to monitor a Qlik Sense cluster!

Now, let’s see if any data has arrived in InfluxDB.
Let’s check this by going into Grafana, which is available on port 3000.

The first time you log into a new Grafana instance you can use the default admin account (username=admin, password=admin).
You will be asked to change that password during the first login.

First add a data source in Grafana, pointing it to the local InfluxDB server.

Adding an InfluxDB datasource in Grafana

Now we can create a basic chart in Grafana, showing for example Butler SOS’ own memory usage.
After a while we should see some data in the chart:

Butler SOS' own memory usage, stored in InfluxDB and visualised in Grafana

Need to stop the entire stack of tools?
Easy - just run docker-compose -f docker-compose_fullstack_influxdb.yml down:

➜  butler-sos-docker docker-compose -f docker-compose_fullstack_influxdb.yml down
Stopping butler-sos ... done
Stopping influxdb   ... done
Stopping grafana    ... done
Removing butler-sos ... done
Removing influxdb   ... done
Removing grafana    ... done
Removing network butler-sos-docker_senseops
➜  butler-sos-docker

2.2.5 - Prometheus & Grafana

How to use Butler SOS with Prometheus and Grafana using Docker.

Warning

Work in progress

While Butler SOS’ Prometheus support is functional and works well, this documentation page is not yet complete.

Info

Prometheus is extremely powerful and flexible.
In fact, it’s probably the closest thing there is to a de facto standard for monitoring large scale software systems today.
No matter if you run a Kubernetes cluster spanning multiple data centers and continents, or just a single Butler SOS instance - Prometheus is an excellent choice for monitoring operational metrics.

That power and flexibility also means it can be challenging to set up Prometheus.
Usually it’s not that difficult, but if you’re new to Docker and have no previous experience with monitoring tools, using InfluxDB is usually a bit easier.

Or view it as a chance to learn more about one of the absolute stars of open source software - Prometheus is awesome!

This page assumes you don’t already have Prometheus and Grafana running.
If you already have access to those tools you can of course instead configure them to work with Butler SOS.

Running in Docker using docker-compose

The easiest way to get started is to run these tools in Docker containers, controlled by docker-compose files.
Running them under Kubernetes will give you a whole other level of fault tolerance, scalability etc - but this also requires much more when it comes to Kubernetes skills. Use the setup that’s relevant to your use case.

You can use a single docker-compose file for Butler SOS, Prometheus and Grafana - or separate docker-compose files for each tool.

The advantage of using a single docker-compose file is that the entire stack of tools will be launched in unison. You can create dependencies between the tools if needed etc - very convenient. On the other hand, having separate docker-compose files makes it easier to restart (or upgrade or in other ways change) a single service without affecting other services.

Full stack docker-compose file

Let’s start Butler SOS, Prometheus and Grafana from a single docker-compose_fullstack_prometheus.yml file:

# docker-compose_fullstack_prometheus.yml
version: "3.3"
services:
    butler-sos:
        image: ptarmiganlabs/butler-sos:latest
        container_name: butler-sos
        restart: always
        volumes:
            # Make config file and log files accessible outside of container
            - "./config:/nodeapp/config"
            - "./log:/nodeapp/log"
        environment:
            - "NODE_ENV=production_prometheus" # Means that Butler SOS will read config data from production_prometheus.yaml
        logging:
            driver: "json-file"
            options:
                max-file: "5"
                max-size: "5m"
        networks:
            - senseops

    prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        volumes:
            - ./prometheus/:/etc/prometheus/
            - prometheus_data:/prometheus
        command:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
            # - "--log.level=debug"
        ports:
            - 9090:9090
        links:
            - alertmanager:alertmanager
        networks:
            - senseops
        restart: always

    alertmanager:
        image: prom/alertmanager
        container_name: alertmanager
        ports:
            - 9093:9093
        volumes:
            - ./alertmanager/:/etc/alertmanager/
        networks:
            - senseops
        restart: always
        command:
            - "--config.file=/etc/alertmanager/config.yml"
            - "--storage.path=/alertmanager"

    grafana:
        image: grafana/grafana:latest
        container_name: grafana
        restart: always
        ports:
            - "3000:3000"
        volumes:
            - ./grafana/data:/var/lib/grafana
        networks:
            - senseops

networks:
    senseops:
        driver: bridge

Assuming you’ve already completed the setup of Butler SOS, the result of running the docker-compose_fullstack_prometheus.yml file above is something like this:


...
...

From a separate shell we can then ensure that the expected Docker containers are running:

➜ docker ps
CONTAINER ID   IMAGE                             COMMAND                  CREATED         STATUS                            PORTS                                                                                  NAMES

That’s great, you now have a single command (docker-compose -f docker-compose_fullstack_prometheus.yml up -d for background/daemon mode) to bring up all the tools needed to monitor a Qlik Sense cluster!

Need to stop the entire stack of tools?
Easy - just run docker-compose -f docker-compose_fullstack_prometheus.yml down:

➜ docker-compose -f docker-compose_fullstack_prometheus.yml down

2.3 - Setup

Everything you wanted to know about Butler SOS configuration but never dared to ask.

2.3.1 - Which config file to use

Butler SOS can use multiple config files. Here you learn how to control which one is used by Butler SOS.

A description of the config file format is available here.

Select which config file to use

Butler SOS uses configuration files in YAML format. The config files are stored in the src/config folder.

Butler SOS comes with a default config file called production_template.yaml.
Make a copy of it, then rename the copy to default.yaml, production.yaml, staging.yaml or something else suitable to your specific use case.
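
On Linux/macOS that could look like this (use copy on Windows), assuming you are standing in the directory where the Butler SOS files are located:

# Create your own config file from the shipped template
cd src/config
cp production_template.yaml production.yaml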

Update the config file as needed (see the config file reference page for details).

Trying to run Butler SOS with the default config file (the one included in the files downloaded from GitHub) will not work - you must adapt it to your server environment. For example, you need to enter the IP or host name of your Sense server(s), the IP or host name where Butler SOS is running, where the Sense certificates are stored etc.

The name of the config file matters. Unless you specifically name which config to use when starting Butler SOS, it will look for an environment variable called “NODE_ENV” and then try to load a config file named with the value found in NODE_ENV.

Example 1:

  • Environment variable NODE_ENV = production
  • Butler SOS is started without specifying a config file: butler-sos.exe --loglevel info
  • Butler SOS will look for a config file config/production.yaml.

Example 2:

  • Butler SOS is started with a command line option specifying a config file: butler-sos.exe --configfile d:\some\path\butler-sos-config-prod.yaml --loglevel info
  • Butler SOS will not look at the NODE_ENV environment variable. Settings will be loaded from the butler-sos-config-prod.yaml file instead.

Running several Butler SOS instances in parallel

If you have several Sense clusters (for example DEV, TEST and PROD environments) you can either monitor them all from a single Butler SOS instance, or set up separate instances for each Sense cluster.
The second case is implemented by creating several config files: butler_dev.yaml, butler_test.yaml and butler_prod.yaml.

In this scenario three instances of Butler SOS should be started, each given a different config file by setting the NODE_ENV variable as needed when starting Butler SOS.
Or (this option is usually much easier!) use the --configfile command line option when starting Butler SOS.

Note: If running several Butler SOS instances in parallel, you must also ensure that each one uses unique port numbers for their respective UDP servers etc.
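
As a hypothetical example (binary and config file paths below are placeholders), three instances monitoring DEV, TEST and PROD could be started like this, each in its own shell or - better - as its own Windows service:

# Hypothetical example: one Butler SOS instance per Sense cluster, each with its own config file.
# Start each instance in its own shell, or wrap each command in its own Windows service.
butler-sos.exe --configfile d:\tools\butler-sos\config\butler_dev.yaml
butler-sos.exe --configfile d:\tools\butler-sos\config\butler_test.yaml
butler-sos.exe --configfile d:\tools\butler-sos\config\butler_prod.yaml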

Setting environment variables

The method for setting environment variables varies between operating systems:

On Windows:

set NODE_ENV=production

Mac OS or Linux

export NODE_ENV=production

If using Docker, the NODE_ENV environment variable is set in the docker-compose.yml file (as already done in the template docker-compose files).

2.3.2 - Config file verification

How to verify that the Butler SOS config file is valid.

A description of the config file format is available here.

Configuring the Butler SOS config file is usually the most challenging part of setting up Butler SOS.
The config file is written in an easy to read YAML format, but given the number of settings that can be configured, it can be a bit daunting to get it right.

Starting with version 9.6.0, Butler SOS comes with a built-in config file verification tool that can be used to verify that the config file is valid.

Verify the config file

Config file verification is enabled by default.

Verification is done when Butler SOS is started, and if the config file is not valid, Butler SOS will not start.

Info

All settings in the config file are mandatory.
If you don’t want to use a specific Butler SOS feature, you must still include its settings in the config file, but you are free to disable the feature and set its settings to empty strings/values/arrays.

Skipping config file verification

If you want to skip config file verification, you can do so by starting Butler SOS with the --skip-config-verification command line option.

This will bypass all checks of the config file’s validity, and Butler SOS will try to start with the provided config file.
This can be useful if you are in the process of setting up Butler SOS and want to start it before the config file is complete, but for production scenarios it is recommended to leave config file verification enabled.
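
For example (binary name and config file path below are placeholders):

# Start Butler SOS without first verifying the config file.
butler-sos.exe --configfile d:\tools\butler-sos\production.yaml --skip-config-verification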

2.3.3 - Configuring Butler SOS logging

Butler SOS continuously logs what it is doing.
Here you learn how to control how much is logged, and whether log files should also be stored on disk.

What’s this?

Butler SOS continuously logs what it’s doing.

The top level section Butler-SOS in the config file has a set of settings that control logging and telemetry.

Log level (verbosity) can be set, logging to disk can be enabled/disabled etc.

For more information about telemetry, please see this page.

Settings in main config file

Butler-SOS:
  ...
  ...
  # Logging configuration
  logLevel: info          # Log level. Possible log levels are silly, debug, verbose, info, warn, error
  fileLogging: true       # true/false to enable/disable logging to disk file
  logDirectory: log       # Subdirectory where log files are stored
  ...
  ...

2.3.4 - Configuring Butler SOS heartbeats

Heartbeats provide a way to monitor that Butler SOS is running and working as intended.
Butler SOS can send periodic heartbeat messages to a monitoring tool, which can then alert if Butler SOS hasn’t checked in as expected.

What’s this?

A tool like Butler SOS should be viewed as mission critical, at least if its features are used by mission critical Sense apps.

But how can you know whether Butler SOS itself is working?
Somehow Butler SOS itself should be monitored.

Butler SOS (and most other tools in the Butler family) has a heartbeat feature.
It sends periodic messages to a monitoring tool, which can then alert if Butler SOS hasn’t checked in as expected.

Healthchecks.io is an example of such a tool. It’s open source but also has a SaaS option if so preferred.

More info on using Healthchecks.io with Butler (Butler SOS works the same way) can be found in this blog post.

Settings in main config file

Butler-SOS:
  ...
  ...
  # Heartbeats can be used to send "I'm alive" messages to some other tool, e.g. an infrastructure monitoring tool
  # The concept is simple: The remoteURL will be called at the specified frequency. The receiving tool will then know 
  # that Butler SOS is alive.
  heartbeat:
    enable: false
    remoteURL: http://my.monitoring.server/some/path/
    frequency: every 1 hour         # https://bunkat.github.io/later/parsers.html#text
  ...
  ...

2.3.5 - Docker healthcheck

Docker has a concept of “health checks”, which is a way for Docker containers to tell the Docker runtime engine that the container is alive and well. Butler SOS can be configured to send such health check messages to Docker.

Note: Sending health check messages is only meaningful when running Butler SOS as a Docker container.

Settings in main config file

Butler-SOS:
  ...
  ...
  # Docker health checks are used when running Butler SOS as a Docker container. 
  # The Docker engine will call the container's health check REST endpoint with a set interval to determine
  # whether the container is alive/well or not.
  # If you are not running Butler SOS in Docker you can safely disable this feature. 
  dockerHealthCheck:
    enable: true                    # Control whether a REST endpoint will be set up to serve Docker health check messages
    port: 12398                     # Port the Docker health check service runs on (if enabled)
  ...
  ...

2.3.6 - Configuring Butler SOS uptime monitor

Butler SOS can optionally log how long it’s been running and how much memory it uses.
Optionally the memory usage can also be stored to an InfluxDB database or sent to New Relic, for later viewing/alerting in for example a Grafana dashboard or within New Relic.

What’s this?

In some cases - especially when investigating issues or bugs - it can be useful to get log messages telling how long Butler SOS has been running and how much memory it uses.

This feature is called “uptime monitoring” and can be enabled in the main config file. The feature is being added to more and more tools in the Butler family of tools for Qlik Sense.

The logging interval is configurable, as is the log level required for uptime messages to be shown in the console/file log.

Select a reasonable retention policy and logging frequency!
You will rarely if ever need to know how much memory Butler SOS used a month ago… A retention policy of 1-2 weeks is usually a good start, logging uptime metrics every few minutes.

InfluxDB

The memory usage data can optionally be written to InfluxDB, from where it can later be viewed in Grafana.
The metrics will be stored in the database specified in ``.
The log level does not affect storing uptime metrics in InfluxDB or New Relic.

New Relic

Uptime metrics can be sent to zero or more New Relic accounts.

New Relic attributes (a concept where each data point sent to New Relic is tagged with a set of attributes) can be added to the metrics.
Attributes come in two forms: Static and dynamic.

  • Static attributes are hard-coded strings that don’t change over time. Could be used to distinguish metrics from DEV, TEST and PROD Sense environments.
  • Dynamic attributes may change each time Butler SOS is started, or even more often in the future if/when more dynamic attributes are added.
    An example is the Butler SOS version, which will change when Butler SOS is upgraded to a new version.

Settings in main config file

Butler-SOS:
  ...
  ...
  # Uptime monitor
  uptimeMonitor:
    enable: true                    # Should uptime messages be written to the console and log files?
    frequency: every 15 minutes     # https://bunkat.github.io/later/parsers.html#text
    logLevel: verbose               # Starting at what log level should uptime messages be shown in console log and log files?
    storeInInfluxdb: 
      butlerSOSMemoryUsage: true    # Should data on Butler SOS' own memory use be stored in InfluxDB?
      instanceTag: PROD             # Tag that can be used to differentiate data from multiple Butler SOS instances
    storeNewRelic:
      enable: true
      destinationAccount:
        - Ptarmigan Labs NR account
      metric:
        dynamic:
          butlerMemoryUsage:
            enable: true            # Should Butler SOS' memory/RAM usage be sent to New Relic?
          butlerUptime:
            enable: true            # Should Butler SOS' uptime (how long since it was started) be sent to New Relic?
      attribute: 
        static:                     # Static attributes/dimensions to attach to the data sent to New Relic.
          - name: metricType
            value: butler-sos-uptime
          - name: qs_service
            value: butler-sos
          - name: qs_environment
            value: prod
        dynamic:
          butlerVersion: 
            enable: true            # Should the Butler SOS version be included in the data sent to New Relic?
  ...
  ...

2.3.7 - Credentials to third party services

What’s this?

Butler SOS can interact with certain third party services, such as New Relic.
These services typically require some kind of authentication with associated credentials (username, password etc).
Those credentials are stored in the Butler-SOS.thirdPartyToolsCredentials section of the config file.

New Relic

Zero, one or more New Relic accounts with their respective credentials can be specified.
These accounts can then be used by Butler SOS’ various features.

Note that different Butler SOS features can send their data to different New Relic accounts.
This is specified in each feature’s section in the YAML config file.

Example:

  • Sense user events are sent to First NR account
  • Sense log events are sent to Second NR account
  • Sense RAM usage is sent to both First NR account and Second NR account

Settings in main config file

---
Butler-SOS:
  ...
  ...
  # Credentials for third party systems that Butler SOS integrates with.
  # These can also be specified via command line parameters when starting Butler SOS.
  # Command line options take precedence over settings in this config file.
  thirdPartyToolsCredentials:
    newRelic:         # Array of New Relic accounts/insert keys.
      - accountName: First NR account
        insertApiKey: <API key 1 (with insert permissions) from New Relic> 
        accountId: <New Relic account ID 1>
      - accountName: Second NR account
        insertApiKey: <API key 2 (with insert permissions) from New Relic> 
        accountId: <New Relic account ID 2>
  ...
  ...

2.3.8 - Configuring user events

What’s this?

User events are among the most detailed bits of information retrieved from Sense by Butler SOS.
They capture session start/stop events (=users logging in/out) and connection open/close events (apps opened/closed).

These events rely on two things to be correctly configured:

  1. Settings in main config file.
  2. Log appender XML files being deployed on the Sense servers where user activity events should be captured.

Both are described below.

Tech deep-dive

The user events are created by hooking into Sense’s logging framework, which is called Log4Net.

By placing a carefully crafted XML file in the Qlik Sense proxy service’s configuration directory, we can instruct Log4Net to forward Sense log events that we are interested in to Butler SOS.
In this case we are interested in session start/stop and connection open/close events.

The XML file is also known as a “log appender file”. It contains instructions that tell Log4Net to do various things when the specified filter matches the actual log data. Examples include sending emails, writing log entries to disk (i.e. regular file logging!), sending the log row as a UDP message and more.
Here we’re interested in the UDP message feature.

So, by means of a log appender file we tell Log4Net to send certain log rows to Butler SOS as UDP messages.
We also have to specify in the log appender file what host/IP address and port Butler SOS listens to, i.e. where the UDP messages should be sent.
Finally we have to make sure firewalls are open and allow UDP traffic from the Sense server(s) to Butler SOS.

If everything is set up correctly, UDP messages will arrive at Butler SOS within seconds after the actual event takes place.
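
For example, on a Windows server where Butler SOS listens for user events on the default port 9997, an inbound firewall rule could be created like this (the rule name is arbitrary; adjust the port to match your config file):

# Example: allow inbound UDP traffic to Butler SOS' user event UDP server (default port 9997).
netsh advfirewall firewall add rule name="Butler SOS user events" dir=in action=allow protocol=udp localport=9997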

Tagging of data

New Relic

The following attributes (New Relic lingo for tags) are added:

  1. A core set of attributes is added to all user events
    1. qs_host: Host name of the Sense server the event originated at.
    2. qs_event_action: What kind of user event that took place. Examples are “Start session”, “Stop session”, “Open connection”, “Close connection”.
    3. qs_userFull: Full directory/user ID of the user the event is about. Will be scrambled if scrambling enabled in config file.
    4. qs_userDirectory: User directory of the user the event is about. Will be scrambled if scrambling enabled in config file.
    5. qs_userId: User ID of the user the event is about. Will be scrambled if scrambling enabled in config file.
    6. qs_origin: What kind of activity caused the event, for example “AppAccess”. Can be empty for some user events.
  2. Custom attributes defined in the Butler SOS config file’s Butler-SOS.userEvents.tags section.

Note: Attributes defined further down in the list above will overwrite already defined attributes if their names match.
To avoid problems you should make sure not to use already defined attributes.

Settings in main config file

---
Butler-SOS:
  ...
  ...
  # Track individual users opening/closing apps and starting/stopping sessions. 
  # Requires log appender XML file(s) to be added to Sense server(s).
  userEvents:                       
    enable: false
    excludeUser:                    # Optional blacklist of users that should be disregarded when it comes to user events
      - directory: LAB
        userId: testuser1
      - directory: LAB
        userId: testuser2
    udpServerConfig:
      serverHost: <IP or FQDN>      # Host/IP where user event server will listen for events from Sense
      portUserActivityEvents: 9997  # Port on which user event server will listen for events from Sense
    tags:                           # Tags are added to the data before it's stored in InfluxDB
      - tag: env
        value: DEV
      - tag: foo
        value: bar
    sendToMQTT: 
      enable: false                 # Set to true if user events should be forwarded as MQTT messages
      postTo:                       # Control when and to which MQTT topics messages are sent 
        everythingTopic:            # Topic to which all user events are sent
          enable: true
          topic: qliksense/userevent
        sessionStartTopic:          # Topic to which "session start" events are sent
          enable: true
          topic: qliksense/userevent/session/start
        sessionStopTopic:           # Topic to which "session stop" events are sent
          enable: true
          topic: qliksense/userevent/session/stop
        connectionOpenTopic:        # Topic to which "connection open" events are sent
          enable: true
          topic: qliksense/userevent/connection/open
        connectionCloseTopic:       # Topic to which "connection close" events are sent
          enable: true
          topic: qliksense/userevent/connection/close
    sendToInfluxdb:
      enable: true                  # Set to true if user events should be stored in InfluxDB
    sendToNewRelic:
      enable: false                  # Should user events be sent to New Relic?
      destinationAccount:
        - First NR account
        - Second NR account
      scramble: true                # Should user info (user directory and user ID) be scrambled before sent to NR?
  ...
  ...

Log appender XML files

A template log appender file LocalLogConfig.xml is available in the /docs/log4net_user-audit-event folder in the GitHub repo. It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>

    <!-- Log appender finding user session events -->
    <appender name="EventSession" type="log4net.Appender.UdpAppender">
        <filter type="log4net.Filter.StringMatchFilter">
            <param name="stringToMatch" value="Start session for user" />
        </filter>
        <filter type="log4net.Filter.StringMatchFilter">
            <param name="stringToMatch" value="Stop session for user" />
        </filter>
        <filter type="log4net.Filter.DenyAllFilter" />
        <param name="remoteAddress" value="192.168.1.168" />
        <param name="remotePort" value="9997" />
        <param name="encoding" value="utf-8" />
        <layout type="log4net.Layout.PatternLayout">
            <converter>
                <param name="name" value="hostname" />
                <param name="type" value="Qlik.Sense.Logging.log4net.Layout.Pattern.HostNamePatternConverter" />
             </converter>
            <param name="conversionpattern" value="/qseow-proxy-session/;%hostname;%property{Command};%property{UserDirectory};%property{UserId};%property{Origin};%property{Context};%message" />
        </layout>
    </appender>

    <!-- Log appender finding user connection events -->
    <appender name="EventConnection" type="log4net.Appender.UdpAppender">
        <filter type="log4net.Filter.StringMatchFilter">
            <param name="stringToMatch" value="connection Opened for session" />
        </filter>
        <filter type="log4net.Filter.StringMatchFilter">
            <param name="stringToMatch" value="connection Closed for session" />
        </filter>
        <filter type="log4net.Filter.DenyAllFilter" />
        <param name="remoteAddress" value="192.168.1.168" />
        <param name="remotePort" value="9997" />
        <param name="encoding" value="utf-8" />
        <layout type="log4net.Layout.PatternLayout">
            <converter>
                <param name="name" value="hostname" />
                <param name="type" value="Qlik.Sense.Logging.log4net.Layout.Pattern.HostNamePatternConverter" />
             </converter>
            <param name="conversionpattern" value="/qseow-proxy-connection/;%hostname;%property{Command};%property{UserDirectory};%property{UserId};%property{Origin};%property{Context};%message" />
        </layout>
    </appender>

    <!-- Send UDP message to Butler SOS on user activity -->
    <logger name="AuditActivity.Proxy">
        <appender-ref ref="EventSession" />
        <appender-ref ref="EventConnection" />
    </logger>
</configuration>

Tip

If you have several servers in your Sense cluster you probably need several log appender files too.

More specifically, you should put a log appender file on each server where the Qlik Sense proxy service is running, i.e. on all servers via which end users access the Sense cluster.

Note the places where you need to fill in the IP/host where Butler SOS is running, as well as the port number to use (set to 9997 but can be changed if needed).

Make necessary changes so the file matches your environment, then deploy to C:\ProgramData\Qlik\Sense\Proxy\LocalLogConfig.xml. Note that the file must be called LocalLogConfig.xml!

Sense will usually detect and use the file without any restarts needed, but it can take a while. You can always restart the Sense proxy service to make sure the XML file is applied and used.
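
If you want to force the new file to be picked up right away, restarting the proxy service from an elevated command prompt could look something like this (verify the exact service name in your environment):

# Restart the Qlik Sense proxy service so the new log appender file is loaded.
net stop "Qlik Sense Proxy Service"
net start "Qlik Sense Proxy Service"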

Once in place you should see events in the Butler SOS console log if you set logging level to verbose, debug or silly.

2.3.9 - Configuring log events

What’s this?

Butler SOS log events are designed to be a replacement for the most important/useful aspects of Qlik Sense’s log database, which was removed from Qlik Sense Enterprise on Windows in mid 2021.

They capture warnings, errors and fatals from the various QSEoW subsystems.
These events used to be sent to the PostgreSQL logging database; most (but not all) are also sent to QSEoW’s log files.

Using Butler SOS’ log events is arguably even better than getting the same information from log db:
Log db had to be polled to detect new log events and this polling could realistically only be done every few minutes. It also put additional load on an often already struggling part of many QSEoW clusters.

With Butler SOS’ log events concept the notifications are almost instantaneous. Errors and warnings show up in the Grafana or New Relic dashboards within seconds after taking place in QSEoW.

Log events rely on two things to work:

  1. Settings in the Butler SOS main config file.
  2. Log appender XML files being deployed on the Sense servers where log events should be captured.

Both are described below.

Info

As of Butler SOS version 9.2, log events are captured in these QSEoW services:

  • Engine
  • Proxy
  • Repository
  • Scheduler

Support for additional modules is reasonably easy to add, please create a ticket if you believe some service should be added to the list above.

Tech deep-dive

The underlying mechanism is the same as described on the user events page.

Tagging of data

New Relic

The following attributes (New Relic lingo for tags) are added:

  1. A core set of attributes is added to all log events. Note that some attributes will be empty for some/many log events.
    1. qs_ts_iso: Event timestamp in ISO format.
    2. qs_ts_local: Event timestamp in local (server) time zone.
    3. qs_log_source: Which Sense service the event originated in, for example “qseow-proxy”, “qseow-repository”.
    4. qs_log_level: Log level of the event. “WARN”, “ERROR”, or “FATAL”.
    5. qs_host: Host name of the Sense server the event originated at.
    6. qs_subsystem: Which part of each Sense service the event originated in, for example “System.Proxy.Proxy.Core.RequestListener”, “System.Engine.Engine”.
    7. qs_windows_user: Name of the Windows user that’s used to run the Windows service where the event originated.
    8. qs_message: Event message.
    9. qs_exception_message: Additional information about the event.
    10. qs_user_full: Full directory/user ID of the user the event is about. Will be scrambled if scrambling enabled in config file.
    11. qs_user_directory: User directory of the user the event is about. Will be scrambled if scrambling enabled in config file.
    12. qs_user_id: User ID of the user the event is about. Will be scrambled if scrambling enabled in config file.
    13. qs_command: What command (if any) caused the event. Example: “Doc::DoSave”, “Doc::CreateObject”.
    14. qs_result_code: Result code as reported by Sense. Usually empty.
    15. qs_origin: What kind of activity caused the event, for example “AppAccess”.
    16. qs_context: Additional information about the event.
    17. qs_task_name: Task name (if any) causing the event.
    18. qs_app_name: App name (if any) causing the event.
    19. qs_task_id: Task ID (if any) causing the event.
    20. qs_app_id: App ID (if any) causing the event.
    21. qs_execution_id: Execution ID as reported by Sense.
    22. qs_proxy_session_id: Proxy session ID as reported by Sense.
    23. qs_engine_ts: Engine timestamp (if any) associated with the event.
    24. qs_process_id: Process ID of engine service.
    25. qs_engine_exe_version: Version of engine service’s EXE file.
    26. qs_server_started: Timestamp when Sense engine service was started.
    27. qs_entry_type: Entry type as reported by Sense. Usually empty.
    28. qs_session_id: Session ID as reported by Sense.
  2. Custom attributes defined in the Butler SOS config file’s Butler-SOS.logEvents.tags section.
  3. Custom attributes defined in the Butler SOS config file’s Butler-SOS.newRelic.event.attribute.static section.
  4. Dynamic attributes
    1. butlerSosVersion: Butler SOS version. Enabled by setting Butler-SOS.newRelic.event.attribute.dynamic.butlerSosVersion.enable to true in config file.

Note: Attributes defined further down in the list above will overwrite already defined attributes if their names match.
To avoid problems you should make sure not to use already defined attributes.

Settings in main config file

Tip

The config snippet below comes from the production_template.yaml file.

Being a template, it contains examples on how configuration may be done - not necessarily how it should be done.
For example, the env/DEV and foo/bar tags are optional and can be changed to something else, or removed altogether if not used.

---
Butler-SOS:
  ...
  ...
  # Log events are used to capture Sense warnings, errors and fatals in real time
  logEvents:
    udpServerConfig:
      serverHost: <IP or FQDN>      # Host/IP where log event server will listen for events from Sense
      portLogEvents: 9996           # Port on which log event server will listen for events from Sense
    tags:
      - tag: env
        value: DEV
      - tag: foo
        value: bar
    source:
      engine:
        enable: false                  # Should log events from the engine service be handled?
      proxy:
        enable: false                  # Should log events from the proxy service be handled?
      repository:
        enable: false                  # Should log events from the repository service be handled?
      scheduler:
        enable: false                  # Should log events from the scheduler service be handled?
    sendToMQTT: 
      enable: false                    # Should log events be sent as MQTT messages?
      baseTopic: qliksense/logevent    # What topic should log events be forwarded to? 
      postTo:
        baseTopic: true
        subsystemTopics: true          # Should log events be sent to subtopics corresponding to the QSEoW subsystems where the events originated?
    sendToInfluxdb:
      enable: false                    # Should log events be stored in InfluxDB?
    sendToNewRelic:
      enable: false                    # Should log events be sent to New Relic?
      destinationAccount:
        - First NR account
        - Second NR account
      source:
        engine:
          enable: true                 # Should log events from the engine service be handled?
          logLevel: 
            error: true                # Should error level log events be handled by Butler SOS?
            warn: true                 # Should warning level log events be handled by Butler SOS?
        proxy:
          enable: true                 # Should log events from the proxy service be handled?
          logLevel: 
            error: true                # Should error level log events be handled by Butler SOS?
            warn: true                 # Should warning level log events be handled by Butler SOS?
        repository:
          enable: true                 # Should log events from the repository service be handled?
          logLevel: 
            error: true                # Should error level log events be handled by Butler SOS?
            warn: true                 # Should warning level log events be handled by Butler SOS?
        scheduler:
          enable: true                 # Should log events from the scheduler service be handled?
          logLevel: 
            error: true                # Should error level log events be handled by Butler SOS?
            warn: true                 # Should warning level log events be handled by Butler SOS?
  ...
  ...

Log appender XML files

Template/sample log appender files are available in the /docs/log_appenders folder in the GitHub repo.

Note that the log appender files contain slightly different information for each Sense service (engine/proxy/repository/scheduler)!
Also keep in mind that the log appender files must be called LocalLogConfig.xml and placed in these directories on the all Sense servers:

  • C:\ProgramData\Qlik\Sense\Engine
  • C:\ProgramData\Qlik\Sense\Proxy
  • C:\ProgramData\Qlik\Sense\Repository
  • C:\ProgramData\Qlik\Sense\Scheduler

Tip

If you have more than one Sense server you don’t, strictly speaking, have to deploy log appenders to all of them.

If you are only interested in log events from certain servers and/or services (engine, proxy, repository, scheduler), deploy the log appender files only there.
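As an illustration, copying a downloaded log appender template into place for the engine service could look like this on a Sense server (the source path below is just an example - use the template for the service in question from the repo’s /docs/log_appenders folder):

copy C:\temp\log_appenders\LocalLogConfig.xml C:\ProgramData\Qlik\Sense\Engine\LocalLogConfig.xml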

2.3.10 - Configuring the log database

What’s this?

Up until mid 2021 Qlik Sense Enterprise on Windows included a logging database to which log events were sent. It was removed from the product mainly for performance reasons - it could be difficult to scale properly for large Sense clusters.

Butler SOS offers a replacement for log db, in the form of log events.

There may be cases where using log db is still preferred though, for example for Sense clusters running older versions of QSEoW. For that reason it’s still possible to configure and use log db from Butler SOS.

Warning

Future versions of Butler SOS are likely to remove support for log db.
It can thus be a good idea to migrate to Butler SOS log events sooner rather than later.

The Sense log db contains a subset of the log events that appear in Sense’s log files on disk.
Not all Sense subsystems send logs to log db, but those that do typically send info, warning, error and fatal messages to the database. The logging level is configured in the QMC.

The log db integration relies on one thing to work:

  1. Settings in the Butler SOS main config file.

These settings are described below.

Tech deep-dive

  • Butler SOS queries log db with a certain interval, set by the pollingInterval setting.
  • Each query will get log events going back a certain time, controlled by the queryPeriod setting.
    • queryPeriod should be longer than the polling interval! If it’s not there is a risk that log events won’t be retrieved to Butler SOS.

Settings in main config file

Tip

The config snippet below comes from the production_template.yaml file.

Being a template, it contains examples of how configuration may be done - not necessarily how it should be done.
For example, the env/DEV and foo/bar tags are optional and can be changed to something else, or removed altogether if not used.

Warning

Enabling info level logs in the query done towards log db will result in lots of log events being stored in Butler SOS’ InfluxDB database. This can cause performance issues in large Sense environments.

---
Butler-SOS:
  ...
  ...
  # Qlik Sense logging db config parameters
  logdb:
    enable: true
    # Items below are mandatory if logdb.enable=true
    pollingInterval: 60000            # How often (milliseconds) should Postgres log db be queried for warnings and errors?
    queryPeriod: 5 minutes            # How far back should Butler SOS query for log entries? Default is 5 min
    host: <IP or FQDN of Qlik Sense logging db>   # E.g. 10.5.23.7 or sense.mycompany.com
    port: 4432                        # 4432 if using default Sense setup 
    qlogsReaderUser: qlogs_reader
    qlogsReaderPwd: <pwd>
    extractErrors: true               # Should error level entries be extracted from log db into Influxdb?
    extractWarnings: true             # Should warn level entries be extracted from log db into Influxdb?
    extractInfo: false                # Should info level entries be extracted from log db into Influxdb? 
                                      # Warning! Setting this to true will result in LOTS of log messages 
                                      # being retrieved by Butler SOS!
  ...
  ...

2.3.11 - Connecting to a Qlik Sense server

Details on how to configure the connection from Butler SOS to Qlik Sense Enterprise on Windows.

What’s this?

In order to interact with a Qlik Sense Enterprise on Windows (QSEoW) environment, Butler SOS needs to know a few things about that environment. This is true no matter if the Sense cluster consists of a single Sense server or many.

Settings in main config file

---
Butler-SOS:
  ...
  ...
  # Certificates to use when connecting to Sense. Get these from the Certificate Export in QMC.
  cert:
    clientCert: <path/to/cert/client.pem>
    clientCertKey: <path/to/cert/client_key.pem>
    clientCertCA: <path/to/cert/root.pem>
    clientCertPassphrase: <certificate key password, if one was specified when exporting certificates from Sense QMC >
    # If running Butler in a Docker container, the cert paths MUST be the following
    # clientCert: /nodeapp/config/certificate/client.pem
    # clientCertKey: /nodeapp/config/certificate/client_key.pem
    # clientCertCA: /nodeapp/config/certificate/root.pem
    # clientCertPassphrase: 
  ...
  ...

Qlik Sense certificates

Butler SOS uses certificates to authenticate with Qlik Sense.
These certificates must be exported from the Qlik Management Console (QMC).

Qlik Sense certificate export

To export certificates you need to provide a few pieces of information:

  1. The IP or full host name that Butler SOS will use when calling the Qlik Sense APIs. For example, if Butler SOS gets data from server1.my.domain (i.e. the config setting Butler-SOS.serversToMonitor.servers[].host is set to server1.my.domain), the value server1.my.domain should be entered as “machine name” when exporting the certificates from the QMC.
  2. You only need to export certificates from one server in multi-server Sense clusters. The exported certificates can be used to access and get data from any server in the cluster.
  3. Butler SOS can handle certificates with or without password protection. If you choose to use a password, you must enter that password in the Butler SOS config file.
  4. Check the “Include secret key” check box.
  5. Export certificates in PEM format.

Qlik Sense certificate export

Then click the “Export certificates” button. If all goes well the certificates are now exported to a folder on the Sense server to which you are connected (i.e. the server hosting the virtual proxy you are connected to):

Qlik Sense certificate export

The exported certificate files will be used when configuring Butler SOS.
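If you want to sanity check the exported certificate files before configuring Butler SOS, the standard openssl tool can be used. For example, the following prints the subject, issuer and validity period of the exported client certificate:

openssl x509 -in client.pem -noout -subject -issuer -dates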

2.3.12 - Setting up MQTT messaging

Butler SOS can use MQTT as a channel for pub-sub style M2M (machine to machine) messages. This page describes how to configure MQTT in Butler SOS.

What’s this?

MQTT is a lightweight messaging protocol based on a publish-subscribe metaphor. It is widely used in the Internet of Things and telecom sectors.

MQTT has features such as guaranteed delivery of messages, which makes it very useful for communicating between Sense and both up- and downstream source/destination systems.

Butler SOS can be configured to forward various metrics and events from Sense as MQTT messages. In order to do so, some shared configuration needs to be in place first. This section covers that configuration.

Specifically, an MQTT broker/gateway has to be configured. All MQTT messages from Butler SOS will be sent to this broker.

Settings in main config file

Butler-SOS:
  ...
  ...
  # MQTT config parameters
  mqttConfig:
    enable: false
    # Items below are mandatory if mqttConfig.enable=true
    brokerHost: <IP of MQTT broker/server>
    brokerPort: 1883
    baseTopic: butler-sos/          # Default topic used if not otherwise specified elsewhere. Should end with /
  ...
  ...
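Once a broker is configured you can verify that Butler SOS’ messages actually arrive by subscribing to the base topic with any MQTT client. A minimal sketch using the mosquitto_sub client (broker host is a placeholder):

mosquitto_sub -h <IP of MQTT broker/server> -p 1883 -t "butler-sos/#" -v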

2.3.13 - Setting up the New Relic integration

Butler SOS can send metrics and events to New Relic.
This way it’s possible to use their SaaS solution for storing and visualising Butler SOS data.

What’s this?

New Relic offers a suite of online/SaaS products that collectively form a very complete observability stack.

From a Butler SOS perspective the interesting parts are metric, event and log handling.
By forwarding such data to New Relic it’s not necessary to run local InfluxDB and Grafana instances.

That said, New Relic is a commercial service and while their free tier is very generous, there is a trade-off between a local/lower cost InfluxDB/Grafana setup and using New Relic.
With InfluxDB/Grafana you get more fine-grained control over both data storage and visualisations, while New Relic offers ease of setup and no need to host InfluxDB/Grafana yourself.

Below the settings for sending Qlik Sense health metrics to New Relic are described.

Tagging of data

The following attributes (New Relic lingo for tags) are added.

Note: Attributes defined further down in the list will overwrite already defined attributes if their names match.
To avoid problems, make sure not to reuse attribute names that are already defined.

Tags for Qlik Sense health metrics

  1. Static attributes defined in the config file’s Butler-SOS.newRelic.metric.attribute.static section.
  2. Dynamic attributes
    1. butlerSosVersion: Butler SOS version
  3. Dynamic attributes based on whether each item in Butler-SOS.newRelic.metric.dynamic section is enabled/disabled.
    1. If Butler-SOS.newRelic.metric.dynamic.engine.memory section is enabled
      1. qs_memCommited
      2. qs_memAllocated
      3. qs_memFree
    2. If Butler-SOS.newRelic.metric.dynamic.engine.cpu section is enabled
      1. qs_cpuTotal
    3. If Butler-SOS.newRelic.metric.dynamic.engine.calls section is enabled
      1. qs_engineCalls
    4. If Butler-SOS.newRelic.metric.dynamic.engine.selections section is enabled
      1. qs_engineSelections
    5. If Butler-SOS.newRelic.metric.dynamic.engine.sessions section is enabled
      1. qs_engineSessionsActive
      2. qs_engineSessionsTotal
    6. If Butler-SOS.newRelic.metric.dynamic.engine.users section is enabled
      1. qs_engineUsersActive
      2. qs_engineUsersTotal
    7. If Butler-SOS.newRelic.metric.dynamic.engine.saturated section is enabled
      1. qs_engineSaturated
    8. If Butler-SOS.newRelic.metric.dynamic.apps.docCount section is enabled
      1. qs_docsActiveCount
      2. qs_docsLoadedCount
      3. qs_docsInMemoryCount
    9. If Butler-SOS.newRelic.metric.dynamic.cache.cache section is enabled
      1. qs_cacheHits
      2. qs_cacheLookups
      3. qs_cacheAdded
      4. qs_cacheReplaced
      5. qs_cacheBytesAdded

Tags for Qlik Sense proxy session metrics

If and how Butler SOS should extract proxy session metrics is controlled in the Butler-SOS.userSessions section of the config file.

  1. Static attributes defined in the config file’s Butler-SOS.newRelic.metric.attribute.static section.
  2. Dynamic attributes
    1. butlerSosVersion: Butler SOS version

Settings in main config file

Butler-SOS:
  ...
  ...
  # New Relic config
  # If enabled, select Butler SOS metrics will be sent to New Relic.
  newRelic:
    enable: false
    event:
      # There are different URLs depending on whether you have an EU or US region New Relic account.
      # The available URLs are listed here: https://docs.newrelic.com/docs/accounts/accounts-billing/account-setup/choose-your-data-center/
      #
      # Note that the URL path should *not* be included in the url setting below!
      # As of this writing the valid options are
      # https://insights-collector.eu01.nr-data.net
      # https://insights-collector.newrelic.com 
      url: https://insights-collector.eu01.nr-data.net
      header:                   # Custom http headers
        - name: X-My-Header
          value: Header value
      attribute: 
        static:                 # Static attributes/dimensions to attach to the events sent to New Relic.
          - name: service
            value: butler-sos
          - name: environment
            value: prod
        dynamic:
          butlerSosVersion: 
            enable: true       # Should the Butler SOS version be included in the events sent to New Relic?
    metric:
      destinationAccount:
        - First NR account
        - Second NR account
      # There are different URLs depending on whether you have an EU or US region New Relic account.
      # The available URLs are listed here: https://docs.newrelic.com/docs/accounts/accounts-billing/account-setup/choose-your-data-center/
      # As of this writing the options for the New Relic metrics API are
      # https://insights-collector.eu01.nr-data.net/metric/v1
      # https://metric-api.newrelic.com/metric/v1 
      url: https://insights-collector.eu01.nr-data.net/metric/v1   # Where should uptime data be sent?
      header:                   # Custom http headers
        - name: X-My-Header
          value: Header value
      dynamic:
        engine:
          memory:               # Engine RAM (free/committed/allocated).
            enable: true
          cpu:                  # Engine CPU.
            enable: true        
          calls:                # Total number of requests made to the engine.
            enable: true        
          selections:           # Total number of selections made to the engine.
            enable: true        
          sessions:             # Engine session metrics (active and total number of engine sessions).
            enable: true        
          users:                # Engine user metrics (active and total number of users in engine).
            enable: true        
          saturated:            # Engine saturation status (tracks whether engine has high or low load).
            enable: true
        apps:
          docCount:
            enable: true
        cache:
          cache:                # Cache metrics.
            enable: true
        proxy:
          sessions:             # Session metrics as reported by the Sense proxy service
            enable: true
      attribute: 
        static:                 # Static attributes/dimensions to attach to the data sent to New Relic.
          - name: service
            value: butler-sos
          - name: environment
            value: prod
        dynamic:
          butlerSosVersion: 
            enable: true       # Should the Butler SOS version be included in the data sent to New Relic?
  ...
  ...

2.3.14 - Setting up Prometheus

Butler SOS can store metrics in Prometheus.

What’s this?

Prometheus is the de-facto standard, open source tool for achieving observability of small, large and huge IT systems.

At its heart Prometheus contains a time-series database optimized for storing various kinds of measurements. It has strong support for dimensional queries, great integrations with incident management tools and more.

Looking at the visualisation side of things, Prometheus is Grafana’s preferred source for time-series data. Put differently, Prometheus has some query features that InfluxDB lacks, making some Grafana diagrams easier to create with Prometheus than with InfluxDB. The difference is minor though.

Settings in main config file

Butler-SOS:
  ...
  ...
  # Prometheus config
  # If enabled, select Butler SOS metrics will be exposed on a Prometheus compatible URL from where they can be scraped.
  prometheus:
    enable: false                                    # Default false
    host: <IP or FQDN where Butler SOS is running>  # On what IP/FQDN should the Prometheus metrics be exposed? Default 0.0.0.0, i.e. all available IPs
    port: 9842      
  ...
  ...
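On the Prometheus side a scrape job pointing at the host/port configured above is also needed. A minimal sketch of such a job in prometheus.yml (the host name is a placeholder and Prometheus’ default /metrics path is assumed):

scrape_configs:
  - job_name: "butler-sos"
    scrape_interval: 30s
    static_configs:
      - targets: ["<IP or FQDN where Butler SOS is running>:9842"]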

2.3.15 - Setting up InfluxDB time series database

Butler SOS can store metrics in InfluxDB.

Warning

Butler SOS was developed with InfluxDB version 1.x in mind.

InfluxDB is currently available in version 2.x and while this version brings lots of new goodies, it’s not out-of-the-box compatible with Butler SOS.
For that reason you should use the latest 1.x version of InfluxDB, which at the time of this writing is 1.8.4.

In due time Butler SOS will be updated to support InfluxDB 2.x too.

What’s this?

InfluxDB is a time series database. This means it is optimised for storing data that’s somehow linked to a timestamp.
Measurements and metrics are some of the most obvious kinds of data for which InfluxDB was created.

Butler SOS stores data in InfluxDB in full detail, i.e. Butler SOS doesn’t do any aggregation of older data points.
This has a few consequences:

  • If you are monitoring many Sense servers and/or query Sense for health metrics very frequently and/or have a long InfluxDB retention policy (many months or even years) you will eventually end up with lots of data.
  • You should ask yourself how far back you need to look at operational data such as that collected by Butler SOS. In most cases 30-45 days of history is more than enough, and 10-14 days is usually a good starting point. Use the Butler-SOS.influxdbConfig.retentionPolicy section of the config file to create a retention policy for Butler SOS.
  • If you need longer history you should consider using InfluxDB’s excellent aggregation features. These can assist in aggregating older data points, with the effect that you can then keep virtually unlimited history. The older data will not be as detailed (fewer samples per time period) - but you will still have an averaged view of what the history looked like.

Note 1: Instructions for how to aggregate old data are beyond the scope of this documentation.

Note 2: The retention policy specified in the config file will only be created if the InfluxDB database specified in the config file does NOT exist when Butler SOS is started.
I.e. if you store data to an existing InfluxDB database, the retention policy will not be created.
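If the database already exists you can instead create (or adjust) a retention policy manually, for example using the InfluxDB 1.x CLI. The statement below is a sketch using the database and policy names from the config sample further down - adjust to your environment:

influx -execute 'CREATE RETENTION POLICY "10d" ON "SenseOps" DURATION 10d REPLICATION 1 DEFAULT'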

Settings in main config file

Butler-SOS:
  ...
  ...
  # Influx db config parameters
  influxdbConfig:
    enable: true
    # Items below are mandatory if influxdbConfig.enable=true
    hostIP: <IP or FQDN of Influxdb server>
    hostPort: <Port where Influxdb is listening>    # Optional. Default value=8086
    auth:
      enable: false                 # Does influxdb instance require authentication (true/false)?
      username: <username>          # Username for Influxdb authentication. Mandatory if auth.enable=true
      password: <password>          # Password for Influxdb authentication. Mandatory if auth.enable=true
    dbName: SenseOps

    # Default retention policy that should be created in InfluxDB when Butler SOS creates a new database there. 
    # Any data older than retention policy threshold will be purged from InfluxDB.
    retentionPolicy:
      name: 10d
      duration: 10d                 # Possible duration units here: https://docs.influxdata.com/influxdb/v1.8/query_language/spec/#durations 

    # Control whether certain metrics are stored in InfluxDB or not
    # Use with caution! Enabling activeDocs, loadedDocs or inMemoryDocs may result in lots of data sent to InfluxDB.
    includeFields:
      activeDocs: false              # Should data on what docs are active be stored in Influxdb (true/false)? 
      loadedDocs: false              # Should data on what docs are loaded be stored in Influxdb (true/false)?
      inMemoryDocs: false            # Should data on what docs are in memory be stored in Influxdb (true/false)?
  ...
  ...

2.3.16 - Configuring extraction of app names from Qlik Sense

What’s this?

Qlik Sense’s APIs return all its app metrics relative to an app ID, not the app name.
This is fine as the ID is guaranteed to be unique, but the downside is that the ID doesn’t tell us humans much.

Butler SOS therefore provides an app ID-to-app name mapping with a configurable update interval.
The Butler-SOS.appNames section of the config file controls if this mapping should be done at all, how often and which Sense server should be used to get it.

A more comprehensive description of Butler SOS’ strategy for getting correct names of Sense apps is available in the Concepts section.

Settings in main config file

Butler-SOS:
  ...
  ...
  # Extract app names
  appNames: 
    enableAppNameExtract: true    # Extract app names in addition to app IDs (true/false)?
    extractInterval: 60000        # How often (milliseconds) should app names be extracted?
    hostIP: <IP or FQDN>          # What Sense server should be queried for app names?
  ...
  ...


2.3.17 - Configuring user sessions

What’s this?

A description of what user sessions are is available in the Concepts section.

Detailed user session metrics are retrieved for all virtual proxies specified in the Butler-SOS.serversToMonitor.servers[].userSessions.virtualProxies[] array.

In other words: For each monitored Sense server it is possible to specify which of the server’s proxy service’s virtual proxies should be monitored with respect to per-user session metrics.
Right, that’s a long sentence…

Let’s try again: For each monitored Sense server, decide which virtual proxies should be monitored.
Enter those proxies in the Butler-SOS.serversToMonitor.servers[].userSessions.virtualProxies[] array for the server in question.

Note

In order to get detailed, per-user and virtual proxy session info you need to

  1. Configure the Butler-SOS.userSessions section of the config file with general parameters about how often sessions should be polled, user blacklist etc.
    Don’t forget to set Butler-SOS.userSessions.enableSessionExtract to true.
  2. For each server then set Butler-SOS.serversToMonitor.servers[].userSessions.enable to true and specify which virtual proxies should be monitored.

You will only get user session info if you configure both the points above.

Settings in main config file

Tip

The config snippet below comes from the production_template.yaml file.

Being a template, it contains examples of how configuration may be done - not necessarily how it should be done.
For example, the LAB/testuser1 and LAB/testuser2 users are optional and can be changed to something else, or removed altogether if not used.

Butler-SOS:
  ...
  ...
  # Sessions per virtual proxy
  userSessions:
    enableSessionExtract: true      # Query unique user IDs of what users have sessions open (true/false)?
    # Items below are mandatory if enableSessionExtract=true    
    pollingInterval: 30000        # How often (milliseconds) should session data be polled?
    excludeUser:                  # Optional blacklist of users that should be disregarded when it comes to session monitoring.
                                  # Blacklist is only applied to data in InfluxDB. All session data will be sent to MQTT.
      - directory: LAB
        userId: testuser1
      - directory: LAB
        userId: testuser2
  ...
  ...

2.3.18 - Configure which Sense servers to monitor

What’s this?

This part of the config file contains information on what Sense servers should be monitored and details about those servers.

Note that there is a dependency to the Butler-SOS.userSessions section. Please see the Configuring user sessions page for more info.

Tags

It’s possible to define server specific tags that will be stored together with the Sense metrics in InfluxDB.
The tags can then be used when creating Grafana dashboards, for example to distinguish between DEV, TEST and PROD servers, where servers are located physically etc.

You can define zero or more tags in the Butler-SOS.serversToMonitor.serverTagsDefinition section.
Those tags are then given values for each server.

If a tag is defined it must in fact be given a value for each server!
If nothing else just set it to an empty string or a hyphen, ‘-’.

Settings in main config file

Tip

The config snippet below comes from the production_template.yaml file.

Being a template, it contains examples of how configuration may be done - not necessarily how it should be done.
For example, the list serverTagsDefinition is optional and can be changed to something else, or removed altogether if not used.

Same thing for the list of servers and virtual proxies. Update them so they match your own Sense environment.

Butler-SOS:
  ...
  ...
  serversToMonitor:
    pollingInterval: 30000          # How often (milliseconds) should the healthcheck API be polled?

    # If false, Butler SOS will accept TLS certificates on the server without verifying them with the CA. 
    # If true, data will only be retrieved from the Sense server if that server's TLS cert verifies 
    # successfully against the list of CAs available on the computer where Butler SOS is running.
    rejectUnauthorized: true 

    # List of extra tags for each server. Useful for creating more advanced Grafana dashboards.
    # Each server below MUST include these tags in its serverTags property.
    # The tags below are just examples - define your own as needed
    # These tags will also be used to label data exposed on the Prometheus endpoint (if it is enabled)
    # NOTE: Prometheus only allows label names consisting of ASCII letters, numbers, as well as underscores. They must match the regex [a-zA-Z_][a-zA-Z0-9_]*. 
    # I.e. if the Prometheus endpoint is enabled, the tag names below must follow the label naming standard of Prometheus. 
    serverTagsDefinition: 
      - server_group
      - serverLocation
      - server_type
      - serverBrand

    # Sense Servers that should be queried for healthcheck data 
    servers:
      - host: <server1.my.domain>:4747        # Example: 10.34.3.45:4747
        serverName: <server1>
        serverDescription: <description>
        logDbHost: <host name as used in QLogs db>
        userSessions:
          enable: true
          # Items below are mandatory if userSessions.enable=true
          host: <server1.my.domain>:4243      # Example: 10.34.3.45:4243
          virtualProxies:
            - virtualProxy: /                 # Default virtual proxy
            - virtualProxy: /hdr              # "hdr" virtual proxy
            - virtualProxy: /sales            # "sales" virtual proxy
        serverTags:
          server_group: DEV
          serverLocation: Asia
          server_type: virtual
          serverBrand: Dell
        headers: 
          X-My-Header-1: Header value 1
          X-My-Header-2: Header value 2
      - host: <server2.my.domain>:4747        # Example: 10.34.3.46:4747
        serverName: <server2>
        serverDescription: <description>
        logDbHost: <host name as used in QLogs db>
        userSessions:
          enable: true
          # Items below are mandatory if userSessions.enable=true
          host: <server2.my.domain>:4243      # Example: 10.34.3.46:4243
          virtualProxies:
            - virtualProxy: /finance          # "finance" virtual proxy
        serverTags:
          server_group: PROD
          serverLocation: Europe
          server_type: physical
          serverBrand: HP
        headers: 
          X-My-Header-1: Header value 3
          X-My-Header-2: Header value 4
  ...
  ...

2.3.19 - Configuring telemetry

What’s this?

A description of Butler SOS’ telemetry feature is available here.

Settings in main config file

---
Butler-SOS:
  # Logging configuration
  ...
  ...
  anonTelemetry: true     # Can Butler SOS send anonymous data about what computer it is running on? 
                          # More info on what data is collected: https://butler-sos.ptarmiganlabs.com/docs/about/telemetry/
                          # Please consider leaving this at true - it really helps future development of Butler SOS!
  ...
  ...

Setting anonTelemetry to true enables telemetry, setting it to false disables telemetry.

2.4 - Day 2 operations

Options for running Butler SOS.

Running Butler SOS

How to start and keep Butler SOS running varies depending on whether you are using Docker or a native Node.js approach.

Docker

Starting Butler SOS using Docker is easy.

First configure the docker-compose.yml file as needed, then start the Docker container in interactive mode (=with output sent to the screen).
This is useful to ensure everything works as intended when first setting up Butler SOS.

docker-compose up

Once Butler SOS has been verified to work as intended, hit ctrl-c to stop it.
Then start it again in daemon (background) mode:

docker-compose up -d

From here on the Docker environment will make sure Butler SOS is always running, including restarting it if it for some reason stops, when the server reboots etc.
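For reference, a minimal docker-compose.yml could look something like the sketch below. The image name/tag, volume mount and port mapping are assumptions - use the docker-compose.yml shipped with the Butler SOS repo as the authoritative starting point.

version: "3.3"
services:
  butler-sos:
    image: ptarmiganlabs/butler-sos:latest    # Image name/tag are assumptions - verify on Docker Hub
    container_name: butler-sos
    restart: always
    volumes:
      # Local directory holding the YAML config file and Sense certificates
      - ./config:/nodeapp/config
    environment:
      - NODE_ENV=production                   # Makes Butler SOS read ./config/production.yaml
    ports:
      - "9996:9996/udp"                       # Log event UDP port (only needed if log events are used)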

Pre-built, standalone binaries

Starting Butler SOS using the pre-built binaries could look like this on Windows:

d:
cd \node\butler-sos
butler-sos.exe --configfile butler-sos-prod.yaml --loglevel info

It is of course also possible to put those commands in a command file (.bat on Windows, .sh etc. on other platforms) and execute that file instead.

As Butler SOS is the kind of service that (probably) should always be running on a server, it makes sense to run it as a Windows service (or use a similar mechanism on Linux).

On Windows you can use the excellent Nssm tool (https://nssm.cc) to achieve this, with all the benefits that follow (the service can be monitored using operations tools, automatic restarts etc).

A step-by-step tutorial for running Butler SOS as a Windows service using NSSM is available over at ptarmiganlabs.com.
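A minimal sketch of what the NSSM setup could look like (service name and paths are just examples - see the tutorial above for details):

nssm install "Butler SOS" "d:\node\butler-sos\butler-sos.exe"
nssm set "Butler SOS" AppParameters "--configfile butler-sos-prod.yaml --loglevel info"
nssm set "Butler SOS" AppDirectory "d:\node\butler-sos"
nssm start "Butler SOS"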

On Linux both PM2 (https://github.com/Unitech/pm2) and Forever (https://github.com/foreverjs/forever) have been successfully tested with Butler SOS.

Native Node.js

Starting Butler SOS as a Node.js app on Windows could look like this:

d:
cd \node\butler-sos\src
node butler-sos.js

It is of course also possible to put those commands in a command file (.bat on Windows, .sh etc. on other platforms) and execute that file instead.

Windows services & process monitors

As Butler SOS is the kind of service that (probably) should always be running on a server, it makes sense using a Node.js process monitor to keep it alive (if running Butler SOS as a Docker container you get this for free).

On Windows you can use the excellent Nssm tool (https://nssm.cc) to make Butler SOS run as a Windows Service, with all the benefits that follow (can be monitored using operations tools, automatic restarts etc).

If running Butler SOS as a Node.js app on Linux, PM2 (https://github.com/Unitech/pm2) and Forever (https://github.com/foreverjs/forever) are two process monitors that both have been successfully tested with Butler SOS.

One caveat with these process monitors is that it can be hard to get them (and thus Butler SOS) to start automatically when a Windows server is rebooted. PM2 can be used to solve this challenge in a nice way; more info in this blog post: https://ptarmiganlabs.com/blog/2017/07/12/monitoring-auto-starting-node-js-services-windows-server.
On the other hand - just using Nssm is probably the easiest and best option for Windows.
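As a sketch, running the Node.js variant of Butler SOS under PM2 on Linux could look like this (paths are examples):

npm install -g pm2
cd /opt/butler-sos/src
pm2 start butler-sos.js --name butler-sos -- --configfile /opt/butler-sos/config/production.yaml --loglevel info
pm2 save        # Remember the current process list
pm2 startup     # Generate a startup script so PM2 (and Butler SOS) survive reboots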

2.4.1 - Monitoring Butler SOS

Options for monitoring Butler SOS itself.

Monitoring Butler SOS

Once Butler SOS is running it’s a good idea to also monitor it. Otherwise you stand the risk of not getting notified if Butler SOS for some reason misbehaves.

Butler SOS will log data on its memory usage to InfluxDB if

  1. The config file’s Butler-SOS.uptimeMonitor.enable and Butler-SOS.uptimeMonitor.storeInInfluxdb.butlerSOSMemoryUsage properties are both set to true (see the sketch below).
  2. The remaining InfluxDB properties of the config file are correctly configured.
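A sketch of what that part of the config file could look like is shown below. Property names should be verified against the production_template.yaml file for the Butler SOS version you are running:

Butler-SOS:
  ...
  uptimeMonitor:
    enable: true                    # Should Butler SOS' own uptime and memory usage be tracked?
    storeInInfluxdb:
      butlerSOSMemoryUsage: true    # Should Butler SOS memory usage be stored in InfluxDB?
  ...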

Assuming everything is correctly set up, you can then create a Grafana dashboard showing Butler SOS’ memory use over time.
You can also set up alerts in Grafana with notifications going to most popular IM tools and email.

A Grafana dashboard can look like this. This particular chart is for the Butler tool, but the concept for Butler SOS is the same.

Grafana dashboard showing Butler memory usage over time

There is a sample Grafana dashboard in Butler SOS’ GitHub repo.

3 - Concepts

Deep dive into the various components that make up Butler SOS.

3.1 - Sessions & Connections

What are user sessions and why are they important?

Who are using the Sense environment right now?

This is a very basic question a sysadmin will ask over and over again.
This question is answered in great detail by the excellent Operations Monitor app that comes with Qlik Sense out of the box.
But that app gives a retrospective view of the data - it does not provide real-time insights.

Butler SOS focuses on the opposite: giving as close to real-time insights as possible into what’s happening in the Sense environment.

User behaviour is then an important metric to track, more on this below.

Proxy sessions

A session, or more precisely a proxy session, starts when a user logs into Sense and ends when the user logs out or the session timeout is reached.
There may be some additional corner-case variants, but the above is the basic, high-level definition of a proxy session.

Tip

Some useful lingo: The events that can happen for sessions are that they can start and stop.

They can also timeout if the user is inactive long enough (the exact time depends on settings in the QMC).

As long as a user uses the same web browser and doesn’t use incognito mode or similar, the user will reuse the same session for all access to Sense.
On the other hand: If a user uses different browsers, incognito mode etc, multiple sessions will be registered for the user in question. There is a limit to how many sessions a user can have at any given time.

Mashups and similar scenarios where Sense objects are embedded into web apps may result in multiple sessions being created. Whether or not this happens largely depends on how the mashup was created.

A good overview of what constitutes a session is found here.

Proxy session vs engine sessions

The proxy service is responsible for handling users connecting to Qlik Sense.

When a user connects a proxy session is created - assuming the user has permissions to access Sense.

When interacting with individual charts in a Sense app, engine sessions are used to interact with the engine service.

A user thus typically has one (or a few) proxy sessions and a larger number of transient engine sessions.

More info here.

Connections

A user may open more than one connection within a session.

A connection is opened when the user opens an app in a browser tab, or when a user opens a mashup which in turn triggers a connection to a Sense app to be set up.
Closing the browser tab will close the connection.

Tip

Some useful lingo: The events that can happen for connections are that they can be opened and closed.

User events

Butler SOS offers a way to monitor individual session and connection events, as they happen.

This is done by Butler SOS hooking into the logging framework of Qlik Sense, which notifies Butler SOS when users start/stop sessions or when connections are opened/closed.

A blacklist in the main config file provides a way to exclude some users (e.g. system accounts) from the user event monitoring.

Configuration of user events monitoring is done in the main config file’s Butler-SOS.userEvents section.
A step-by-step instruction for setting up user event monitoring is available in the Getting started section. More info about the config file is available here.

On an aggregated level this information is useful for understanding how often users log in/out, when peak hours are etc. On a detailed level this information is extremely useful when trying to understand which users had active sessions when some error occurred.
Think investigations such as “who caused that 250 GB drop in RAM we were just alerted about?”.

User events in Grafana dashboard

3.2 - Apps

“Apps”, “Applications” or “Documents” - in Qlik lingo they all refer to the same thing.
It’s where data and logic is kept in the Qlik Sense system.

Tip

The use cases page contains related information that may be of interest.

App metrics

When it comes to apps we basically want to track which apps are loaded into the monitored Sense server(s).

In-memory vs loaded vs active apps

Given the caching nature of Qlik Sense, some apps will be actively used by users and some will just be sitting there without any current connections.

Sense groups apps into three categories, which are all available in Butler SOS:

  • Active apps. An app is active when a user is currently performing some action on it.
  • Loaded apps. An app that is currently loaded into memory and that has open sessions or connections.
  • In-memory apps. Loaded into memory, but without any open sessions or connections.

Butler SOS keeps track of both app IDs and names of the apps in each of the three categories.

More info on Qlik help pages.

App names are tricky

Qlik Sense provides app health info based on app IDs.
I.e. we get a list of what app IDs are active, loaded and in-memory.

That’s all good, except those IDs don’t tell us humans a lot.
We need app names.

We can get those app names by simply asking Sense for a list of app IDs and names, but there are a couple of challenges here:

  1. This query to Sense may not be complex or expensive if done once, but if we do it several times per minute it will create unwanted load on the Sense server(s).
  2. There is a risk of an app being deleted between the time we get health data for it and the following app ID-to-name query. Most apps are reasonably long-lived, but the risk is still there.

Right or wrong, Butler SOS takes this approach to the above:

  1. App name queries are done on a separate schedule. In the main config file you can configure both whether app names should be looked up at all, and how often that lookup should happen. This puts you in control of how up to date the app names in your Grafana dashboards need to be.
  2. The risk of a temporary mismatch between app IDs and app names in Butler SOS remains, and will even get worse if you query app names infrequently. Still, this is typically a very small problem (if even a problem at all).

3.3 - Errors & Warnings

Warnings lead to errors.
Errors lead to unhappy users.
Unhappy users call you.

Wouldn’t you rather get the warnings yourself, before anyone else?

Sense log events

Sense creates a large number of log files for the various parts of the larger Qlik Sense Enterprise environment.
Those logs track more or less everything - from extensions being installed, users logging in, to incorrect structure of Active Directory user directories.

Butler SOS monitors select parts of these logs and provides a way to get an aggregated, close to real-time view into warnings and errors.

The log events page has more info on this.

Grafana dashboards

Butler SOS stores the log events in InfluxDB together with all other metrics.
It is therefore possible to create Grafana dashboards that combine both operational metrics (apps, sessions, server RAM/CPU etc) with warning and error related charts and tables.

Showing warnings and errors visually is often an effective way to quickly identify developing or recurring issues.

Example 1: Circular Active Directory references

Here is an example where bursts of warnings tell us visually that something is not quite right.
50-60 warnings arriving in bursts every few minutes - something is not right.
There is also a single error during this time period - what’s going on there?

We can then look at the actual warnings (=log events) to see what’s going on.

In this case it turns out to be circular references in the Active Directory data used in Qlik Sense. Not a problem with Sense per se, but still something worth looking into:

Warnings and errors from Qlik Sense in Grafana dashboard

Warnings from Qlik Sense in Grafana table

Looking at that single error then, it turns out it’s caused by a task already running when it was scheduled to start:

Errors from Qlik Sense in Grafana table

Example 2: Session bursts causing proxy overload

This example would have been hard or impossible to investigate without Butler SOS.

The problem was that the Sense environment from time to time became very slow or even unresponsive. A restart solved the problem.

Grafana was set up to send alerts when the session count increases very rapidly. Thanks to this, action could be taken within minutes of the incident occurring.

Looking in the Grafana dashboard could then show something like below.
New sessions started every few seconds, all coming from a single client. The effect for this Sense user was that Sense was unresponsive.

The log events section of the Grafana dashboard showed that this user had been given 5 proxy sessions, after which further access to Sense internal resources (engine etc) was denied.

Sessions increasing very quickly

In this case the problem was caused by a mashup using iframes that, when recovering from session timeouts, caused race conditions, trying to re-establish a session to Sense over and over again.

Alerts

Both New Relic and Grafana include a set of built-in alert features that can be used to set up alerts as well as forward them to destinations such as Slack, MS Teams, email, Discord, Kafka, webhooks, PagerDuty and others.

More info about Grafana alerts here.
New Relic alerts described here.

4 - Examples

See Butler SOS in action!

The whole SenseOps stack (the Butler family of tools, together with support tools like InfluxDB, Prometheus, Grafana etc) is really a platform that can be configured and used in lots of different ways.

Be curious and bold - extend the given examples and use cases with your own! And when doing so - please consider sharing your successes (and failures..) with others, as inspiration and insight.

Tip

You are strongly recommended to use the latest version of Grafana with the latest version of Butler SOS.

That said, earlier Butler SOS versions included some nice demo dashboards too; these are still found below as a source of inspiration for what can be done.

4.1 - Visualising Butler SOS metrics in New Relic

New Relic is a complete SaaS product that offers both data storage and powerful, yet easy to set up and use visualisations.

In this dashboard most of the data comes from Butler SOS, but the failed reload pie chart and table at the top are created using data from the Butler tool.

New Relic does not give you nearly the same level of control as Grafana does, for example when it comes to fine-tuning visual details of charts, tables etc.

That can feel a bit limiting at first, but as New Relic’s query language is very powerful and well integrated in the chart editor, it’s not really a problem.
A benefit of New Relic’s dashboards is that the dashboard editor very effectively guides you through the creation of charts, using the various kinds of data (metrics, events, logs) you have sent - and are sending - to their database.

New Relic dashboards are thus definitely on the same level as Grafana’s; which one to use is a matter of preference.

New Relic dashboard with data from both Butler SOS and Butler (part 1).

New Relic dashboard with data from both Butler SOS and Butler (part 2).

Viewing failed reload logs from within New Relic can save lots of time when investigating incidents.

4.2 - Qlik Sense monitoring using Butler SOS v9.2 and Grafana v9.1

Grafana 9 adds some interesting features around alerting as well as several new and improved chart types.

A sample Grafana 9 dashboard is available in the Github repository.

The visual appearance is quite similar to previous versions, but line charts, tables etc have been replaced with the updated variants that arrived with Grafana 9.

4.3 - Qlik Sense monitoring using Butler SOS v7 and Grafana v8

With version 8, Grafana further establishes its position as the leading open source platform for observability and real-time dashboards.
Butler SOS takes advantage of this; a sample dashboard is shown below.

Tip

The screen shots below are taken from the Grafana 8 demo dashboard that’s included in the Butler SOS repository.

Feel free to modify it to your specific needs.

A concept that has proven useful many times is to use an overview dashboard that monitors high-level metrics for the entire Sense cluster. A separate, parameterised dashboard then drills into the details for each server.
Grafana variables make this easy to set up, scalable and very powerful.

Sample dashboards are available in the Git repository.
Before importing these to Grafana you should create a Grafana data source called “senseops”, and point it to your InfluxDB database. When you then import the dashboards they should find your database straight away.
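If you prefer to provision the data source from a file rather than creating it in the Grafana UI, a minimal sketch could look like the YAML below (the URL and database name are examples matching the InfluxDB settings earlier in this documentation; exact provisioning options depend on your Grafana version):

apiVersion: 1
datasources:
  - name: senseops
    type: influxdb
    access: proxy
    url: http://<IP or FQDN of InfluxDB server>:8086
    database: SenseOps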

Dashboard installation

The dashboard file senseops_v7_0_dashboard.json was created using Butler SOS 7.0 and Grafana 8. It thus uses the new chart, data transformation and alerting features that were introduced in Grafana 8.

Overview metrics

The dashboard has a top section that’s always expanded.
A set of (by default) collapsed sections contain different kinds of metrics and log events.

Grafana dashboard Top level metrics

Low memory alerts can be set (using Grafana’s alert feature). Such alerts can be sent (using features built into Grafana) as notifications to Slack, Teams, Pager Duty, as email etc. To keep the dashboard nice and clean it’s usually a good idea to put alert charts in their own section at the bottom, or in a separate dashboard dedicated to alerts.

Apps in memory

From a sysadmin perspective it’s often interesting to know what apps are loaded into memory on each Sense server.
For example, when a server is quickly losing RAM it’s extremely useful to be able to zoom in to the very minute when the RAM drop occurs, then look at what apps were present in memory. One of those apps is probably not well designed, or is at least using a lot of memory.

The dashboard separates regular apps and session apps.
You can also use Grafana’s standard filtering features to narrow down on the server(s) of interest.

Grafana dashboard Apps loaded into memory

Users & sessions per server

If things really go wrong in a Qlik Sense Enterprise environment, connected users might be kicked out. It is therefore important to know at any given time how many users are connected, and to be able to detect sudden drops in user count.

Another use case could be for maintenance windows: You then want to know how many - and which - users are connected, so you can send them a message that maintenance is about to start.

Grafana dashboard Users and sessions per server

User events

If Butler SOS has been configured to handle user level events coming from Sense, these are shown here.

You get information about where the event took place and which user has

  • logged in (started a session)
  • logged out (stopped a session)
  • timed out (stopped a session)
  • opened a connection to a new app
  • closed a connection to an app (for example closed a browser tab)
  • … and more

Grafana dashboard User events

Warnings & Errors

In previous Butler SOS demo dashboards this information came from log db.
Starting with Butler SOS 7 the source of this data is instead the log events introduced in version 7.

Some of this information is also available in the standard Operations Monitor app in Qlik Sense Enterprise, but only in a retrospective way.
Having access to it in close to real time makes it possible to act on developing issues quicker.

Charts provide overview while tables then give the actual messages, as they appear in the log files.

Grafana dashboard Error and warning charts

Grafana dashboard Error and warning tables

It’s also possible to drill down into individual warnings and errors to get very detailed information about what happened:

Grafana dashboard Detailed view into errors and warnings

Butler SOS metrics

Butler SOS is very robust indeed, but it may still be of interest to track its memory use, to make sure there aren’t any memory leaks etc.

Grafana dashboard Butler SOS memory usage

4.4 - Qlik Sense monitoring using Grafana 7

Grafana 7 is a big update when it comes to visualisations. Grafana was excellent already in version 6, but with v7 things are taken to a new level.

The screen shots below are taken from the Grafana 7 sample dashboard included in the Butler SOS repository.

Feel free to modify it to your specific needs.

A concept that has proven useful many times is to use an overview dashboard that monitors high-level metrics for the entire Sense cluster. A separate, parameterised dashboard then drills into the details for each server.

Sample dashboards are available in the Git repository. Before importing these to Grafana you should create a Grafana data source called “SenseOps”, and point it to your InfluxDB database. When you then import the dashboards they should find your database straight away.

Overview metrics

This view gives high level insights into the 3 virtual proxies (/sales, /sourcing, /finance) in this particular Sense environment, as well as top-level numbers on users and sessions.

Grafana dashboard Top level metrics

Low memory alerts can be set (using Grafana’s alert feature). Such alerts can be sent (using features built into Grafana) as notifications to Slack, Teams, Pager Duty, as email etc. To keep the dashboard nice and clean it’s usually a good idea to put alert charts in their own section (see below for an example).

Apps in memory

From a sysadmin perspective it’s often interesting to know what apps are loaded into memory on each Sense server.

Here you get the details broken down by regular apps and session apps.
You can also use Grafana’s standard filtering features to narrow down on the server(s) of interest.

Grafana dashboard Apps loaded into memory

Users & sessions per server

If things really go wrong in a Qlik Sense Enterprise environment, connected users might be kicked out. It is therefore important to know at any given time how many users are connected, and to be able to detect sudden drops in user count.

Another use case could be for maintenance windows: You then want to know how many - and which - users are connected, so you can send them a message that maintenance is about to start.

Grafana dashboard Users and sessions per server

Warnings & Errors

This information is available in the standard Operations Monitor app in Qlik Sense Enterprise, but only in a retrospective way.
Having access to it in close to real time makes it possible to act on developing issues quicker.

Charts provide overview while tables then give the actual messages, as they appear in the log files.

Grafana dashboard Error and warning charts

Grafana dashboard Error and warning tables

Butler SOS metrics

Butler SOS is very robust indeed, but it may still be of interest to track its memory use, to make sure there aren’t any memory leaks etc.

Grafana dashboard Butler SOS memory usage

Alerts

While it’s perfectly possible to include alerts in almost any Grafana chart, sometimes it’s nice to tuck the alert-enabled charts away, out of sight. They will do their job and alert when needed.

Grafana dashboard Alerts

4.5 - Qlik Sense monitoring using Grafana 6

Probably the most obvious and common use case for Butler SOS. View Qlik Sense and Windows operational metrics in great looking Grafana dashboards.

Grafana is an incredibly capable tool for showing time series data.

The dashboards shown here are thus just examples and inspiration - feel free to extend and adapt these to meet your particular needs. There are also plenty of sample Grafana dashboards out there to get inspiration from.

If you experience issues with the Grafana dashboards included in the Butler SOS release on Github, you might want to try upgrading to a later/latest Grafana version.

This is a top use case for Butler SOS.
These kind of dashboards give you detailed insights into several important metrics for your Sense servers:

  • CPU load
  • Amount of free RAM memory
  • Number of sessions in total
  • Success rate of the Qlik engine’s cache
  • Number of loaded apps in the Qlik engine

A concept that has proven useful many times is to use an overview dashboard that monitors high-level metrics for the entire Sense cluster. A separate, parameterised dashboard then drills into the details for each server.

Sample dashboards are available in the Git repository. Before importing these to Grafana you should create a Grafana data source called “SenseOps”, and point it to your InfluxDB database. When you then import the dashboards they should find your database straight away.

Overview dashboard

An overview dashboard could look something like this:

Grafana dashboard RAM usage for each server

Low memory alerts can be set (using Grafana’s alert feature). Such alerts can be sent (using features built into Grafana) as notifications to Slack, Teams, Pager Duty, as email etc.


Grafana dashboard General/high level user sessions info for both whole system and each server


You can also get very detailed sessions metrics, down to the level of individual sessions per server and virtual proxy.

One possible use case for this information is to see what users will be affected by a pending server reboot. You could even use this information to send a chat message to these users, informing them that their connection to Sense will be lost in x minutes. This feature is not available in Butler SOS out of the box, but is quite possible to implement if needed.

As of Butler SOS v5.0.0, detailed user session metrics are stored in a fairly comprehensive way in InfluxDB. The visualisation of these metrics is still somewhat rough though. The charts below can serve as inspiration, but can surely be improved upon.

Grafana dashboard Detailed user sessions info per server and virtual proxy


When something breaks in a Qlik Sense environment the logs immediately fill up with warning and/or error messages. By keeping track of these it’s easy to quickly spot (and get notified) issues when they first occur:

Grafana dashboard Detailed user sessions info per server and virtual proxy

Note how there are lots of INFO level messages generated (note the y axis scales in the diagram above!). In a production setting it’s usually a good idea to turn off extraction of INFO level log messages into InfluxDB.

This is controlled in the YAML config file.

5 - Reference docs

Technical reference docs and general sources of information that might be useful.

References in this site

Config file format

A detailed description of all configuration options in the main config file is available here.

Available metrics

A complete list of all metrics provided by Butler SOS is found here.

Useful references out there

Qlik Sense APIs

Qlik’s API documentation is found here.

InfluxDB

InfluxDB docs here.

Note that Butler SOS was developed with InfluxDB 1.x in mind. It is not currently compatible with InfluxDB 2.x.

Prometheus

Prometheus docs here.

Grafana

Grafana docs here.

MQTT

There are various MQTT brokers available, both commercial and open source.
Mosquitto is an open source MQTT broker with a solid track record, and is available as a Docker container.

There are also plenty of companies offering SaaS MQTT brokers, ranging from small specialised companies to the big cloud providers.

5.1 - Command line options

Description of Butler SOS’ command line options.

Command line options

When starting Butler SOS, you can pass command line options to customize its behavior.
The available options look like this:

Usage: butler-sos [options]

Butler SenseOps Stats ("Butler-SOS") is a microservice publishing operational Qlik Sense metrics to InfluxDB, Prometheus and New Relic.
User events and log events can be forwarded from Sense to Butler SOS and then acted upon there. Events can be stored in InfluxDB and sent to New Relic.
Add Grafana for great looking dashboards and you get real-time monitoring of what happens inside a Qlik Sense environment.

Options:
  -V, --version                        output the version number
  -c, --configfile <file>              path to config file
  -l, --loglevel <level>               log level (choices: "error", "warn", "info", "verbose", "debug", "silly")
  --new-relic-account-name  <name...>  New Relic account name. Used within Butler SOS to differentiate between different target New Relic accounts
  --new-relic-api-key <key...>         insert API key to use with New Relic
  --new-relic-account-id <id...>       New Relic account ID
  --skip-config-verification           Disable config file verification (default: false)
  -h, --help                           display help for command

-V, –version

Output the version number of Butler SOS.

-c, –configfile

Specifies the configuration file to use.

Valid values: A path to a configuration file.

Default: Whatever is specified in the NODE_ENV environment variable, with a .yaml extension added. Butler SOS will look for that file in the ./config directory.

Example:

  • -c or --configfile are not specified. NODE_ENV is set to production. Butler SOS will try to read settings from ./config/production.yaml.

-l, --loglevel

Specifies the log level to use.
When set, this overrides the log level specified in the configuration file.

Valid values: ‘error’, ‘warn’, ‘info’, ‘verbose’, ‘debug’, ‘silly’

Default: ‘info’

When using New Relic as backend for storing metrics, you can specify New Relic credentials in the config file - but that is not ideal from a security perspective.

To avoid that, you can specify the New Relic credentials on the command line using the following options.

--new-relic-account-name

List of New Relic account names. Used within Butler SOS to differentiate between different target New Relic accounts to which data can be sent. This name has nothing to do with the account name used in New Relic - it’s purely for Butler SOS’ internal use.
Specifically, there are multiple places in the config file where you can specify to which New Relic account data should be sent.

Enclose account names in quotes if they contain spaces.
Separate multiple account names with a space.

Example: --new-relic-account-name "Account 1" "Account 2"

--new-relic-api-key

List of New Relic API keys. Used to authenticate with New Relic.

Enclose API keys in quotes if they contain spaces.
Separate multiple API keys with a space. Note that the order of the API keys must match the order of the account names, i.e. the first API key corresponds to the first account name, the second API key corresponds to the second account name, and so on.

Example: --new-relic-api-key "API key 1" "API key 2"

--new-relic-account-id

List of New Relic account IDs. Used to identify the New Relic account to which data should be sent.

Enclose account IDs in quotes if they contain spaces.
Separate multiple account IDs with a space. Note that the order of the account IDs must match the order of the account names, i.e. the first account ID corresponds to the first account name, the second account ID corresponds to the second account name, and so on.

--skip-config-verification

Disable config file verification.

By default, Butler SOS verifies the config file when it starts. If the config file is invalid, Butler SOS will log an error and exit.
Use this option to disable config file verification.

-h, --help

Display help for command.

5.2 - Config file format

Everything you ever wanted to know about the Butler SOS configuration file.

Tip

The config file uses YAML notation, with file extensions of .yaml or .yml.
The .yaml extension is recommended.

The config file is the heart of Butler SOS.
All settings must be defined in the config file - run time errors are likely to occur otherwise.

The sample config file looks like this:

Sample config file

A few things to keep in mind:

  • Topic names (e.g. “Butler-SOS.logdb”) are case sensitive.
  • The first time Butler SOS is started, a check is done whether the specified InfluxDB database already exists. If it doesn’t exist it will be created, together with a default InfluxDB retention policy. The retention policy is based on the time period set in the config file.

Top level

Parameter Description
logLevel The level of details in the logs. Possible values are silly, debug, verbose, info, warn, error (in order of decreasing level of detail).
fileLogging true/false to enable/disable logging to disk file
logDirectory Subdirectory where log files are stored
anonTelemetry Can Butler SOS share anonymous data about itself with the Butler SOS project? More info on what data is collected here.
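
As an illustration, the top level settings could look like this in the YAML config file (the values below are examples only; see the sample config file above for the authoritative layout):

Butler-SOS:
  # General settings
  logLevel: info          # Default log level
  fileLogging: true       # Also write logs to disk
  logDirectory: log       # Subdirectory where log files are stored
  anonTelemetry: true     # Share anonymous telemetry with the Butler SOS project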

Butler-SOS.heartbeat

Heartbeats can be used to send “I’m alive” messages to some other tool, e.g. an infrastructure monitoring tool.
The concept is simple: The remoteURL will be called at the specified frequency. The receiving tool will then know that Butler SOS is alive.

Parameter Description
enable Should heartbeats be sent to some URL, indicating that Butler SOS is alive and well? true/false
remoteURL URL that will be called for heartbeats
frequency How often should heartbeats be sent? Format according to https://bunkat.github.io/later/parsers.html#text
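
Assuming the dotted parameter names above map to nested YAML keys (as in the sample config file), a heartbeat section could look like the sketch below. The URL and frequency are placeholders; the frequency uses the later.js text format referenced above.

Butler-SOS:
  heartbeat:
    enable: true
    # Placeholder URL of the tool that should receive the heartbeat
    remoteURL: https://my-monitoring-tool.example.com/heartbeat
    # later.js text format
    frequency: every 1 hour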

Butler-SOS.dockerHealthCheck

Docker health checks are used when running Butler SOS as a Docker container.

The Docker engine will call the container’s health check REST endpoint with a set interval to determine whether the container is alive/well or not.
If you are not running Butler SOS in Docker you can disable this feature.

Parameter Description
enable Should a Docker healthcheck endpoint be created within Butler SOS? Set to false if not running Butler SOS under Docker. true/false
port Port the healthcheck should use. Usually 12398, but might need to be changed if several Butler SOS instances run on the same server
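
A minimal sketch of this section, using the usual port from the table above:

Butler-SOS:
  dockerHealthCheck:
    enable: true    # Set to false if not running Butler SOS under Docker
    port: 12398     # Port used by the Docker health check endpoint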

Butler-SOS.uptimeMonitor

Parameter Description
enable Should messages with Butler SOS uptime and memory usage be written to console and logs? true/false
frequency How often should uptime messages be written to console and/or logs? Format according to https://bunkat.github.io/later/parsers.html#text
logLevel Starting at what log level should uptime messages be used? Possible values are silly, debug, verbose, info, warn, error. For example, if you specify “verbose” here, uptime messages will appear if you set overall log level to silly, debug or verbose.
storeInInfluxdb.butlerSOSMemoryUsage Should data on Butler SOS’ own memory use be stored in InfluxDB? true/false
storeInInfluxdb.instanceTag Tag used to differentiate data from multiple Butler SOS instances. Useful if running different Butler SOS instances against (for example) DEV, TEST and PROD environments
storeNewRelic.enable Should uptime data be sent to New Relic? true/false
storeNewRelic.destinationAccount Array of New Relic account names to which uptime data will be sent
storeNewRelic.metric.dynamic.butlerMemoryUsage.enable Should Butler SOS memory metrics be sent to New Relic? true/false
storeNewRelic.metric.dynamic.butlerUptime.enable Should Butler SOS uptime (days, hours, minutes since startup) be sent to New Relic? true/false
storeNewRelic.attribute.static Array of attributes which will be added to all uptime metrics sent to New Relic
storeNewRelic.attribute.dynamic.butlerVersion.enable Should uptime metrics be tagged with Butler SOS version number? true/false
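
Putting the parameters above together, an uptime monitor section could look roughly like this. The frequency, tag and account values are placeholders, and the exact shape of the static attribute array should be taken from the sample config file rather than this sketch:

Butler-SOS:
  uptimeMonitor:
    enable: true
    frequency: every 15 minutes      # later.js text format
    logLevel: verbose
    storeInInfluxdb:
      butlerSOSMemoryUsage: true
      instanceTag: PROD              # Placeholder instance tag
    storeNewRelic:
      enable: false
      destinationAccount:
        - First NR account           # Placeholder account name
      metric:
        dynamic:
          butlerMemoryUsage:
            enable: true
          butlerUptime:
            enable: true
      attribute:
        static:                      # Placeholder attribute, shape assumed
          - name: environment
            value: prod
        dynamic:
          butlerVersion:
            enable: true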

Butler-SOS.thirdPartyToolsCredentials

Parameter Description
newRelic Array of credentials for the New Relic accounts to which data should be sent. Each array item consists of several items, see below.
newRelic[].accountName Name of New Relic account. This is a “friendly name” that’s used within Butler SOS to identify each NR account.
newRelic[].insertApiKey Insert API key associated with the NR account. Get this from the NR account’s settings page.
newRelic[].accountId New Relic account id. Get this from the NR account’s settings page.
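
For example, two New Relic accounts could be defined like this (account names, API keys and IDs are placeholders):

Butler-SOS:
  thirdPartyToolsCredentials:
    newRelic:
      - accountName: First NR account
        insertApiKey: <insert API key for first account>
        accountId: 1111111
      - accountName: Second NR account
        insertApiKey: <insert API key for second account>
        accountId: 2222222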

Butler-SOS.userEvents

Track individual users opening/closing apps and starting/stopping sessions.
Requires log appender XML file(s) to be added to Sense server(s).

Parameter Description
enable Should Butler SOS track detailed user events (i.e. session start/stop, connection open/close)? true/false
excludeUser Array of users (=directory/userId pairs) that should be disregarded when user events arrive from Sense. Remove sample users before deploying Butler SOS.
udpServerConfig.serverHost IP/host where the user event UDP server should listen for incoming connections. Usually the same IP/host as where Butler SOS is running. Using 0.0.0.0 will cause Butler SOS to listen on all available IPs.
udpServerConfig.portUserActivityEvents Port on which the user event UDP server will listen. Should match the port specified in the log appender.
tags Array of tags (tagName/tagValue pairs) that should be added to each user event before sending it to InfluxDB. Remove sample tags before deploying Butler SOS.
sendToMQTT.enable Should user events be sent to MQTT? true/false
sendToMQTT.postTo.everythingTopic.enable Should all user event messages be sent to an MQTT topic? true/false
sendToMQTT.postTo.everythingTopic.topic MQTT topic to which all user event messages will be sent.
sendToMQTT.postTo.sessionStartTopic.enable Should session start user event messages be sent to an MQTT topic? true/false
sendToMQTT.postTo.sessionStartTopic.topic MQTT topic to which session start user event messages will be sent.
sendToMQTT.postTo.sessionStopTopic.enable Should session stop user event messages be sent to an MQTT topic? true/false
sendToMQTT.postTo.sessionStopTopic.topic MQTT topic to which session stop user event messages will be sent.
sendToMQTT.postTo.connectionOpenTopic.enable Should connection open user event messages be sent to an MQTT topic? true/false
sendToMQTT.postTo.connectionOpenTopic.topic MQTT topic to which connection open user event messages will be sent.
sendToMQTT.postTo.connectionCloseTopic.enable Should connection close user event messages be sent to an MQTT topic? true/false
sendToMQTT.postTo.connectionCloseTopic.topic MQTT topic to which connection close user event messages will be sent.
sendToInfluxdb.enable Should user events be saved in InfluxDB? true/false
sendToNewRelic.enable Should user events be saved in New Relic? true/false
sendToNewRelic.destinationAccount Array of New Relic account names to which user events will be sent.
sendToNewRelic.scramble Should user directory and user ID fields be scrambled before user events are sent to New Relic? true/false
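
A condensed sketch of the user events section, assuming the dotted parameter names above map to nested YAML keys. The host, port, user and topic values are placeholders, and the shape of the excludeUser and tags array items is an assumption based on the descriptions above:

Butler-SOS:
  userEvents:
    enable: true
    excludeUser:                     # Users to disregard (placeholder values)
      - directory: LAB
        userId: testuser1
    udpServerConfig:
      serverHost: 0.0.0.0            # Listen on all available IPs
      portUserActivityEvents: 9997   # Placeholder port, must match the log appender
    tags:                            # Placeholder tag, shape assumed
      - tag: env
        value: DEV
    sendToMQTT:
      enable: false
      postTo:
        everythingTopic:
          enable: true
          topic: butler-sos/userevent
        # sessionStartTopic, sessionStopTopic, connectionOpenTopic and
        # connectionCloseTopic follow the same enable/topic pattern
    sendToInfluxdb:
      enable: true
    sendToNewRelic:
      enable: false
      destinationAccount:
        - First NR account           # Placeholder account name
      scramble: true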

Butler-SOS.logEvents

Log events are used to capture Sense warnings, errors and fatals in real time. Requires log appender XML file(s) to be added to Sense server(s).

Note that log events can be enabled/disabled per source (repository, proxy, scheduler etc).

Parameter Description
udpServerConfig.serverHost IP/host where the log event UDP server should listen for incoming connections. Usually the same IP/host as where Butler SOS is running. Using 0.0.0.0 will cause Butler SOS to listen on all available IPs.
udpServerConfig.portLogEvents Port on which the log event UDP server will listen. Should match the port specified in the log appender.
tags Array of tags (tagName/tagValue pairs) that should be added to each log event before sending it to InfluxDB. Remove sample tags before deploying Butler SOS.
source.engine.enable Should log events from the engine service be handled by Butler SOS? true/false
source.proxy.enable Should log events from the proxy service be handled by Butler SOS? true/false
source.repository.enable Should log events from the repository service be handled by Butler SOS? true/false
source.scheduler.enable Should log events from the scheduler service be handled by Butler SOS? true/false
sendToMQTT.enable Should log events be sent to MQTT? true/false
sendToMQTT.baseTopic Root MQTT topic. All log event MQTT messages will be posted in this topic or subtopics of it.
sendToMQTT.postTo.baseTopic Should all log events be posted to the root topic? true/false
sendToMQTT.postTo.subsystemTopics All log events originate from a specific subsystem in a Sense server. These subsystems are organised in a hierarchical tree that can be directly mapped to MQTT topics. Should log events be posted as MQTT messages to such topics? true/false
sendToInfluxdb.enable Should log events be saved in InfluxDB? true/false
sendToNewRelic.enable Should log events be sent to New Relic? true/false
sendToNewRelic.destinationAccount Array of New Relic account names to which log events will be sent.
sendToNewRelic.source.engine.enable Should log events from the engine service be handled?
sendToNewRelic.source.engine.logLevel.error Should ERROR log events from the engine service be handled?
sendToNewRelic.source.engine.logLevel.warn Should WARN log events from the engine service be handled?
sendToNewRelic.source.proxy.enable Should log events from the proxy service be handled?
sendToNewRelic.source.proxy.logLevel.error Should ERROR log events from the proxy service be handled?
sendToNewRelic.source.proxy.logLevel.warn Should WARN log events from the proxy service be handled?
sendToNewRelic.source.repository.enable Should log events from the repository service be handled?
sendToNewRelic.source.repository.logLevel.error Should ERROR log events from the repository service be handled?
sendToNewRelic.source.repository.logLevel.warn Should WARN log events from the repository service be handled?
sendToNewRelic.source.scheduler.enable Should log events from the scheduler service be handled?
sendToNewRelic.source.scheduler.logLevel.error Should ERROR log events from the scheduler service be handled?
sendToNewRelic.source.scheduler.logLevel.warn Should WARN log events from the scheduler service be handled?
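
Similarly, a condensed log events section could look like the sketch below. Ports, tags and account names are placeholders, and the New Relic per-source settings for proxy, repository and scheduler follow the same pattern as the engine block shown:

Butler-SOS:
  logEvents:
    udpServerConfig:
      serverHost: 0.0.0.0         # Listen on all available IPs
      portLogEvents: 9996         # Placeholder port, must match the log appender
    tags:                         # Placeholder tag, shape assumed
      - tag: env
        value: DEV
    source:
      engine:
        enable: true
      proxy:
        enable: true
      repository:
        enable: true
      scheduler:
        enable: true
    sendToMQTT:
      enable: false
      baseTopic: butler-sos/logevent
      postTo:
        baseTopic: true
        subsystemTopics: true
    sendToInfluxdb:
      enable: true
    sendToNewRelic:
      enable: false
      destinationAccount:
        - First NR account        # Placeholder account name
      source:
        engine:
          enable: true
          logLevel:
            error: true
            warn: true
        # proxy, repository and scheduler are configured the same way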

Butler-SOS.logdb

As of August 2021 log db has been deprecated in Qlik Sense.
It is no longer installed when doing fresh QSEoW installs.

To support older QSEoW clusters out there Butler SOS will for now keep log db support intact.

Parameter Description
enable Should Sense log db be queried for warnings/errors/info messages? true/false
pollingInterval How often to query log db. Milliseconds
queryPeriod How far back should log db be queried? Human readable, e.g. “5 minutes” (which is also the default value)
host IP or FQDN of server where Sense log db is running
port Port used by log db. 4432 unless changed during installation of Sense
qlogsReaderUser User to connect to log db as. “qlogs_reader” unless changed during installation of Sense
qlogsReaderPwd Password of above user
extractErrors Should error entries be extracted from log db? true/false
extractWarnings Should warning entries be extracted from log db? true/false
extractInfo Should info entries be extracted from log db? true/false.
NOTE: If info level logging is enabled, this will result in lots of messages being stored in Influxdb (at least for a busy Sense cluster).
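
If you still run a Sense cluster that uses log db, the section could look like this (host and password are placeholders):

Butler-SOS:
  logdb:
    enable: false                  # Leave disabled unless log db is actually installed
    pollingInterval: 60000         # Milliseconds
    queryPeriod: 5 minutes
    host: sense-logdb.example.com  # Placeholder host
    port: 4432
    qlogsReaderUser: qlogs_reader
    qlogsReaderPwd: <password>
    extractErrors: true
    extractWarnings: true
    extractInfo: false             # Keep off to avoid flooding InfluxDB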

Butler-SOS.cert

Certificates to use when connecting to Sense. Get these from the Certificate Export in QMC.

Parameter Description
clientCert Certificate file. Exported from QMC
clientCertKey Certificate key file. Exported from QMC
clientCertCA Root certificate for above certificate files. Exported from QMC
clientCertPassphrase Password used to protect the certificate (as set when exporting cert from QMC)
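
For example (paths are placeholders; use the files exported from the QMC):

Butler-SOS:
  cert:
    clientCert: /path/to/client.pem
    clientCertKey: /path/to/client_key.pem
    clientCertCA: /path/to/root.pem
    clientCertPassphrase: <passphrase set when exporting the certificates>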

Butler-SOS.mqttConfig

MQTT config parameters. These must be correctly defined for any other MQTT features in Butler SOS to work.

Parameter Description
enable Should health metrics be sent to MQTT? true/false
brokerHost IP or FQDN of MQTT broker
brokerPort Broker port
baseTopic Default topic used if not otherwise specified elsewhere. Should end with /. For example butler-sos/
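
A minimal sketch (the broker host is a placeholder; 1883 is the standard unencrypted MQTT port):

Butler-SOS:
  mqttConfig:
    enable: false
    brokerHost: mqttbroker.example.com   # Placeholder broker
    brokerPort: 1883
    baseTopic: butler-sos/               # Note the trailing /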

Butler-SOS.newRelic

If enabled, select Butler SOS metrics and events will be sent to New Relic.

Note that New Relic destination accounts for events are defined in the Butler-SOS.userEvents and Butler-SOS.logEvents sections, whereas destination accounts for metrics are defined in this section (Butler-SOS.newRelic).

Parameter Description
enable Should Qlik Sense health metrics be sent to New Relic? true/false
event.url Which API URL should be used for sending events to New Relic?
At the time of this writing the options are
https://insights-collector.eu01.nr-data.net
https://insights-collector.newrelic.com
More info here: https://docs.newrelic.com/docs/accounts/accounts-billing/account-setup/choose-your-data-center
event.header Array of name/value pairs that will be added as http headers to all calls to the New Relic event API
event.attribute.static Array of name/value pairs, representing attributes/tags that will be added to all events sent to New Relic
event.attribute.dynamic.butlerSosVersion.enable Should Butler SOS’ version be attached as an attribute to events sent to New Relic? true/false
metric.destinationAccount Array of New Relic account names to which Sense health metrics will be sent.
metric.url Which API URL should be used for sending Sense health metrics to New Relic?
At the time of this writing the options are
https://insights-collector.eu01.nr-data.net/metric/v1
https://metric-api.newrelic.com/metric/v1
metric.header Array of name/value pairs that will be added as http headers to all calls to the New Relic metric API
metric.dynamic.engine.memory.enable Send Sense memory metrics to New Relic? true/false
metric.dynamic.engine.cpu.enable Send Sense CPU metrics to New Relic? true/false
metric.dynamic.engine.calls.enable Send metrics about calls to the Sense engine to New Relic? true/false
metric.dynamic.engine.selections.enable Send metrics about number of selections made in Sense apps to New Relic? true/false
metric.dynamic.engine.sessions.enable Send aggregated Sense engine session metrics to New Relic? true/false
metric.dynamic.engine.users.enable Send aggregated Sense user metrics to New Relic? true/false
metric.dynamic.engine.saturated.enable Send Sense engine saturation status to New Relic? true/false
metric.dynamic.apps.docCount.enable Send metrics on loaded/active/in-memory Sense apps to New Relic? true/false
metric.dynamic.apps.activeDocs.enable Should data on what docs are active in engine be sent to New Relic (true/false)?
metric.dynamic.apps.loadedDocs.enable Should data on what docs are loaded (=having open sessions or connections) in engine be sent to New Relic (true/false)?
metric.dynamic.apps.inMemoryDocs.enable Should data on what docs are in engine memory be sent to New Relic (true/false)?
metric.dynamic.cache.cache.enable Send Sense cache metrics to New Relic? true/false
metric.dynamic.proxy.sessions.enable Send aggregated Sense proxy metrics to New Relic? true/false
metric.attribute.static Array of name/value pairs, representing attributes/tags that will be added to all Sense health metrics sent to New Relic
metric.attribute.dynamic.butlerSosVersion.enable Should Butler SOS’ version be attached as an attribute to Sense health metrics sent to New Relic? true/false
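
A condensed sketch of the New Relic section, again assuming the dotted parameter names above map to nested YAML keys. The URLs are the EU endpoints listed above; header and attribute values are placeholders (and their array-item shape is assumed), and metric toggles not shown follow the same enable pattern:

Butler-SOS:
  newRelic:
    enable: false
    event:
      url: https://insights-collector.eu01.nr-data.net
      header:                        # Placeholder header, shape assumed
        - name: X-My-Header
          value: some value
      attribute:
        static:                      # Placeholder attribute, shape assumed
          - name: environment
            value: prod
        dynamic:
          butlerSosVersion:
            enable: true
    metric:
      destinationAccount:
        - First NR account           # Placeholder account name
      url: https://insights-collector.eu01.nr-data.net/metric/v1
      header:
        - name: X-My-Header
          value: some value
      dynamic:
        engine:
          memory:
            enable: true
          cpu:
            enable: true
          # calls, selections, sessions, users and saturated use the same pattern
        apps:
          docCount:
            enable: true
          # activeDocs, loadedDocs and inMemoryDocs use the same pattern
        cache:
          cache:
            enable: true
        proxy:
          sessions:
            enable: true
      attribute:
        static:
          - name: environment
            value: prod
        dynamic:
          butlerSosVersion:
            enable: true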

Butler-SOS.prometheus

If enabled, select Butler SOS metrics will be exposed on a Prometheus compatible URL from where they can be scraped by Prometheus.

Parameter Description
enable Should health metrics be made available for scraping on a Prometheus compatible API http endpoint? true/false
host IP on which the Prometheus compatible endpoint should be available. Using 0.0.0.0 will cause Butler SOS to listen on all available IPs.
port Port on which the Prometheus compatible endpoint will be made available. Default 9842.
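
For example, to expose metrics on all network interfaces using the default port:

Butler-SOS:
  prometheus:
    enable: true
    host: 0.0.0.0    # Listen on all available IPs
    port: 9842       # Default Prometheus endpoint port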

Butler-SOS.influxdbConfig

InfluxDB config parameters. These must be correctly defined for any other InfluxDB features in Butler SOS to work.

Parameter Description
enable Should health metrics be stored in Influxdb? true/false
hostIP IP or FQDN of Influxdb server.
hostPort Port where Influxdb server is listening. Useful if Influxdb for some reason is not using its standard port of 8086.
NOTE: Must be set to a value (for example 8086), otherwise this config entry will be flagged as invalid when the config file format is verified on startup.
auth.enable Enable if data is to be stored in a password protected Influxdb database.
auth.username Influxdb username.
auth.password Influxdb password.
dbName Database name in Influxdb to which health metrics will be stored. The database will be created if it does not already exist when Butler SOS is started.
retentionPolicy.name Name of default retention policy that will be created in InfluxDB database when that database is created during first execution of Butler SOS.
retentionPolicy.duration Duration during which metrics are kept in InfluxDB. After the duration has passed, InfluxDB will purge all data older than duration from the database. See InfluxDB docs for details on syntax.
includeFields.activeDocs Should a list of currently active Sense apps be stored in Influxdb? true/false
includeFields.loadedDocs Should a list of Sense apps opened in a user session be stored in Influxdb? true/false
includeFields.inMemoryDocs Should a list of Sense apps loaded into memory (some apps might not currently be associated with a user session) be stored in Influxdb? true/false
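
Putting the parameters above together (the host, database name and retention policy values are placeholders):

Butler-SOS:
  influxdbConfig:
    enable: true
    hostIP: influxdb.example.com   # Placeholder host
    hostPort: 8086                 # Must be set, even when using the default port
    auth:
      enable: false
      username: <username>
      password: <password>
    dbName: SenseOps               # Placeholder database name
    retentionPolicy:
      name: 10d                    # Placeholder policy name
      duration: 10d                # Placeholder duration, see InfluxDB docs for syntax
    includeFields:
      activeDocs: false
      loadedDocs: false
      inMemoryDocs: false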

Butler-SOS.appNames

Parameter Description
enableAppNameExtract Should app names be extracted from Qlik Sense server? true/false
extractInterval How often (milliseconds) should app names be extracted from Sense server?
hostIP IP or FQDN of Sense server from which app names should be extracted
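
For example (the host is a placeholder):

Butler-SOS:
  appNames:
    enableAppNameExtract: true
    extractInterval: 60000               # Milliseconds
    hostIP: sense-central.example.com    # Placeholder Sense server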

Butler-SOS.userSessions

Extract user session data per virtual proxy.

Parameter Description
enableSessionExtract Should user session data be extracted from Sense? true/false
pollingInterval How often (milliseconds) should user session data be polled from Sense?
excludeUser Array of users (=directory/userId pairs) that should be disregarded when user session data arrives from Sense.
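
For example (the excluded user is a placeholder and the array item shape is assumed from the description above):

Butler-SOS:
  userSessions:
    enableSessionExtract: true
    pollingInterval: 30000        # Milliseconds, placeholder value
    excludeUser:
      - directory: LAB
        userId: testuser1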

Butler-SOS.serversToMonitor

Parameter Description
pollingInterval How often to query the Sense healthcheck API
rejectUnauthorized Set to false to ignore warnings/errors caused by Qlik Sense’s self-signed certificates.
Set to true if the Qlik Sense root CA is available on the computer where Butler SOS is running.
serverTagsDefinition List of tags to add to each server when storing the data in Influxdb. All tags defined here MUST be present in each server’s definition section further down in the config file!
servers List of what servers to monitor. For each server a set of properties MUST be defined.
servers.host:4747 FQDN of server. Domain should match that of the certificate exported from QMC - otherwise certificate warnings may appear. NOTE: You need to specify the port too - should be :4747 unless it’s been changed from default value (very unusual to change this).
servers.serverName Human friendly server name
servers.serverDescription Human friendly server description
servers.logDbHost Server’s name as it appears in the process_host field in log db. This is needed in order to link entries in logdb to the specific server at hand. See note below too!
servers.userSessions.enable Control whether user session data should be retrieved for this server
servers.userSessions.host Host and port from which to retrieve user session data. Usually of the form servername.mydomain.net:4243
servers.userSessions.virtualProxies A list of key-value pairs. Use to specify for which virtual proxies on this server user session data should be retrieved.
serverTags A list of key-value pairs. Use to provide more metadata for servers. Can then (among other things) be used to create more advanced Grafana dashboards.
headers A list of key-value pairs. Headers specified here will be used when retrieving metrics from this Sense server.

The Butler-SOS.serversToMonitor.servers.logDbHost property can be tricky to get right. Easiest way to get the correct value is to look in the Nodes section in the QMC. In the Host name column you find the host names of the various nodes. logDbHost should be set to the first part of each host name:

Log db host name
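
Putting it together, a single monitored server could be defined roughly as below. Host names, tags and headers are placeholders, and the exact shape of the virtualProxies, serverTags and headers items should be taken from the sample config file rather than this sketch:

Butler-SOS:
  serversToMonitor:
    pollingInterval: 30000               # Milliseconds, placeholder value
    rejectUnauthorized: false            # Accept Sense's self-signed certificates
    serverTagsDefinition:                # Tags every server below must define
      - server_group
      - serverLocation
    servers:
      - host: sense1.example.com:4747    # Placeholder FQDN, note the engine port
        serverName: sense1
        serverDescription: Central node
        logDbHost: sense1                # First part of the host name shown in QMC
        userSessions:
          enable: true
          host: sense1.example.com:4243  # Placeholder proxy host and port
          virtualProxies:                # Shape assumed
            - virtualProxy: /
        serverTags:                      # Placeholder tag values, shape assumed
          server_group: CENTRAL
          serverLocation: Stockholm
        headers:                         # Placeholder header, shape assumed
          X-My-Header: some value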

5.3 - Available Metrics

In order to create graphs in for example Grafana, you must understand what metrics are available and how they are structured.

5.3.1 - Available metrics: InfluxDB

In order to create dashboards in for example Grafana, you must understand what metrics are available and how they are structured.

InfluxDB

Metrics retrieved from the Sense servers can be stored in an InfluxDB database. You don’t have to be an InfluxDB expert to use Butler SOS, but understanding some basic concepts is helpful.

Storing metrics in InfluxDB is not mandatory, but some kind of metrics storage - either in InfluxDB, New Relic or Prometheus - is needed to take full benefit of Butler SOS’ features.

  • InfluxDB is a time series database. This means it is super good at storing values that have a timestamp associated with them - and pretty bad at everything else. In many respects time series databases are the opposite of traditional SQL databases (which are usually pretty bad at handling time series data).

  • Because of its focus on time series data, InfluxDB has its own query language, InfluxQL. It is somewhat similar to SQL, but also has many unique commands and features.

  • Browsing through the key concepts of InfluxDB is a good idea. There you will learn about things such as measurements, series and tags - which are all key to using data stored in InfluxDB.

Tip

The list of metrics below shows all metrics that Butler SOS can store in InfluxDB.

If you have disabled some features of Butler SOS, the associated metrics will not be stored in InfluxDB.

Metrics structure

The metrics are grouped based on what kind of Qlik Sense data they represent. InfluxDB is a very capable database, so we will only touch on the basics here.

Overview

Measurements are just what it sounds like: snapshots of some value(s), taken at a specific point in time. A measurement can contain several field keys, which for practical purposes can be viewed as the individual metrics.

For example, the list of measurements looks like this (using the InfluxDB command line client to explore the database structure):

> use senseops
Using database senseops
> show measurements
name: measurements
name
----
apps
butlersos_memory_usage
cache
cpu
log_event
log_event_logdb
mem
saturated
sense_server
session
user_events
user_session_details
user_session_list
user_session_summary
users
>

Let’s take a look at what field keys the apps measurement contains:

> show field keys from apps
name: apps
fieldKey                     fieldType
--------                     ---------
active_docs                  string
active_docs_count            integer
active_docs_names            string
active_session_docs_names    string
calls                        integer
in_memory_docs               string
in_memory_docs_count         integer
in_memory_docs_names         string
in_memory_session_docs_names string
loaded_docs                  string
loaded_docs_count            integer
loaded_docs_names            string
loaded_session_docs_names    string
selections                   integer
>

Ok, so the field keys are the actual metrics for which we gather data. Collectively those metrics (again: field keys in InfluxDB lingo) above are grouped into a measurement called apps.

There is one more concept you need to understand: tag keys

It’s pretty simple: Tag keys are used to categorise (or simply “tag”) measurements.
Let’s say you use Butler SOS to collect data from ten Sense servers. That’s great, but how will you later distinguish between server 3 and server 8? You need some way of telling your Grafana dashboard to show the data for server 3 (if that’s what you want).

Tags solve this. In the Butler SOS YAML config file you can define any number of tags that will be used to tag data coming in from Qlik Sense.

The beauty of tags is that they play very nicely with Grafana - without them the Grafana dashboards would not be nearly as flexible as they are.

To see what tag keys a certain measurement has, you use a query similar to the one used for field keys above:

> show tag keys from apps
name: apps
tagKey
------
host
serverBrand
serverLocation
server_description
server_group
server_name
server_type

Note that this list of tags consists of

  1. Tags always present. These are inserted by Butler SOS and are present for all measurements. These are host, server_description and server_name.
  2. Tags configured in Butler SOS’ config file. In the example above these are serverBrand, serverLocation, server_group and server_type.

Measurements and fields

The measurements are grouped based on what part of Sense they are retrieved from. The groups are

  1. General health metrics.
  2. Messages from the log database.
  3. Detailed metrics about what users are connected to (i.e. have sessions open with) which virtual proxies.
  4. Log events: Warning, error and fatal messages from QSEoW logs.
  5. User events: Session and connection related messages from QSEoW logs.
  6. Metrics relating to Butler SOS itself (i.e. not retrieved from Sense).

General health metrics

A shared set of tag keys is available for all general health metrics:

Tag key Description
host Host name, taken from config file’s Butler-SOS.serversToMonitor.servers[].host property. Usually a fully qualified host name, or in some cases an IP address.
server_name Human readable/friendly server name, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverName property.
server_description Description of the server, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverDescription property.

In addition to the above, all tags defined in the YAML config file for the servers will be included as tag keys.

Measurement: apps

Source: Health check API

Field key Type Description
active_docs string An array of GUIDs of active apps. Empty if no apps are active. An app is active when a user is currently performing some action on it.
active_docs_count integer Number of currently active apps
active_docs_names string Names of currently active (non-session) apps
active_session_docs_names string Names of currently active session apps
in_memory_docs string An array of the GUIDs of all apps currently loaded into memory, even if they do not have any open sessions or connections. The apps disappear from the list when the engine has purged them out from memory.
in_memory_docs_count integer Number of apps currently in memory
in_memory_docs_names string Names of (non-session) apps currently in memory
in_memory_session_docs_names string Names of session apps currently in memory
loaded_docs string An array of the GUIDs of apps currently loaded into memory and that have open sessions or connections. Empty if no apps are loaded.
loaded_docs_count integer Number of currently loaded apps
loaded_docs_names string Names of currently loaded (non-session) apps
loaded_session_docs_names string Names of currently loaded session apps
calls integer Number of calls to the Qlik associative engine since it started
selections integer Number of selections made in the Qlik associative engine since it started
Measurement: cache

Source: Health check API

Field key Type Description
added integer Number of cache objects added to the cache
bytes_added integer Number of bytes added to the cache
hits integer Number of cache hits in engine
lookups integer Number of lookups in engine
replaced integer Number of cache objects replaced
Measurement: cpu

Source: Health check API

Field key Type Description
total integer Percentage of the CPU used by the engine, averaged over a time period of 30 seconds.
Measurement: mem

Source: Health check API

Field key Type Description
allocated integer The total amount of allocated memory (committed + reserved) from the operating system in MB.
committed integer The total amount of committed memory for the engine process in MB.
free integer The total amount of free memory (minimum of free virtual and physical memory) in MB.
Measurement: saturated

Source: Health check API

Field key Type Description
saturated boolean When the value is true, the engine is running with high resource usage; otherwise the value is false. See link above for details.
Measurement: sense_server

Source: Health check API

Field key Type Description
started string ISO timestamp when the engine service was started.
uptime string Time since engine service was started (human readable).
version string Engine version.
Measurement: session

Source: Health check API

Field key Type Description
active integer Number of active engine sessions. A session is active when a user is currently performing some action on an app, for example, making selections or creating content.
total integer Total number of engine sessions.
Measurement: users

Source: Health check API

Field key Type Description
active integer Number of users currently doing something in some app.
total integer Number of users with established sessions to the Sense server.

User session details

User session metrics have slightly different tag keys depending on the granularity level of the metric - those metrics are therefore listed under each heading below.

Measurement: user_session_summary

Source: Session module API

Field key Type Description
session_count float Total number of sessions, per server and virtual proxy.
session_user_id_list string List of user IDs with sessions, per server and virtual proxy. NOTE: A single user may have more than one session open to a particular server/virtual proxy.

Tag keys:

Tag key Description
host Host name, taken from config file’s Butler-SOS.serversToMonitor.servers[].host property. Usually a fully qualified host name, or in some cases an IP address.
server_name Human readable/friendly server name, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverName property.
server_description Description of the server, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverDescription property.
user_session_host Host name the session metrics are associated with.
user_session_virtual_proxy Virtual proxy name the session metrics are associated with.
Measurement: user_session_list

Source: Session module API

Field key Type Description
session_user_id_list string List of user IDs with sessions, per server and virtual proxy. NOTE: A single user may have more than one session open to a particular server/virtual proxy.

Tag keys:

Tag key Description
host Host name, taken from config file’s Butler-SOS.serversToMonitor.servers[].host property. Usually a fully qualified host name, or in some cases an IP address.
server_name Human readable/friendly server name, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverName property.
server_description Description of the server, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverDescription property.
user_session_host Host name the session metrics are associated with.
user_session_virtual_proxy Virtual proxy name the session metrics are associated with.
Measurement: user_session_details

Source: Session module API

Field key Type Description
session_id string Session GUID, uniquely identifying the session in the entire Sense cluster.
user_directory string Session user’s user directory.
user_id string Session user ID

Tag keys:

Tag key Description
host Host name, taken from config file’s Butler-SOS.serversToMonitor.servers[].host property. Usually a fully qualified host name, or in some cases an IP address.
server_name Human readable/friendly server name, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverName property.
server_description Description of the server, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverDescription property.
user_session_host Host name the session metrics are associated with.
user_session_virtual_proxy Virtual proxy name the session metrics are associated with.
user_session_id Session GUID
user_session_user_directory User’s user directory
user_session_user_id User ID

User events

User events capture real-time events in Qlik Sense as they happen.
They originate from Sense’s log4net logging framework and are forwarded from Sense to Butler SOS by means of XML log appenders in Sense.
These events are also forwarded as MQTT messages, allowing other systems to act when warnings/errors/fatals occur in Qlik Sense.

Setup instructions here.

The following user events are handled by Butler SOS:

  • Session start
  • Session stop
  • Connection open
  • Connection close.
Measurement: user_events

Tag keys present for all user_events records:

Tag key Description
event_action Indicates what the event is about. Examples: Start session, Stop session, Open connection, Close connection.
host Host name as reported in Qlik Sense’s proxy log files.
origin Textual description of what caused the event. Can for example be AppAccess, which means a user opened or closed a browser tab with a Sense app in it.
userDirectory Sense user directory of the user causing the event.
userId Sense user ID for the user causing the event.
userFull The combination of userDirectory and userId.

If the user event includes browser user agent information, the following tags will be present:

Tag key Description
uaBrowserName Name of connecting user’s browser.
uaBrowserMajorVersion Connecting user’s browser version.
uaOsName Connecting user’s operating system.
uaOsVersion Connecting user’s operating system version.

In addition to the above, tags defined in the Butler SOS config file will be added.
More info here.

Fields:

Field key Description
appId Id of app that is opened/closed.
appName Name of app that is opened/closed.
userFull Same as the userFull tag.
userId Same as the userId tag.

Log events

Log events are used to capture warning, error and fatal messages in Sense. Once in Butler SOS these events are stored in InfluxDB (enabling Grafana dashboards).
These events are also forwarded as MQTT messages, allowing other systems to act when warnings/errors/fatals occur in Qlik Sense.

Setup instructions here.

Info

There is only one measurement for log events. It’s simply called log_event.

Different QSEoW services (Qlik Sense Enterprise on Windows) will send different tags and metrics in the log events.
Each variant is described below.

This modular approach to log events makes it possible to extend Butler SOS with additional log events if/when needed.

Source: Proxy service

Events such as failed login attempts will be sent from the proxy service.

Proxy log events have these tags:

Tag key Description
host Host name as reported in Qlik Sense’s log files.
level Sense log level. Possible values are WARN, ERROR, FATAL.
log_row Row number in Sense log file where the event can be found. Useful if you after all have to dig into the log files.
result_code Result code as reported by the Sense source system that caused the event. Its meaning will differ depending on where the event originated.
source Source system within Sense that caused the event. Examples: qseow-scheduler, qseow-proxy, qseow-repository
subsystem Subsystem where the event originated. More granular than source. Example: System.Scheduler.Scheduler.Master.Task.TaskSession
user_directory Sense user directory of the user causing the event. Example: MYCOMPANY
user_id Sense user ID for the user causing the event. Example: joe
user_full The combination of user_directory and user_id. Example: MYCOMPANY\joe

Fields in proxy log events:

Field key Description
command Description of what caused the event, as found in the Sense logs. Example: Login:TryLogin
context In what context (if one exists) the event occurred. If no context is available Not available will be used.
exception_message If a serious problem/exception occurs the associated message is available here.
message Description of what the event is about. Example: Login failed for user 'LAB\\goran' wrong credentials?
origin Example: qseow-repository.
raw_event The raw event message as received from QSEoW. Described here.
result_code Example: 500

The raw_event is the actual log event message sent from QSEoW to Butler SOS.
It has the following components:

Part of message Description
command Description of what caused the event, as found in the Sense logs. Example: Login:TryLogin
context In what context (if one exists) the event occurred. If no context is available Not available will be used.
exception_message If a serious problem/exception occurs the associated message is available here.
host Host name as reported in Qlik Sense’s log files.
level Sense log level. Possible values are WARN, ERROR, FATAL.
log_row Row number in Sense log file where the event can be found. Useful if you after all have to dig into the log files.
message Description of what the event is about. Example: Login failed for user 'LAB\\goran' wrong credentials?
origin Part of the proxy service the event originated from. Rarely used by Sense.
result_code Result code as reported by the Sense source system that caused the event. Its meaning will differ depending on where the event originated. Example: 500
source Source system within Sense that caused the event. Examples: qseow-scheduler, qseow-proxy, qseow-repository
subsystem Subsystem where the event originated. More granular than source. Example: System.Scheduler.Scheduler.Master.Task.TaskSession
tags User defined tags. Set in the main YAML config file. Example: {"env":"DEV","foo":"bar"}
ts_iso Timestamp (ISO format) when the event occurred, according to QSEoW. Example: 20211126T214006.122+0100
ts_local Event timestamp (time format of Sense server). Example: 2021-11-26 21:40:06,122
user_directory Sense user directory of the user causing the event. Example: MYCOMPANY
user_full The combination of user_directory and user_id. Example: MYCOMPANY\joe
user_id Sense user ID for the user causing the event. Example: joe
windows_user Windows account used to run the proxy QSEoW Windows service. Example: LAB\\qlikservice
Source: Scheduler service

Events such as failed reload tasks will be sent from the scheduler service.

Scheduler log events have these tags:

Tag key Description
host Host name as reported in Qlik Sense’s log files.
level Sense log level. Possible values are WARN, ERROR, FATAL.
log_row Row number in Sense log file where the event can be found. Useful if you after all have to dig into the log files.
source Source system within Sense that caused the event. Examples: qseow-scheduler, qseow-proxy, qseow-repository
subsystem Subsystem where the event originated. More granular than source. Example: System.Scheduler.Scheduler.Master.Task.TaskSession
user_directory Sense user directory of the user causing the event. Example: MYCOMPANY
user_id Sense user ID for the user causing the event. Example: joe
user_full The combination of user_directory and user_id. Example: MYCOMPANY\joe
task_id Task ID (if a task is involved in the event, for example task failing). Example: 58dd8322-e39c-4b71-b74e-13c47a2f6dd4
task_name Task name (if a task is involved in the event). Example: Reload task of Meetup.com

Fields in scheduler log events:

Field key Description
app_id Application ID (if an app is involved in the event). Example: deba4bcf-47e4-472e-97b2-4fe8d6498e11
app_name Application name (if an app is involved in the event). Example: Meetup.com
exception_message If a serious problem/exception occurs the associated message is available here.
execution_id ID identifying a particular task execution. Example: 67a56c3b-2e20-4df8-ad1b-e48de28e1bfa
message Description of what the event is about. Example: Login failed for user 'LAB\\goran' wrong credentials?
raw_event The raw event message as received from QSEoW. Described here.

The raw_event is the actual log event message sent from QSEoW to Butler SOS.
It has the following components:

Part of message Description
app_id Application ID (if an app is involved in the event). Example: deba4bcf-47e4-472e-97b2-4fe8d6498e11
app_name Application name (if an app is involved in the event). Example: Meetup.com
exception_message If a serious problem/exception occurs the associated message is available here.
execution_id ID identifying a particular task execution. Example: 67a56c3b-2e20-4df8-ad1b-e48de28e1bfa
host Host name as reported in Qlik Sense’s log files.
level Sense log level. Possible values are WARN, ERROR, FATAL.
log_row Row number in Sense log file where the event can be found. Useful if you after all have to dig into the log files.
message Description of what the event is about. Example: Login failed for user 'LAB\\goran' wrong credentials?
source Source system within Sense that caused the event. Example: qseow-scheduler
subsystem Subsystem where the event originated. More granular than source. Example: System.Scheduler.Scheduler.Slave.Tasks.ReloadTask
tags User defined tags. Set in the main YAML config file. Example: {"env":"DEV","foo":"bar"}
task_id Task ID (if a task is involved in the event, for example task failing). Example: 58dd8322-e39c-4b71-b74e-13c47a2f6dd4
task_name Task name (if a task is involved in the event). Example: Reload task of Meetup.com
ts_iso Timestamp (ISO format) when the event occurred, according to QSEoW. Example: 20211126T214006.122+0100
ts_local Event timestamp (time format of Sense server). Example: 2021-11-26 21:40:06,122
user_directory Sense user directory of the user causing the event. Example: MYCOMPANY
user_full The combination of user_directory and user_id. Example: MYCOMPANY\joe
user_id Sense user ID for the user causing the event. Example: joe
windows_user Windows account used to run the proxy QSEoW Windows service. Example: LAB\\qlikservice
Source: Repository service

The repository service is the hub around which the rest of Qlik Sense revolves.
As such it emits events in many different situations. One example can be when a Sense node is offline (this example is used in the field descriptions below).

Repository log events have these tags:

Tag key Description
host Host name as reported in Qlik Sense’s log files.
level Sense log level. Possible values are WARN, ERROR, FATAL.
log_row Row number in Sense log file where the event can be found. Useful if you after all have to dig into the log files.
source Source system within Sense that caused the event. Examples: qseow-scheduler, qseow-proxy, qseow-repository
subsystem Subsystem where the event originated. More granular than source. Example: System.Scheduler.Scheduler.Master.Task.TaskSession
result_code Result code as reported by the Sense source system that caused the event. Its meaning will differ depending on where the event originated.
user_directory Sense user directory of the user causing the event. Example: MYCOMPANY
user_id Sense user ID for the user causing the event. Example: joe
user_full The combination of user_directory and user_id. Example: MYCOMPANY\joe

Fields in repository log events:

Field key Description
command Description of what caused the event, as found in the Sense logs. Example: Login:TryLogin
context In what context (if one exists) the event occurred. If no context is available Not available will be used.
exception_message If a serious problem/exception occurs the associated message is available here.
message Description of what the event is about. Example: Login failed for user 'LAB\\goran' wrong credentials?
origin Example: qseow-repository.
raw_event The raw event message as received from QSEoW. Described here.
result_code Example: 500

The raw_event is the actual log event message sent from QSEoW to Butler SOS.
It has the following components:

Part of message Description
command Description of what caused the event, as found in the Sense logs. Example: Check service status
context In what context (if one exists) the event occurred. If no context is available Not available will be used. Example: /qps/servicestatusworker
exception_message If a serious problem/exception occurs the associated message is available here.
host Host name of event source, as reported in Qlik Sense’s log files. Example: pro2-win1
level Sense log level. Possible values are WARN, ERROR, FATAL.
log_row Row number in Sense log file where the event can be found. Useful if you after all have to dig into the log files. Example: 7296
message Description of what the event is about. Example: Method: 'SendRimQrsStatusRequest'. Failed to retrieve service status from 'http://pro2-win3.lab.ptarmiganlabs.net:4444/status/'. Server host 'pro2-win3.lab.ptarmiganlabs.net'. Error message: 'Unable to connect to the remote server'
origin Part of the proxy service the event originated from. Rarely used by Sense.
result_code Result code as reported by the Sense source system that caused the event. Its meaning will differ depending on where the event originated. Example: 500
source Source system within Sense that caused the event. Example: qseow-repository
subsystem Subsystem where the event originated. More granular than source. Example: Service.Repository.Repository.Core.Status.ServiceStatusWorker
tags User defined tags. Set in the main YAML config file. Example: {"env":"DEV","foo":"bar"}
ts_iso Timestamp (ISO format) when the event occurred, according to QSEoW. Example: 20211128T201538.508+0100
ts_local Event timestamp (time format of Sense server). Example: 2021-11-28 20:15:38,508
user_directory Sense user directory of the user causing the event. Example: MYCOMPANY
user_full The combination of user_directory and user_id. Example: MYCOMPANY\joe
user_id Sense user ID for the user causing the event. Example: joe
windows_user Windows account used to run the proxy QSEoW Windows service. Example: LAB\\qlikservice

Messages from the log database

All log data written to InfluxDB share a common set of tag keys:

Tag key Description
host Host name, taken from config file’s Butler-SOS.serversToMonitor.servers[].host property. Usually a fully qualified host name, or in some cases an IP address.
server_name Human readable/friendly server name, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverName property.
server_description Description of the server, taken from config file’s Butler-SOS.serversToMonitor.servers[].serverDescription property.
log_level The logging level of the log event (ERROR, WARNING, INFO etc).
source_process Which Sense service the log event originated in.
Measurement: log_event_logdb

Source: The Sense log database (Postgres). A query is made against log db and the results are stored in InfluxDB; there is thus no Qlik API call per se.

Field key Type Description
message string Log entry as retrieved from the Sense log database (Postgres).

Butler SOS metrics

Measurement: butlersos_memory_usage

These metrics tell you how much memory Butler SOS itself uses.
More info on these metrics and what they mean is available here.

Field key Type Description
heap_total float Total size of the allocated heap.
heap_used float Actual memory used during the execution of Butler SOS.
process_memory float Total memory allocated for the execution of Butler SOS.

5.3.2 - Available Metrics: New Relic

Once data has been sent to New Relic, its web based user interface makes it very intuitive to both create charts and combine these into dashboards.

New Relic

New Relic offers a complete SaaS observability stack, ranging from high-volume ingestion of events/metrics/logs/traces to advanced dashboards that can be created ad-hoc using a web UI or from files and templates, for more of an infrastructure-as-code approach.

Storing metrics in New Relic is not mandatory, but some kind of metrics storage - either in New Relic, InfluxDB or Prometheus - is needed to take full benefit of Butler SOS’ features.

In order to view data in New Relic you first have to send data to them.

Butler SOS does this for you.

Furthermore, you can to a large degree control which Qlik Sense metrics, logs and events are sent to New Relic.

Data volumes and pricing

At the time of this writing New Relic offers a generous free plan.
It will be a great starting point for everyone; if there’s a need for more dashboard users etc. the account can be upgraded as needed.

In most cases Butler SOS will not generate a lot of data and you can stay within New Relic’s free tier.
The amount of data generated by Sense health metrics and Butler SOS uptime metrics is very small indeed, but if your Qlik Sense environment for some reason generates a lot of log events, that can cause the data volumes to increase rapidly.

For example, if a user connects to Sense and gets an https certificate warning in the browser, this will also cause a number of warnings and errors in the proxy logs. Multiply this by X users and there can suddenly be thousands of errors and warnings per hour in the Sense logs.
If these are also sent to New Relic the data volumes increase quickly.

Overview of New Relic

New Relic is similar to InfluxDB in that Butler SOS pushes data to both systems.

The basic concepts are

  • Metrics represent a measurement of some kind. Number of sessions in the Sense proxy, amount of free RAM on a Sense server etc.
  • Events are something that happened. Warnings and errors in the Sense log files can be forwarded to New Relic as events.
    Various user activities (user session start/stop etc) in Sense can also be sent to New Relic as events.
  • Attributes are conceptually tags that are attached to metrics or events. These act as dimensions for the data. Metrics in visualisations can be grouped by attributes, much in the same way Qlik Sense measurements are grouped by dimensions in Sense charts and tables.
    • Static attributes are defined in Butler SOS’ config file.
    • Dynamic attributes are determined at runtime.

In addition to the above these data formats exist but are not currently used by Butler SOS. This may change in the future.

  • Logs are essentially regular lines in a log file, consisting of several fields.
  • Distributed tracing collects data as requests travel from one service to another, recording each segment of the journey as a span. These spans contain important details about each segment of the request and are eventually combined into one trace. The completed trace gives you a picture of the entire request.

5.3.3 - Available Metrics: Prometheus

In order to create graphs in for example Grafana, you must understand what metrics are available and how they are structured.

Prometheus

Metrics retrieved from the Sense servers can be stored in Prometheus. You don’t have to be a Prometheus expert to use Butler SOS, but understanding some basic concepts is helpful.

Storing metrics in Prometheus is not mandatory, but some kind of metrics storage - either in Prometheus, InfluxDB or New Relic - is needed to take full benefit of Butler SOS’ features.

Prometheus gathers metrics by “scraping” data from web pages (“endpoints”) on which metrics are displayed in a well specified format.
Most metrics from the Sense servers are exposed on a Prometheus compatible endpoint, but not all.
InfluxDB is more flexible for some types of data, while Prometheus provides more easily used features for data aggregation when data should be displayed in Grafana.

Prometheus endpoint

Prometheus is enabled/disabled in the Butler-SOS.prometheus section in the config file. Prometheus metrics are available on the /metrics URL on the IP and port specified in the config file.

For example, if the host is 0.0.0.0 and the port is 9842, Butler SOS will listen on port 9842 on all available network interfaces. If the Butler SOS’ server’s IP address is 192.168.1.168, a call from a web browser can look like this:

Prometheus metrics in web browser

This is the web page Prometheus will scrape and ingest into its time-series database.
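
On the Prometheus server side, a minimal scrape job for this endpoint could look like the sketch below (the job name, scrape interval and target address are placeholders taken from the example above; /metrics is Prometheus’ default metrics path, so it does not need to be specified):

# prometheus.yml on the Prometheus server
scrape_configs:
  - job_name: butler-sos
    scrape_interval: 30s
    static_configs:
      - targets:
          - 192.168.1.168:9842   # Placeholder Butler SOS host and Prometheus endpoint port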

Overview of Prometheus

In contrast to InfluxDB, to which Butler SOS pushes data, Prometheus works the other way around.
The Prometheus server is responsible for gathering data exposed by the systems that should be monitored (for example Butler SOS).

The basic concepts are

  • Metrics represent the measurements of interest. They correspond to “fields” in InfluxDB.
  • Labels are used to categorize metrics (similar to tags in InfluxDB).

Labels

The labels available for all Prometheus metrics are:

Label name Source Description
host Butler-SOS.serversToMonitor.servers[].host Host IP or FQDN of the server from which the metric comes.
server_name Butler-SOS.serversToMonitor.servers[].serverName Human friendly server name.
server_description Butler-SOS.serversToMonitor.servers[].serverDescription Human friendly server description.
Butler-SOS.serversToMonitor.servers[].serverTags.* All tags defined in the config file will be added as Prometheus labels.

Metrics

Available metrics are similar to those in InfluxDB, with a few exceptions.

Prometheus is awesome when it comes to storing all kinds of measurements, but it doesn’t offer a good way to store strings.
For that reason Butler SOS metrics involving strings (for example list of apps loaded in memory) are not available on the Prometheus endpoint.
Most of the metrics come from Qlik Sense’ health check API.

Qlik Sense metrics

These are the Prometheus metrics exposed by Butler SOS:

Metric Type Description
butlersos_apps_calls Gauge Total number of requests made to the Qlik Sense engine.
butlersos_apps_selections Gauge Total number of selections made to the Qlik Sense engine.
butlersos_apps_activedocs_total Gauge Number of active apps. An app is active when a user is currently performing some action on it.
butlersos_apps_inmemorydocs_total Gauge Number of apps currently loaded into memory, even if they do not have any open sessions or connections. Apps disappear from this metric when the engine has purged them from memory.
butlersos_apps_loadeddocs_total Gauge Number of apps currently loaded into memory that also have open sessions or connections.
butlersos_cache_added Gauge Number of cache objects added.
butlersos_cache_hits Gauge Number of cache hits.
butlersos_cache_lookups Gauge Number of cache lookups.
butlersos_cache_replaced Gauge Number of cache replaced cache objects.
butlersos_cache_saturated Gauge When the value is 1, the engine is running with high resource usage; otherwise the value is 0.
butlersos_cpu_total Gauge Percentage of the CPU used by the engine, averaged over a time period of 30 seconds.
butlersos_mem_committed Gauge The total amount of committed memory for the engine process in MB.
butlersos_mem_allocated Gauge The total amount of allocated memory (committed + reserved) from the operating system in MB.
butlersos_mem_free Gauge The total amount of free memory (minimum of free virtual and physical memory) in MB.
butlersos_session_active Gauge Number of active engine sessions. A session is active when a user is currently performing some action on an app, for example, making selections or creating content.
butlersos_session_total Gauge Total number of engine sessions.
butlersos_users_active Gauge Number of distinct active users. An active user is one who is currently performing an action on an app.
butlersos_users_total Gauge Total number of distinct users within the current engine sessions.
butlersos_engine_metadata Gauge Metadata about the Qlik Sense engine.
butlersos_user_session_total Gauge Number of sessions (as reported by the proxy service).
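As an illustration only (the HELP text, label values and metric value below are made up), a gauge from the table above appears roughly like this on the /metrics page, in Prometheus’ text exposition format:

```
# Illustrative sample - actual HELP text, labels and values will differ.
# HELP butlersos_apps_activedocs_total Number of active apps
# TYPE butlersos_apps_activedocs_total gauge
butlersos_apps_activedocs_total{host="192.168.1.168",server_name="sense1",server_description="Central node"} 4
```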

Node.js metrics

A set of Node.js-specific metrics is also available on Butler SOS’ Prometheus endpoint.
These are described in the “Default metrics” section on this page.

6 - Security considerations

Security is important! Keep these things in mind when deploying Butler SOS.

Security is a whole field in its own right.
What’s considered secure in one company might be considered insecure in another.

It’s a good idea to review your tools and services every now and then, for example making sure they are updated to include the latest security patches etc.
When in doubt, be paranoid.

Security considerations

  • Make sure to configure the firewall on the server where Butler SOS is running to only accept connections from the desired clients/IP addresses.

    A reasonable first approach would be to configure the firewall to only allow calls from localhost. That way calls to/from Butler SOS can only be made from the server where Butler SOS itself is running.
    If Butler SOS is not running on the Sense server, the IPs of the Sense servers should also be whitelisted in the firewall, of course.

  • The MQTT connections are not secured by certificates or passwords.

    For use within a controlled network that might be fine, but nonetheless something to keep in mind. Adding certificate based authentication (which MQTT supports) would solve this.

  • Butler SOS uses various Node.js modules from npm. If concerned about security, you should review these dependencies and decide whether there are issues in them or not.

    Same thing with Butler SOS itself - while efforts have been made to make Butler SOS secure, you need to decide for yourself whether the level of security is enough for your use case.

    Butler SOS is continuously checked for security vulnerabilities by using GitHub security audit, Snyk, npm audit and other tools.

Butler SOS talking to Qlik Sense

Butler SOS uses https for all communication with Sense, using Sense’s certificates for authentication.

7 - Legal stuff

Information about licenses, privacy and more.

License

Butler SOS is distributed under the MIT license.

MIT License

Copyright (c) 2023 Göran Sander

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Data collection

This documentation site was built using Hugo and consists of static web pages hosted on GitHub Pages.

GitHub probably keeps log files for GitHub Pages; your visit to this documentation site is thus likely to be recorded there. For questions about such logs you should contact GitHub.

Web analytics

Plausible is used to track general site usage. The collected data is not shared with any third parties.

Aggregated data may be used to determine, and in some cases publicly communicate, how popular the site is.
Examples of aggregated metrics are how many users have visited the site and from where in the world they accessed the site.

Telemetry data

If enabled in the config file, Butler SOS will send anonymous telemetry data to PostHog, with data stored within the EU.
A detailed description of what’s included in the telemetry data is found here.

Ptarmigan Labs collects this information to better understand:

  1. what the execution environment looks like for Butler SOS out there, and
  2. which Butler SOS features are used/not used

The purpose of the telemetry data is ultimately to aid in future development of Butler SOS, focusing on the most used features and execution environments.

If you want your historical telemetry data to be deleted, you must provide Ptarmigan Labs with the system ID mentioned in the logs when Butler SOS is starting.
This ID is what identifies your specific instance of Butler SOS.

Until you tell Ptarmigan Labs what your ID is, there is no link between you and that ID.
The idea is simply that the anonymous telemetry should be just that: anonymous.