How to configure high availability

Squared Up can be made into a highly available (HA) pair by hosting two separate instances and having them read data files from a shared location (such as a network share).

This style of deployment is typically used to:

  • Enable load-balancing between servers hosting the same content
  • Create two different access points into Squared Up, each with a different mode of authentication (e.g. one instance with Windows authentication and the other with forms authentication)

Unlike earlier versions of Squared Up, you cannot use mklink to achieve this setup. Installations of Squared Up 3.0 and above will behave erratically if they are configured to share data in this manner.

Requirements

Configuration

For fresh installations

Primary

  1. Install and activate Squared Up using the Primary activation key

  2. After activation and setup are complete, open Windows File Explorer on the server and navigate to the on-disk folder containing Squared Up (typically C:\inetpub\wwwroot\squaredupv3)

  3. From here, as an administrator, open a command prompt in the current location (this can be done via File > Open command prompt > Open command prompt as administrator, or Shift+Right click and Open command prompt here if you are already an administrator)

  4. From the command prompt run the following command to reconfigure Squared Up for HA:

    squaredup ha --share=<Your shared folder here>

    where <Your shared folder here> should be replaced by a drive or path specification for your network share, for example:

    • X:\
    • \\myhost\folder (UNC path)

      The folder/share must already exist: Squared Up cannot create it automatically (for example, specifying \\myhost\folder is invalid if folder is not already shared by myhost).

      Running the command should generate output similar to the following:

      Path \\example\share confirmed to exist
      "C:\Windows\system32/inetsrv/appcmd.exe" stop apppool /apppool.name:SquaredUpv3
      "SquaredUpv3" successfully stopped
      
      Successfully stopped application pool SquaredUpv3
      Creating \\example\share\User\Configuration...
      Attempting copy of configurations from C:\inetpub\wwwroot\squaredupv3\User\Configuration to \\example\share\User\Configuration
      Skipping copy of authentication.json...
      Copying connections.json to \\example\share\User\Configuration...
      Deleting unwanted local copy of connections.json...
      Skipping copy of highavailability.json...
      Copying oms.json to \\example\share\User\Configuration...
      Deleting unwanted local copy of oms.json...
      Copying open-access-mapping.json to \\example\share\User\Configuration...
      Deleting unwanted local copy of open-access-mapping.json...
      Copying openaccess.json to \\example\share\User\Configuration...
      Deleting unwanted local copy of openaccess.json...
      Copying profiler.json to \\example\share\User\Configuration...
      Deleting unwanted local copy of profiler.json...
      Copying scom.json to \\example\share\User\Configuration...
      Deleting unwanted local copy of scom.json...
      Copying vada.json to \\example\share\User\Configuration...
      Deleting unwanted local copy of vada.json...
      Writing HA configuration file...
      Done
      "C:\Windows\system32\inetsrv/appcmd.exe" start apppool /apppool.name:SquaredUpv3
      
      "SquaredUpv3" successfully started.
      
      Successfully started application pool SquaredUpv3
      

      Note how this output contains numerous file copy operations: these are the Squared Up configuration files being moved to the central location for sharing with another instance.

  5. Run squaredup permissions to give the Squared Up application pool the correct permissions to your HA share, for example:

    squaredup permissions --destination="<Your shared folder here>" --user="DOMAIN\USER"

    where <Your shared folder here> is replaced by your share, such as \\myhost\folder, DOMAIN is your domain and USER is the Squared Up application pool identity.

    See How to check and modify the application pool identity

  6. Navigate to Squared Up using a web browser, either on the server itself or from a client machine. (Note: The previous command will have automatically recycled Squared Up, so you will need to log in again.)

  7. After logging in, the server should behave identically to how it did post-installation. There are two ways to confirm that HA is in effect:

    • The Squared Up log (typically C:\inetpub\wwwroot\squaredupv3\transient\log\rolling.log) should contain the text

      [Information] Replication (HA) is enabled

      followed by a list of redirected folder paths

    • The path to which HA has been pointed (e.g. \\myhost\folder) should contain a folder named Locking, within which there should be a heartbeat.json file. This heartbeat is an essential component of primary ↔ secondary communication

  8. Once the primary is confirmed to be running in HA mode, the secondary server can be configured
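The two checks in step 7 could be scripted roughly as follows. This is a minimal sketch, not part of Squared Up: the function names are hypothetical, and the log path and share path are the typical values quoted above and will differ per environment.

```python
from pathlib import Path

# Log text quoted in step 7.
HA_MARKER = "Replication (HA) is enabled"


def log_reports_ha(log_path):
    """Return True if the Squared Up log contains the HA-enabled message."""
    path = Path(log_path)
    if not path.is_file():
        return False
    # The log encoding is an assumption; undecodable bytes are replaced.
    text = path.read_text(encoding="utf-8", errors="replace")
    return HA_MARKER in text


def heartbeat_exists(share_path):
    """Return True if <share>/Locking/heartbeat.json exists."""
    return (Path(share_path) / "Locking" / "heartbeat.json").is_file()


# Example usage (paths are the typical defaults, adjust as needed):
#   log_reports_ha(r"C:\inetpub\wwwroot\squaredupv3\transient\log\rolling.log")
#   heartbeat_exists(r"\\myhost\folder")
```

Both functions only read from disk, so they can be run while Squared Up is online.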

Secondary

  1. Install and Activate Squared Up using the Secondary activation key

  2. Open an administrative command prompt on the server inside the Squared Up folder (typically C:\inetpub\wwwroot\squaredupv3) and run

    squaredup ha --share=<Your shared folder here> --noconfigcopy

    This command behaves as it did on the primary: the only difference is --noconfigcopy. This flag tells the configuration tool to leave the data on the shared folder as it is, and to just ‘point’ the current Squared Up instance at that folder. This is the correct course of action because the tool already copied the configuration files to the share when the primary server was configured.

    The output from this command should look similar to the following:

     Path \\example\share confirmed to exist
     "C:\Windows\system32/inetsrv/appcmd.exe" stop apppool /apppool.name:SquaredUpv3
     "SquaredUpv3" successfully stopped
    
     Successfully stopped application pool SquaredUpv3
     \\example\share\User\Configuration already exists (OK)
     Writing HA configuration file...
     Done
     "C:\Windows\system32\inetsrv/appcmd.exe" start apppool /apppool.name:SquaredUpv3
    
     "SquaredUpv3" successfully started.
    
     Successfully started application pool SquaredUpv3
    

    Note how no files are copied: the secondary will just ‘point’ at what is already on the network share.

  3. After logging in, the server should behave identically to how it did post-installation, however:

    • The secondary should now be displaying the same dashboards and content as the primary

    • Newly created content on primary or secondary should be visible to both nodes

      Content changes are not synchronized instantaneously. Newly created or published dashboards, for example, will become visible within approximately 1-2 seconds.

    • The licensing details for the secondary in the right-hand menu ☰ > system > named users should reflect the overall quantity of users that your license was purchased for

    • The log of the secondary server contains several indications of correct operation:

      • [Information] Replication (HA) is enabled

      • [Information] Valid heartbeat from primary server "ABCDE-FGHIJ-KLMNO-PQRST-UVWXY": EnterpriseApplicationMonitoring with 13 maximum users

        (this message contains the primary server’s license key and product edition from which the secondary determines its features)
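When checking the secondary's log, the heartbeat line can be picked apart with a regular expression. The sketch below assumes the message always has the shape shown in the example above; the pattern and the function name are illustrative and are not taken from Squared Up itself.

```python
import re

# Pattern modelled on the sample log line above (an assumption about its format).
HEARTBEAT_RE = re.compile(
    r'Valid heartbeat from primary server "(?P<key>[^"]+)": '
    r"(?P<edition>\S+) with (?P<max_users>\d+) maximum users"
)


def parse_heartbeat(line):
    """Return the key, edition and user count from a heartbeat log line, or None."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    return {
        "key": match["key"],
        "edition": match["edition"],
        "max_users": int(match["max_users"]),
    }
```

If the reported user count or edition does not match what the primary is licensed for, that is a useful starting point for troubleshooting.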

For existing installations

In order to convert an existing set of Squared Up installations to use high availability mode, it is necessary to manually copy the content on one of the machines to the shared drive or network share. This is because the squaredup ha command (demonstrated above) can only copy configuration files to a shared location, and not any existing content (such as dashboards and perspectives).

Primary

  1. Similar to a fresh installation, open an administrative command prompt on the server inside the Squared Up folder (typically C:\inetpub\wwwroot\squaredupv3) and run the following command to reconfigure Squared Up for HA:

    squaredup ha --share=<Your shared folder here>

    where <Your shared folder here> should be replaced by a drive or path specification for your network share, for example:

    • X:\
    • \\myhost\folder (UNC path)

  2. Once the command completes, all of the configuration files will reside on the network share. However, the content that has already been created on the primary server (such as dashboards) will not. The following directories (and their content) need to be manually moved to the network share:

    • User\Packages to <Share>\User\Packages
    • User\Profiles to <Share>\User\Profiles
    • User\State to <Share>\User\State

      If any of these folders already exist in the destination they can be safely deleted or replaced.

      After moving these paths to the network, each folder can be deleted from the local machine. This removes the confusion of having duplicates of the various pieces of content.

      If there are browser sessions pointing at the primary server, the application pool may start up and lock these paths, in which case the application pool should be manually stopped in IIS.

  3. Run squaredup permissions to give the Squared Up application pool the correct permissions to your HA share, for example:

    squaredup permissions --destination="<Your shared folder here>" --user="DOMAIN\USER"

    where <Your shared folder here> is replaced by your share, such as \\myhost\folder, DOMAIN is your domain and USER is the Squared Up application pool identity.

    See How to check and modify the application pool identity

  4. Navigate to Squared Up using a web browser, either on the server itself or from a client machine. (Note: The previous command will have automatically recycled Squared Up, so you will need to log in again.)

  5. After logging in, the server should behave identically to how it did post-installation. There are two ways to confirm that HA is in effect:

    • The Squared Up log (typically C:\inetpub\wwwroot\squaredupv3\transient\log\rolling.log) should contain the text

      [Information] Replication (HA) is enabled

      followed by a list of redirected folder paths

    • The path to which HA has been pointed (e.g. \\myhost\folder) should contain a folder named Locking, within which there should be a heartbeat.json file. This heartbeat is an essential component of primary ↔ secondary communication

  6. Once the primary is confirmed to be running in HA mode, the secondary server can be configured
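The manual move of content folders in step 2 of this procedure could be scripted along these lines. This is a sketch under stated assumptions: the function name is hypothetical, the application pool is assumed to be stopped so that nothing holds the folders open, and any folder already present on the share is deleted before the move (which step 2 describes as safe for the primary).

```python
import shutil
from pathlib import Path

# Content folders named in step 2; everything else here is illustrative.
CONTENT_DIRS = ("Packages", "Profiles", "State")


def move_content_to_share(install_dir, share_dir):
    """Move User/<name> from the local install to <share>/User/<name>.

    Returns the list of folder names that were moved.
    """
    moved = []
    for name in CONTENT_DIRS:
        src = Path(install_dir) / "User" / name
        dst = Path(share_dir) / "User" / name
        if not src.is_dir():
            continue  # nothing to move for this folder
        if dst.exists():
            # Per step 2, an existing destination folder can be replaced.
            shutil.rmtree(dst)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))  # also removes the local copy
        moved.append(name)
    return moved
```

Because the move removes the local folders, no duplicate content is left behind on the primary. Do not use this as-is on a secondary whose content needs to be merged rather than replaced.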

Secondary

These steps assume that the secondary server has already been activated with a secondary license and is operating in limited/fallback mode. See Licensing for details.

  1. Stop the Squared Up application pool (typically SquaredUpv3) in IIS

  2. The content already on this server (the secondary) needs to be merged with what now exists on the network share

    • Either treat all of the content on the secondary as expendable and delete it (this means deleting User\Packages, User\Profiles and User\State from Squared Up - typically under C:\inetpub\wwwroot\squaredupv3)

      OR

    • You must manually merge the content of the following folders into the same locations on <Share>:

      • User\Packages to <Share>\User\Packages
      • User\Profiles to <Share>\User\Profiles
      • User\State to <Share>\User\State

        Windows Server 2012 and above can perform a copy and replace in which both source and destination items are retained: this may be desirable if you want to retain content from both primary and secondary.

  3. Open an administrative command prompt on the server inside the Squared Up folder (typically C:\inetpub\wwwroot\squaredupv3) and run

    squaredup ha --share=<Your shared folder here> --noconfigcopy

  4. After the command completes and upon logging in, the server should behave identically to how it did post-installation, however:

    • The secondary should now be displaying the same dashboards and content as the primary

    • Newly created content on primary or secondary should be visible to both nodes

      Content changes are not synchronized instantaneously. Newly created or published dashboards, for example, will become visible within approximately 1-2 seconds.

    • The licensing details for the secondary in the right-hand menu ☰ > system > named users should reflect the overall quantity of users that your license was purchased for

    • The log of the secondary server contains several indications of correct operation:

      • [Information] Replication (HA) is enabled

      • [Information] Valid heartbeat from primary server "ABCDE-FGHIJ-KLMNO-PQRST-UVWXY": EnterpriseApplicationMonitoring with 13 maximum users

        (this message contains the primary server’s license key and product edition from which the secondary determines its features)

Procedures

Upgrading

It is critical that the primary and secondary run the exact same version. For this reason, there is a recommended order in which upgrades should be performed.

Steps

  1. Disable the primary server. This can be achieved by simply stopping the application pool.

    Assuming there is a load balancer in place, the secondary will now pick up all incoming requests. (Depending upon your environment, you may need to precede this step with a manual ‘drain’ and disable on the load balancer itself.)

  2. Run the Squared Up installer and upgrade the primary instance

  3. Start the primary up again (typically by attempting to access/connect to it, e.g. browsing to it to view dashboards)

  4. Stop the secondary server as soon as possible after the primary has started back up, to prevent version differences from causing synchronization problems. This should preferably be done within 5-10 minutes of restarting the primary.

    It is highly recommended that you prevent new content (e.g. dashboards) from being created in the brief time window during which the instance versions are mismatched.

    If your service/redundancy needs are simpler, you may want to allow for a maintenance window in which a very brief total outage of your service (1-2 minutes) occurs. This way, you can avoid the complexities of upgrading by completely stopping both nodes and upgrading them one after the other (primary, then secondary).

  5. Run the Squared Up installer and upgrade the secondary instance

  6. Start the secondary up again (again, by attempting to browse content on it)

  7. Both nodes are now upgraded and ready for use. To confirm the secondary node is operating correctly, check the log file for a line similar to the following:

    [Information] Valid heartbeat from primary server "ABCDE-FGHIJ-KLMNO-PQRST-UVWXY": EnterpriseApplicationMonitoring with 13 maximum users

    This message confirms that the secondary is able to read and understand data from the newly upgraded primary.

Licensing

The secondary server license

The license on the secondary server (the Secondary activation key that comes with your purchase of Enterprise Starter Pack or EAM) is entirely dependent on the primary server. This means that, in order to use all of the features and to have the full number of named users, the secondary server must be configured for high availability (with the squaredup ha command), as detailed above.

Without high availability enabled

If high availability mode is not enabled and the secondary server never communicates with a primary, it operates in a very limited fallback mode:

  • 5 named users maximum
  • All features unique to EAM - including VADA - are unavailable

Failure/failover mode

When configured for high availability, the secondary is configured to stop working if it ceases to be in contact with a licensed primary server.

However, since high availability is supposed to allow the secondary to operate without the primary (e.g. if it goes down), the time allowances are large enough to cover typical outages and maintenance windows:

  • The secondary checks the primary is present every 30 minutes
  • If the primary is unavailable for over 1 hour, the secondary enters a ‘Warning’ state. (The warning mentioned is recorded to the log: it is not displayed to users in the browser)
  • After 3 days in the ‘warning’ state, the secondary server will deny all incoming requests and an error page will be displayed

All of the checking mentioned above is performed via file data on the network share: no additional network communication is required between the servers.

The timespans specified here may be subject to change in future updates of Squared Up, and this page will be updated accordingly.
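The timings above can be read as a simple state calculation. The sketch below is one interpretation only: it assumes the 3-day countdown begins once the 1-hour threshold is crossed, ignores the 30-minute check interval, and uses state names (ok, warning, denied) that are illustrative rather than taken from Squared Up.

```python
from datetime import timedelta

# Thresholds quoted above; they may change in future versions of Squared Up.
WARNING_AFTER = timedelta(hours=1)
DENY_AFTER_WARNING = timedelta(days=3)


def secondary_state(primary_unavailable_for):
    """Map how long the primary has been unavailable to an illustrative state."""
    if primary_unavailable_for <= WARNING_AFTER:
        return "ok"
    if primary_unavailable_for <= WARNING_AFTER + DENY_AFTER_WARNING:
        return "warning"  # logged only, not shown to users in the browser
    return "denied"  # incoming requests receive an error page
```

Under this reading, a primary outage of a weekend (about 2.5 days) leaves the secondary serving requests in the warning state, while an outage of more than 3 days and 1 hour does not.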

Internals

It is not necessary to know or understand the internals of high availability in order to use this feature. We document it here since the details may aid troubleshooting in some situations.

Unlike earlier versions, Squared Up 3.0+ instances are considerably more ‘stateful’ - they retain and cache various pieces of data in-memory, rather than reading them from disk on each access.

For example, the following pieces of data are cached:

  • Dashboard pack content listings (inc. indexing for searches)
  • The content, names and matching rules of perspectives
  • User profiles and licensing lists

This generally helps Squared Up to perform better, but it means that simply using the same directory on disk (e.g. by using mklink, discussed at the opening to this article) is not sufficient to enable a synchronized pair - since there is nothing to invalidate these caches.

Sequence files

In order to invalidate its caches, Squared Up tracks ‘sequence numbers’ of different subsystems. Any time state changes in these systems (on primary or secondary) the sequence number is incremented and saved to disk. If either instance detects a sequence change it was not responsible for, it invalidates its local cache.

  • Watching is performed by polling at a resolution of approximately 2 seconds. Squared Up avoids the overhead of actual read I/O by not opening a file unless its last write time has changed.
  • Instances create and hold a ‘lock file’ (with the extension .lock) when modifying any other .json file. Checks for existing locks are performed before updating the file. This is used to enforce file access locking in environments where native file locking may not be sufficient/reliable (i.e. network shares)
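The polling approach described in the first bullet can be illustrated with a small watcher that checks a file's last write time and only opens the file when that time has changed. This is a sketch of the general technique, not Squared Up's implementation: the class name is hypothetical, and the "sequence" JSON key is an assumption, since the actual file layout is not documented here.

```python
import json
import os


class SequenceWatcher:
    """Re-read a JSON sequence file only when its write time changes."""

    def __init__(self, path):
        self.path = path
        self._last_mtime = None
        self.sequence = None

    def poll(self):
        """Return True if the file was re-read because its write time changed."""
        try:
            mtime = os.stat(self.path).st_mtime_ns  # metadata only, no read I/O
        except FileNotFoundError:
            return False
        if mtime == self._last_mtime:
            return False  # unchanged: skip opening the file
        with open(self.path, encoding="utf-8") as f:
            data = json.load(f)
        self._last_mtime = mtime
        self.sequence = data.get("sequence")  # hypothetical key name
        return True


# Example polling loop at roughly the 2-second resolution mentioned above:
#   watcher = SequenceWatcher(r"\\myhost\folder\Locking\PerspectiveCache.json")
#   while True:
#       if watcher.poll():
#           ...  # invalidate the matching in-memory cache
#       time.sleep(2)
```

A caller would treat a True result from poll() as the signal to discard the corresponding in-memory cache.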

The primary and secondary instances both watch for changes in the following files (creating them if they do not exist):

  • <Share>\Locking\NamedUserLicensing.json
  • <Share>\Locking\PerspectiveCache.json
  • <Share>\Locking\ProfileCache.json
  • <Share>\Locking\<Extension pack ID>-<File type>.json

.lock and .prelock

In order to detect and wait for an existing lock (.lock) on a file to end, instances do not immediately attempt to create lock files.

Instead, .prelock files are created with random file names. The resulting file is then copied into place to become the actual .lock of a particular file.

This means that ‘dangling’ .prelock files can sometimes be left behind after a failed operation (due to sharing violations). Since operations are typically retried, a few leftover .prelock files in the <Share>\Locking\ directory are perfectly normal. However, significant quantities of these files indicate a problem with the reliability of locking on the shared directory, and warrant further investigation when troubleshooting an HA issue.
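The general prelock-then-lock idea can be sketched as below. This is not Squared Up's code. A hard link stands in for the ‘copy into place’ step because it fails if the .lock file already exists; hard links may not be available on every network share. The function names and the lock file naming are hypothetical. The last helper counts leftover .prelock files, which may be handy when troubleshooting.

```python
import os
import uuid
from pathlib import Path


def try_acquire_lock(target_path):
    """Try to take <target>.lock via a randomly named .prelock file."""
    directory = os.path.dirname(os.path.abspath(target_path))
    lock_path = target_path + ".lock"
    prelock_path = os.path.join(directory, uuid.uuid4().hex + ".prelock")
    with open(prelock_path, "x") as f:
        f.write(str(os.getpid()))  # illustrative content only
    try:
        # Stand-in for the copy step: raises if the lock already exists.
        os.link(prelock_path, lock_path)
    except FileExistsError:
        return False  # another holder has the lock; the caller may retry
    finally:
        os.remove(prelock_path)  # avoid leaving a dangling .prelock
    return True


def release_lock(target_path):
    """Remove <target>.lock once the update is finished."""
    os.remove(target_path + ".lock")


def count_prelocks(locking_dir):
    """Count leftover .prelock files in the Locking directory."""
    return len(list(Path(locking_dir).glob("*.prelock")))
```

A large or steadily growing count from count_prelocks on <Share>\Locking is the condition that the paragraph above flags for further investigation.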

How to configure Windows authentication when Squared Up is installed on load balanced servers

How to check and modify the application pool identity