Active-Active

For Application Services + SQL

Executive Summary

Active–active redundancy ensures high availability by running multiple active instances, each fully capable of serving consumers independently of the others. Consistency and failover are managed by external mechanisms, often through application-level APIs that selectively synchronize only trusted objects and block suspicious changes. This independence limits the impact of a breach—an attack on one instance doesn't spread to the others—and enhances efficiency, as fewer standby resources can protect more primary systems.

The features below are examined in the context of Milestone XProtect VMS.

Architecture in a Milestone VMS environment

The architecture comprises two complete and independent XProtect solution instances, each featuring all XProtect services, SQL, and a separate media stack (recording service + media database). Devices stream to both VMSs simultaneously. Optionally, the VMS instances can be federated.

Failover

Two active VMS instances run in parallel, each capable of delivering all core services—authentication, configuration, video, events, and APIs. Clients can consume all services from a single instance or selectively draw different services/subservices from separate instances if one becomes unavailable. In this model, failover is client-driven, granular to the (sub)service level, and occurs as quickly as the client can redirect its requests.
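The client-driven, per-service failover described above can be sketched as a small routing function. This is only an illustration: the instance names (`vms-a`, `vms-b`), the service names, and the `is_up` probe are hypothetical placeholders, not part of any XProtect API.

```python
from typing import Callable, Dict, List

# Hypothetical probe: returns True if `instance` can currently serve `service`.
Probe = Callable[[str, str], bool]


def pick_instances(services: List[str], instances: List[str], is_up: Probe) -> Dict[str, str]:
    """For each service, choose the first instance whose probe succeeds."""
    routing: Dict[str, str] = {}
    for service in services:
        for instance in instances:
            if is_up(instance, service):
                routing[service] = instance
                break
        else:
            # No instance answered for this service.
            raise RuntimeError(f"no instance can serve {service!r}")
    return routing


# Example: video is unavailable on vms-a, everything else is healthy.
down = {("vms-a", "video")}
routing = pick_instances(
    ["authentication", "configuration", "video", "events"],
    ["vms-a", "vms-b"],
    lambda instance, service: (instance, service) not in down,
)
print(routing)
```

In this example only the video service moves to the second instance; the other services stay where they were, which is what makes the failover granular to the service level.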

Some failover types are discussed below.

Authentication Failover

Authentication failover is managed by an external load-balancing gatekeeper, which routes all client requests to the primary and automatically redirects them to the secondary if the primary becomes unavailable.

Stream Failover

Near-instantaneous live and playback failover is achieved because at least two media stacks remain active, with redundancy-aware logic built directly into the Client (Smart Client/PSIM). See Active-Active Media Stack.

HA Events (Subservice Failover)

Each VMS instance has an independent event stream. When both VMS instances function perfectly, event streams are identical. However, in practice, various internal and external causes (camera sources, server loads, and client loads) may cause some events to appear in one stream but not the other. For example, a motion detection event may be raised in one VMS instance but not in the other, because the second instance has lost the stream from the camera. In cases such as these, use of redundancy-aware software that processes events from both streams yields event-level HA.
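A minimal sketch of such redundancy-aware event processing follows. It assumes that two reports describe the same event when they share a source and an event kind and arrive within a short time window; the `Event` fields and the one-second window are illustrative assumptions, not XProtect definitions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str       # camera or device identifier (assumed field)
    kind: str         # e.g. "motion" (assumed field)
    timestamp: float  # seconds, as reported by the VMS instance


def merge_events(stream_a: Iterable[Event], stream_b: Iterable[Event], window: float = 1.0) -> List[Event]:
    """Merge two event streams, dropping reports that duplicate an emitted event."""
    merged: List[Event] = []
    last_seen: Dict[Tuple[str, str], float] = {}  # (source, kind) -> last emitted timestamp
    for event in sorted([*stream_a, *stream_b], key=lambda e: e.timestamp):
        key = (event.source, event.kind)
        previous = last_seen.get(key)
        if previous is not None and event.timestamp - previous <= window:
            continue  # the other instance already reported this event
        last_seen[key] = event.timestamp
        merged.append(event)
    return merged


# Instance B has lost the stream from cam2, so it never raises that motion event.
stream_a = [Event("cam1", "motion", 10.0), Event("cam2", "motion", 20.0)]
stream_b = [Event("cam1", "motion", 10.2)]
merged = merge_events(stream_a, stream_b)
print(merged)
```

The merged result contains one event for cam1 and one for cam2, so the consumer sees the cam2 event even though one instance missed it.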

Consistency

An application-aware service ensures database consistency by using the VMS's API to retrieve known objects from one database and deposit them into another. The method, which we call Selective Object Synchronization, provides precise control over exactly which content, in the application's context, the user wants to keep consistent.

For example, the user may not want to keep Role changes automatically synchronized. They may wish to verify changes manually before committing to the other replicas. The service makes this possible.
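The idea can be illustrated with a short sketch. The two dictionaries below stand in for the configuration stores of two VMS instances, and the object types and properties are made up for the example; a real service would read and write objects through the VMS API rather than through in-memory dictionaries.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a configuration store: {object_type: {object_id: properties}}
Store = Dict[str, Dict[str, dict]]


def selective_sync(source: Store, target: Store,
                   auto_types: Set[str], review_types: Set[str]) -> List[Tuple[str, str]]:
    """Copy changed objects of auto_types; return changed objects of review_types."""
    pending: List[Tuple[str, str]] = []
    for object_type, objects in source.items():
        for object_id, properties in objects.items():
            if target.get(object_type, {}).get(object_id) == properties:
                continue  # already consistent
            if object_type in auto_types:
                target.setdefault(object_type, {})[object_id] = dict(properties)
            elif object_type in review_types:
                pending.append((object_type, object_id))  # held for manual verification
            # Object types in neither set are not synchronized at all.
    return pending


primary: Store = {
    "Camera": {"cam1": {"fps": 15}},
    "Role": {"operators": {"members": ["alice", "mallory"]}},
}
secondary: Store = {
    "Camera": {"cam1": {"fps": 10}},
    "Role": {"operators": {"members": ["alice"]}},
}
pending = selective_sync(primary, secondary, auto_types={"Camera"}, review_types={"Role"})
print(secondary["Camera"], pending)
```

Here the camera change is applied automatically, while the Role change is only reported, leaving the replica's Role untouched until someone approves it.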

Latency

Near-instant synchronization relies on the VMS application's ability to inform external listeners of changes quickly. In XProtect, certain events—such as camera password updates and stream parameter adjustments—are immediately available to listeners; however, many changes do not trigger these events. These updates can only be synchronized through polling, which prevents the process from being truly instant. Nonetheless, even with large systems of up to a thousand cameras, full synchronization takes less than an hour, and multiple runs can be scheduled daily, minimizing the overall impact.
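For changes that raise no event, a polling pass has to detect what changed by itself. One possible approach, shown here as a sketch under assumptions, is to keep a fingerprint of each object from the previous pass and compare. The object shapes are hypothetical, and the sketch does not handle deleted objects.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(obj: dict) -> str:
    """Stable digest of an object's properties (keys sorted for determinism)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def poll_changes(current: Dict[str, dict], known: Dict[str, str]) -> List[str]:
    """Return IDs of new or modified objects and update the known fingerprints."""
    changed: List[str] = []
    for object_id, obj in current.items():
        digest = fingerprint(obj)
        if known.get(object_id) != digest:
            changed.append(object_id)
            known[object_id] = digest
    return changed


known: Dict[str, str] = {}
first = poll_changes({"cam1": {"fps": 15}, "cam2": {"fps": 10}}, known)
second = poll_changes({"cam1": {"fps": 15}, "cam2": {"fps": 25}}, known)
print(first, second)
```

The first pass reports every object because nothing is known yet; the second reports only `cam2`. With this kind of scheme, the delay before a change is noticed is bounded by the polling interval plus the time one pass takes.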

Coverage

Complete consistency requires the ability to retrieve every type of object from one database and deposit it into another. That ability depends on the depth of the API provided by the VMS vendor. With XProtect, API coverage is nearly complete but not total. For instance, XProtect Access and LPR are not exposed through the API, so consistency for them must be maintained manually.

Cybersecurity

An active-active setup offers enhanced cybersecurity for two key reasons: data center isolation and cyber-secure consistency management.

Data Center Isolation

Each data center hosting a full VMS stack operates independently, with no shared clustering or storage with the other center. This isolation blocks lateral movement during an attack and ensures a clean, uncompromised environment is always available for recovery.

Selective Object Synchronization

The mechanism of achieving consistency through an application-aware service provides unique cybersecurity advantages.

Ransomware Encryption: Encrypted data cannot be read back as valid objects, so the API returns nothing to synchronize and the attack is not propagated.

Stealth Procedure Injection: Application APIs expose no DDL, so an injected stored procedure cannot propagate to the other instance.

Schema Tampering: API calls fail when attempting to retrieve objects, and corruption is not replicated even when object-level controls are disabled.

Standby Disk Overfill: Because the primary and standby servers are no longer joined at the hip by an ever-growing transaction log, an overflowing disk on the backup becomes a localized maintenance issue rather than a cascading system failure.

Efficiency

Active-active systems can achieve M:N efficiency: N standby VMS instances can protect M primary instances, with N < M. Although no solution on the market achieves N < M at this time, it is not difficult to envision such a solution being deployed for XProtect should the demand arise.
