iWelcome takes extensive measures to ensure Business Continuity for its IDaaS platform in an operational context.
However, there is always a worst-case scenario in which the service, or the connectivity to it, becomes unavailable due to a technical failure or an external or environmental calamity (a Disaster).
iWelcome and its datacentre provider, GTT, have set up several services to mitigate the impact of unavailability and data loss in Disaster situations. iWelcome’s standard IDaaS service configuration includes:
- High Availability (HA): using a redundant deployment within a single datacentre.
- Disaster Recovery (DR): using a back-up & restore procedure to fail over to a second datacentre.
Optionally, iWelcome offers a DR configuration that reduces recovery time and data loss by using a semi-automated procedure to fail over to a second datacentre in another geographical location. This service is outlined in this document and referred to as DR+.
In the event of a Disaster, iWelcome’s DR+ service provides the capability to recover from unavailability and to minimise data loss. The DR+ service includes a set of policies and procedures as well as technical configuration and set-up to enable quick recovery of iWelcome’s vital infrastructure and systems.
With DR+, service disruption in the event of a Disaster is minimised by a semi-automated failover from the primary datacentre to the secondary datacentre in a different geographical location.
The DR+ service is designed to restore the availability of the iWelcome IDaaS service with minimum downtime.
Two advantages of DR+ over the back-up & restore DR procedure are:
- The disaster recovery tenant is fully up & running, resulting in a shorter recovery time (RTO).
- The data is synchronised at short intervals between the primary datacentre and the secondary datacentre, resulting in a more recent data recovery point (RPO).
DR+ is characterised by the following:
- The secondary datacentre contains the same configuration, setup and data as the primary datacentre, with the data lagging by a certain delay (see RPO).
- Upon the decision to switch, the iWelcome IDaaS service fails over from the primary datacentre to the secondary datacentre.
- The failover mechanism relies on scheduled data replication (using snapshots) between the primary and the secondary datacentre.
- SSO sessions that are active during the switch-over from the primary to the secondary datacentre are lost. Users have to log in again and will then have SSO to their applications.
DR+ is depicted in the high-level figure below.
The following diagram gives a schematic overview of the main DR+ events and activities.
The Recovery Time Objective (RTO) is the time between the decision to switch (t1) to the secondary datacentre and the Production Availability (t2) of the iWelcome IDaaS service in the secondary datacentre.
The RTO does not include the time between the occurrence of the Disaster (t0) and the decision to switch to the second datacentre (t1). Production Availability means that the iWelcome IDaaS service is ready to handle authentication and provisioning requests (CRUD) on the basis of the last successful replication.
Note: The customer is responsible for the DNS switch and for configuring a low TTL (preferably less than 10 minutes). As a result, the service may only become effectively usable by end users some time after iWelcome has met the RTO.
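The relationship between t0, t1, t2 and the RTO can be illustrated with some simple date arithmetic. The Python sketch below is an illustration only: the timestamps are made up, the function names are hypothetical, and the assumption that cached DNS records may linger for up to one TTL after the customer's DNS change is a simplification. The one-hour target is the DR+ maximum RTO quoted in the comparison table of this document.

```python
from datetime import datetime, timedelta

# DR+ maximum RTO as stated in the comparison table of this document.
RTO_TARGET = timedelta(hours=1)


def recovery_time(t1: datetime, t2: datetime) -> timedelta:
    """Elapsed time from the decision to switch (t1) to Production Availability (t2)."""
    if t2 < t1:
        raise ValueError("t2 must not be earlier than t1")
    return t2 - t1


def rto_met(t1: datetime, t2: datetime, target: timedelta = RTO_TARGET) -> bool:
    """True when the service was production-available within the RTO."""
    return recovery_time(t1, t2) <= target


def end_user_access(t2: datetime, dns_changed_at: datetime, dns_ttl: timedelta) -> datetime:
    """Latest moment at which end users reach the secondary datacentre.

    Assumption: resolvers may keep serving the old record for up to one TTL
    after the customer's DNS change.
    """
    return max(t2, dns_changed_at + dns_ttl)


# Hypothetical timeline of a Disaster.
t0 = datetime(2016, 5, 1, 9, 0)    # Disaster occurs
t1 = datetime(2016, 5, 1, 10, 30)  # decision to switch
t2 = datetime(2016, 5, 1, 11, 15)  # Production Availability

print(recovery_time(t1, t2))  # time counted against the RTO; t0..t1 is excluded
print(rto_met(t1, t2))
print(end_user_access(t2, datetime(2016, 5, 1, 11, 20), timedelta(minutes=10)))
```

In this made-up timeline the 90 minutes between t0 and t1 do not count towards the RTO, while a DNS change made after t2 pushes effective end-user access beyond the moment iWelcome met the RTO.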
The Recovery Point Objective (RPO) is the maximum age of the data available on the iWelcome Tenant in the secondary datacentre at the moment of the Disaster (t0). Data newer than the replicated state, that is, data created between (t-1) and (t0), is deemed lost. The replicated data in the secondary datacentre is at least 6 hours and at most 14 hours old.
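The RPO definition above can likewise be expressed as a small calculation. The following Python sketch is illustrative: the function names and timestamps are hypothetical, and it assumes that (t-1) denotes the timestamp of the replicated state held in the secondary datacentre.

```python
from datetime import datetime, timedelta

# DR+ maximum age of replicated data, as stated in this document.
RPO_MAX_AGE = timedelta(hours=14)


def lost_data_window(t_minus_1: datetime, t0: datetime) -> timedelta:
    """Length of the interval (t-1, t0) whose data is deemed lost."""
    if t0 < t_minus_1:
        raise ValueError("t0 must not be earlier than t-1")
    return t0 - t_minus_1


def within_rpo(t_minus_1: datetime, t0: datetime) -> bool:
    """True when the replicated data is no older than the maximum RPO."""
    return lost_data_window(t_minus_1, t0) <= RPO_MAX_AGE


# Hypothetical example: replicated state from 22:00, Disaster at 09:00 the next day.
t_minus_1 = datetime(2016, 4, 30, 22, 0)
t0 = datetime(2016, 5, 1, 9, 0)
print(lost_data_window(t_minus_1, t0), within_rpo(t_minus_1, t0))
```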
If a Disaster occurs, the following process applies:
1. Incident management and decision to fail over.
a. Customer Service Manager and iWelcome Service Manager follow the incident management process as described in the OLA for severity level 1 incidents.
b. Customer Service Manager and iWelcome Service Manager agree on the need to fail over the service from the primary datacentre to the secondary datacentre (t1). Failover is only executed upon mutual agreement.
2. DR+ execution. iWelcome executes the actual failover.
a. iWelcome activates the service in the secondary datacentre and performs a readiness check.
b. iWelcome Service Manager indicates the (restored) service is available to the customer.
3. Access to the service.
To make the iWelcome IDaaS service in the secondary datacentre accessible to end users, the following steps are executed (this can be done in parallel with step 2):
a. Customer applies DNS changes as described in the OLA.
b. Customer and iWelcome verify the DNS change(s).
c. Customer accepts the delivery.
4. Debriefing between the Service Managers from the Customer and iWelcome.
5. DR+ process is closed.
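The ordering of the process above, including the remark that step 3 can run alongside step 2, can be sketched as a small dependency table. This Python fragment is only an illustration: the step descriptions are paraphrased, the names are hypothetical, and the dependencies (in particular that the debriefing waits for both step 2 and step 3) are an assumption rather than something the OLA is known to prescribe.

```python
# Step number -> (paraphrased description, steps that must be completed first).
# Assumption: steps 2 and 3 both depend only on step 1, reflecting the remark
# that step 3 can run simultaneously with step 2.
DR_PLUS_PROCESS = {
    1: ("Incident handling and mutual agreement to fail over", set()),
    2: ("DR+ execution by iWelcome", {1}),
    3: ("Access to the service (customer DNS change)", {1}),
    4: ("Debriefing between the Service Managers", {2, 3}),
    5: ("DR+ process is closed", {4}),
}


def startable(done: set) -> list:
    """Return the steps that are not done yet and whose prerequisites are all done."""
    return [
        step
        for step, (_, prerequisites) in DR_PLUS_PROCESS.items()
        if step not in done and prerequisites <= done
    ]


print(startable(set()))   # before anything has happened
print(startable({1}))     # after the decision to fail over
print(startable({1, 2}))  # iWelcome is done, customer DNS change still open
```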
The OLA describes the DR+ operations processes and communication flows.
The DR+ service is dependent on the following customer procedures and activities:
- A defined and agreed incident management process on the customer side, including an escalation process and alignment with the OLA.
- Availability of authorised and informed staff at management level, necessary for the decision making on the failover.
- Availability of network engineering capacity to change and test the customer DNS configuration.
- A working VPN connection with the secondary datacentre, configured upfront, like the one to the primary datacentre.
- Firewall settings configured upfront to also support the secondary datacentre in case of failover.
- DNS preconfigured with a low TTL (preferably the DNS TTL is set to 10 minutes).
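The customer-side dependencies listed above lend themselves to a simple pre-flight checklist. The Python sketch below is a hypothetical illustration, not an iWelcome tool: the class, the field names and the 600-second threshold (one reading of "low TTL" as at most 10 minutes) are all assumptions.

```python
from dataclasses import dataclass

# Assumption: "low TTL" is interpreted as at most 10 minutes (600 seconds).
MAX_DNS_TTL_SECONDS = 600


@dataclass
class CustomerReadiness:
    incident_process_agreed: bool
    decision_makers_available: bool
    network_engineers_available: bool
    vpn_to_secondary_configured: bool
    firewall_supports_secondary: bool
    dns_ttl_seconds: int


def open_items(r: CustomerReadiness) -> list:
    """List the prerequisites that are not yet satisfied."""
    checks = [
        (r.incident_process_agreed, "incident management process agreed and aligned with the OLA"),
        (r.decision_makers_available, "authorised staff available for the failover decision"),
        (r.network_engineers_available, "network engineering capacity for DNS changes"),
        (r.vpn_to_secondary_configured, "VPN connection to the secondary datacentre"),
        (r.firewall_supports_secondary, "firewall settings for the secondary datacentre"),
        (r.dns_ttl_seconds <= MAX_DNS_TTL_SECONDS, "DNS TTL of 10 minutes or less"),
    ]
    return [label for ok, label in checks if not ok]


# Hypothetical customer: VPN not yet set up and DNS TTL still at one hour.
example = CustomerReadiness(True, True, True, False, True, 3600)
for item in open_items(example):
    print("open:", item)
```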
|#||Item||DR - Back-up and Restore||DR+|
|2||# of active datacentres||1 + 1 back-up||1 primary + 1 secondary + 1 back-up|
|3||Availability Commitment||99.5% to 99.9% (HA)||99.90%|
|4||Layout||Single or Double Stack (HA)||Quadruple Stack|
|5||Architecture (* provisioning Active - Active Q4 2016)||Active - Active (HA) *||Active - Active * (1); Passive - Passive (2)|
|6||Session Failover||Yes, within primary DC (HA)||Yes, within primary DC; yes, within secondary DC after DR|
|7||DR Configuration||Back-up only||(Delayed) Replication to Passive-Passive in 2 + back-up|
|8||Back-Up||Back-up in primary on local servers & back-up server + replicated in Frankfurt on copy back-up server||Back-up in primary + secondary on local servers & back-up server + replicated in Frankfurt on copy back-up server|
|9||Connectivity||Connectivity is setup during DR; manual DNS change by customer required||Connectivity has been established; manual DNS change by customer is required|
|10||Recovery Time Objective (RTO) = Time to restore iWelcome IDaaS service after decision to initiate DR||Maximum 24 - 48 hours||Maximum 1 hour|
|11||Recovery Point Objective (RPO) = Maximum age of the data measured back in time from the moment the disaster started||Maximum 30 hours||Minimum 6 hours; maximum 14 hours|
|12||Requires new login of user||Yes||Yes|
|13||Fail-back||Required||Not required; secondary datacentre becomes primary|