This document is intended as a simple guide to implementing DR on Azure for an SAP environment. Given the nature and importance of the workload, it is a high-level reference that must be complemented with more specific documentation by anyone who wants to build an SAP DR solution on Azure.
This guide is split into several parts, so readers can navigate to what interests them most:
- SAP DR on Azure Part 1 – Introduction
- SAP DR on Azure Part 2 – Possible Solution # 1
- SAP DR on Azure Part 3 – Possible Solution # 2
SAP DR on Azure Part 3 – Possible solution # 2
In the previous post (https://secureinfra.blog/2020/06/11/sap-dr-on-azure-part-2/) we described a possible solution for providing DR on Azure for an SAP environment. The main benefits of that solution can be summarized briefly:
- Based on HANA backup for data resiliency;
- Cost-effective;
- Good enough RTO.
One concern when deploying such a solution is the RPO: the actual RPO offered by solution #1 depends on the HANA backup frequency. If the customer runs a daily backup, the worst-case RPO will be 24 hours. Another concern is the RTO, which depends heavily on the HANA restore on Azure and on the warm-up phase of the HANA DB (it can take hours, depending on the size of the DB). Sometimes customers have more stringent requirements, such as:
- Due to the nature of the data in the SAP environment, significant data loss is not acceptable, so the RPO should be within minutes;
- The SAP environment is mission critical and, as such, the RTO should be as low as possible.
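The RPO reasoning above can be made concrete with a small sketch. It assumes that, for a backup-based approach, the worst-case data loss equals the interval between two backups; the function names and sample values are illustrative and not part of the original solution.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """Worst-case RPO of a backup-based DR: a failure just before the next
    backup loses everything written since the previous one."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, required_rpo: timedelta) -> bool:
    """True when the backup schedule alone can satisfy the required RPO."""
    return worst_case_rpo(backup_interval) <= required_rpo


daily = timedelta(hours=24)
print(worst_case_rpo(daily))                    # 1 day, 0:00:00
print(meets_rpo(daily, timedelta(minutes=15)))  # False
```

With a daily backup, an RPO requirement measured in minutes cannot be met, which is the gap the second solution is meant to close.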
In order to address these requirements while retaining other qualities such as robustness and reliability, it is possible to deploy a solution based on SAP HANA System Replication in asynchronous mode (referred to in this guide as HANA System Replica) for data resiliency and on ASR for the other roles, just as in solution #1.
The DR infrastructure proposed as Solution # 2 has the following main features:
- Based on HANA System Replica (Async) for data resiliency;
- RPO within minutes;
- Lower RTO.
The second solution is similar to solution #1 and is shown in the following picture:
As in the previous solution, Azure Site Recovery is used to replicate all VMs except the HANA server. For the latter, the proposal is to provision a brand new HANA VM, ready to use, and to enable HANA System Replica (asynchronous) from the primary on-prem HANA to replicate the data. Because of this replication method, the HANA machine on Azure must be powered on with the HANA engine running.
The choice of HANA System Replica for the HANA DB is mainly meant to keep RPO and RTO as low as possible. The main benefits of this choice are:
- Very low RPO (3 minutes achieved on projects we delivered);
- Very low RTO as well: because HANA on the DR site is already up and running, the time needed to start services on DR is cut dramatically (it can be within an hour).
As a drawback, the proposed solution is more expensive than solution #1, because the HANA machine is always powered on. To mitigate this, a reserved instance could be an option (https://azure.microsoft.com/en-us/pricing/reserved-vm-instances/).
The proposed solution uses the following objects:
- An Azure subscription, ideally connected to the on-premises site via ExpressRoute;
- An Azure VNET pre-provisioned in the subscription, with one or more subnets in which to place the VMs:
- e.g. a subnet defined for the HANA machine (Data Layer);
- e.g. a subnet defined for the other VMs (App Layer);
- Azure Site Recovery (ASR) to replicate the data of the App Servers, ASCS and Web Dispatcher. This approach may require installing an on-prem VM called Configuration Server, dedicated to replicating physical / VMware machines to Azure;
- A new VM for the SAP HANA role. SAP HANA will be installed on it and, in normal conditions, this VM is powered on with the engine running. If required (e.g. DR activation or maintenance / test operations), its role can quickly be switched to primary.
During normal operation, the HANA machine on Azure receives the replicated transactions from the primary on-premises HANA and updates its local DB. In this condition, the infrastructure is similar to what is shown in the following picture, which displays the copy flows due to ASR replication and HANA System Replica. The VMs replicated with ASR are missing from the picture because they will exist only after a recovery plan is activated in ASR. Two things to note about the replicas:
- All data related to ASR replicas can travel over an Internet connection (the ASR endpoints are HTTPS addresses on the Internet);
- All data related to HANA System Replica must travel over the ExpressRoute / S2S connection.
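Since the HANA System Replica traffic must stay on the private connection, one simple sanity check is that both HANA endpoints use private addresses. The sketch below only illustrates that check: the addresses are hypothetical, and a private address alone does not prove that the traffic is actually routed over ExpressRoute or S2S.

```python
import ipaddress


def is_private_endpoint(address: str) -> bool:
    """True if the address is in a private range, i.e. not Internet-routable."""
    return ipaddress.ip_address(address).is_private


# Hypothetical addresses, for illustration only.
hana_primary_onprem = "10.10.1.20"
hana_dr_azure = "10.20.1.4"

for name, addr in [("primary", hana_primary_onprem), ("DR", hana_dr_azure)]:
    status = "private" if is_private_endpoint(addr) else "PUBLIC - review routing"
    print(f"HANA {name} {addr}: {status}")
```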
If the service has to be activated on Azure (e.g. in case of a failure of the on-prem site), some procedures must be carried out (following the instructions described later in this document). At the end of the activation, the infrastructure should be similar to the following picture:
As already described in the previous chapter, the main components of the proposed solution are the following:
- An existing Azure subscription;
- A VNET within the subscription with one or more subnets. The VNET should satisfy at least the following prerequisites:
- It is reachable from on premises through ExpressRoute;
- It has enough available IP addresses for the SAP environment to be hosted;
- A Network Security Group (NSG) that allows only the needed traffic;
- Azure Site Recovery, needed to replicate the VM disks from on premises to Azure;
- A Recovery Services vault, defined in the subscription, needed to enable the ASR functionality;
- Two Azure Storage Accounts where the VM disk data will be replicated as storage blobs. A Standard Storage Account will be used for log replication and a Premium Storage Account for the consolidated VM disk images (SAP workloads on Azure require Premium Storage);
- One or more Azure Availability Sets defined within the subscription: these Availability Sets must support Managed Disks and are required to group the VMs when they are activated on Azure; they provide a specific SLA (see https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability);
- At least one Proximity Placement Group (PPG): SAP workloads perform best in Azure when the servers are within a PPG;
- An on-premises VM (ASR Configuration Server) that will be used to replicate the on-premises VMware VMs or physical machines to Azure. The installation and configuration procedure for this VM is described at the following links:
- VM configuration: https://docs.microsoft.com/en-us/azure/site-recovery/vmware-azure-deploy-configuration-server
- Preparation steps tutorial for enabling VM sync: https://docs.microsoft.com/en-us/azure/site-recovery/vmware-azure-tutorial-prepare-on-premises
- A VM on Azure with the SAP HANA role. SAP HANA will be installed on it and, during the normal lifecycle, it will be powered on with the engine running, in order to receive the data replica from on-prem and update the local DB. The VM must be configured following the SAP best practices:
- It must be an SAP-certified Azure VM;
- It must use Premium Storage;
- Azure Write Accelerator must be enabled on all the disks used by the HANA log component.
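The prerequisite about available IP addresses can be checked before provisioning. The sketch below is only an illustration: it relies on the fact that Azure reserves five addresses in every subnet, while the subnet ranges and the number of required NICs are hypothetical.

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # Azure reserves 5 addresses in every subnet


def usable_addresses(cidr: str) -> int:
    """Number of addresses Azure leaves available for NICs in a subnet."""
    total = ipaddress.ip_network(cidr).num_addresses
    return max(total - AZURE_RESERVED_PER_SUBNET, 0)


def fits(cidr: str, required_nics: int) -> bool:
    """True if the subnet can host the required number of NICs."""
    return usable_addresses(cidr) >= required_nics


# Hypothetical subnets and NIC counts.
print(usable_addresses("10.20.1.0/28"))  # 11
print(fits("10.20.2.0/27", 12))          # True
```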
For additional information about the SAP on Azure configuration and architecture, look at the following links:
The proposed solution requires transferring two sets of data from the on-premises datacenter to the Azure public cloud:
- VM replica using Azure Site Recovery (ASR). The replica requires an initial full synchronization to bring the VM disks on Azure to a steady state. Once the full replica is completed, only disk writes are replicated: the amount of data transferred from on premises to Azure depends on the amount of data written to the VM disks.
- HANA DB replica from the primary to the DR VM. The amount of data depends on the size of the data and log backups generated daily.
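A rough bandwidth estimate for the two flows above can be derived from the daily data volumes. The sketch below is a back-of-the-envelope calculation under stated assumptions: decimal units, traffic spread evenly over the day, no protocol overhead, compression or peaks. The daily volumes are hypothetical, so the result should be read as a lower bound rather than a sizing figure.

```python
def required_mbps(daily_change_gb: float, window_hours: float = 24.0) -> float:
    """Average bandwidth (Mbit/s) needed to ship daily_change_gb within window_hours.

    Uses decimal units (1 GB = 8000 Mbit) and ignores overhead and peaks.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return daily_change_gb * 8000 / (window_hours * 3600)


# Hypothetical daily volumes.
asr_disk_writes_gb = 200   # data written daily on the ASR-protected VM disks
hana_backup_gb = 100       # daily size of HANA data and log backups

total = required_mbps(asr_disk_writes_gb) + required_mbps(hana_backup_gb)
print(f"{total:.1f} Mbit/s average")  # 27.8 Mbit/s average
```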
In summary, the proposed solution requires available network bandwidth from on premises to Azure: the bandwidth for ASR can be allocated on either the ExpressRoute or the Internet channel. Using the Internet channel for ASR can be justified by the following main reasons:
- The endpoints used for ASR data replication are public, and therefore reachable through the Internet;
- All connections are encrypted via HTTPS, using TLS 1.2 and SHA-256 digital signatures;
- Storage Account authentication does not use a username/password combination; instead, it uses an encryption key mechanism.
Using ExpressRoute for ASR requires additional considerations:
- The ExpressRoute circuit must provide the required bandwidth;
- If the circuit is not on a flat data rate, costs related to the replication traffic could occur;
- To route traffic to public Azure PaaS services, such as Storage Accounts, over the ExpressRoute circuit, “Microsoft” peering must be enabled from the Azure Portal. This allows the routes for the Azure PaaS services (identified by their BGP communities) to be advertised over the ExpressRoute circuit.
The traffic related to HANA System Replica, instead, travels over the dedicated connection from on-prem to Azure (ideally ExpressRoute). This traffic is point-to-point from SAP on-prem to SAP on Azure, and a possible flow is the following:
| Source | Source Port | Dest | Dest Port | Service | Note |
|---|---|---|---|---|---|
| Primary HANA on-prem | any | HANA DR | 30001, 30002, 30003, 30005, 30007, 30040/TCP | SAP HANA Async SR | Normal conditions Replica |
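Once the firewall and NSG rules are in place, the ports in the table above can be probed from the primary HANA host. The following is a minimal sketch using plain TCP connections; the port list is copied from the table (the 300xx values correspond to a HANA instance number of 00), while the function names are hypothetical helpers introduced here.

```python
import socket

# Ports copied from the flow table (HANA instance number 00).
HSR_PORTS = [30001, 30002, 30003, 30005, 30007, 30040]


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_replication_ports(host: str, ports=HSR_PORTS) -> dict:
    """Map each port to its reachability from the machine running the check."""
    return {port: port_open(host, port) for port in ports}
```

Calling `check_replication_ports` with the private IP of the HANA DR VM returns a dictionary of port numbers and booleans. A port reported as unreachable may be blocked by a rule or simply not listening yet, so the result only tells that the path is not usable at that moment.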
The following chapter describes all the tasks and activities that need to be carried out in order to activate the SAP DR environment on Azure (aka Recovery Plan). Specifically, it must be used when there is a real need to make the SAP service on Azure the primary one (for example during a real disaster, or if there is a business need to move the on-premises workload to Azure).
DR Recovery Plan
All the operations needed to enable the SAP service on Azure are described in the following table. It is important to highlight the following:
- Tasks in orange are manual operations;
- Tasks in green are automatic;
- The tasks at steps 1.a and 1.b can be executed in parallel;
- The tasks within each step are sequential.
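The execution rules listed above (steps 1.a and 1.b in parallel, tasks inside a step in sequence) can be sketched as follows. This is only an illustration of the ordering logic: the tasks are placeholders, since the real ones come from the recovery plan table, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(name, tasks):
    """Run the tasks of one step strictly in order and collect their results."""
    results = []
    for task in tasks:
        results.append(task())
    return name, results


def run_parallel_steps(steps):
    """Run several steps concurrently; tasks inside each step stay sequential."""
    with ThreadPoolExecutor(max_workers=max(len(steps), 1)) as pool:
        futures = [pool.submit(run_step, name, tasks) for name, tasks in steps.items()]
        return dict(f.result() for f in futures)


# Placeholder tasks; replace them with the real recovery plan operations.
steps = {
    "1.a": [lambda: "1.a task 1 done", lambda: "1.a task 2 done"],
    "1.b": [lambda: "1.b task 1 done", lambda: "1.b task 2 done"],
}
print(run_parallel_steps(steps))
```

Manual tasks would still need a human confirmation before the next task in the same step starts; the sketch does not model that.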
The above steps are represented in the following flow chart:
This ends the SAP DR on Azure series. We hope you found it interesting.