This document aims to be a simple guide to disaster recovery (DR) of an SAP environment on Azure. Given the nature and importance of the workload described, it is intended as a high-level reference that must be complemented with more specific documentation by anyone who wants to implement an SAP DR solution on Azure.
This guide is split into several parts, so that readers can navigate to and choose what interests them most:
- SAP DR on Azure Part 1 – Introduction
- SAP DR on Azure Part 2 – Possible Solution # 1
- SAP DR on Azure Part 3 – Possible Solution # 2
SAP DR on Azure Part 2 – Possible solution # 1
As described in the introduction post, DR of an SAP environment on Azure can be done in different ways, depending primarily on the customer's business requirements. The solution described here has the following characteristics:
- Based on HANA backup for data resiliency
- Is cost effective
- RPO dependent on the backup frequency
- Good RTO.
One possible solution is shown in the following picture. It uses Azure Site Recovery to replicate all VMs except the HANA server. For the latter, the proposal is to provision a brand new HANA VM, ready to use, onto which data is restored from backup as needed.
The solution described uses the backup/restore method for the HANA machine and, as such, it will:
- Be cost effective: once created, the HANA machine on Azure is normally kept turned off and deallocated, minimizing consumption costs;
- Give an RPO of at most 24 hours, if the HANA backup runs daily;
- Give an RTO dependent on the amount of data in the backup (in our experience, the RTO is within 4 hours).
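The RPO and RTO reasoning above can be sketched as simple arithmetic. This is a minimal illustration, not a sizing tool: the function names, the 2 TB backup size and the 200 MB/s restore throughput are assumptions chosen for the example, and the data restore time is only one part of the overall RTO (VM start-up and SAP checks come on top).

```python
def worst_case_rpo_hours(backup_interval_hours: float) -> float:
    """Worst case: the disaster strikes just before the next backup is taken."""
    return backup_interval_hours


def estimated_restore_hours(backup_size_gb: float, throughput_mb_s: float) -> float:
    """Time to read the backup back at a sustained throughput (1 GB = 1024 MB)."""
    seconds = backup_size_gb * 1024 / throughput_mb_s
    return seconds / 3600


# Hypothetical figures: daily backup, 2 TB backup set, 200 MB/s sustained restore.
rpo = worst_case_rpo_hours(24)
restore = estimated_restore_hours(2048, 200)
print(f"worst-case RPO: {rpo:.0f} h, data restore time: {restore:.1f} h")
```

Plugging in the real backup size and a throughput measured during a restore test gives a first indication of whether a 4-hour RTO is realistic for a given system.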
The proposed solution uses the following objects:
- An Azure subscription, ideally connected to the on-premises site via ExpressRoute;
- An Azure VNET pre-provisioned in the subscription, with one or more subnets in which to place the VMs:
- e.g. a subnet defined for the HANA machine (data layer);
- e.g. a subnet defined for the other VMs (app layer);
- Azure Site Recovery (ASR) to replicate the data of the App Servers, ASCS and Web Dispatcher. This approach may require installing an on-premises VM, called the Configuration Server, dedicated to replicating physical or VMware machines to Azure;
- A Storage Account for the backup data copied daily from on-prem;
- A daily copy job from the on-premises data (HANA backup repository) to the storage account on Azure (e.g. Azure Files);
- A new VM for the SAP HANA role. SAP HANA will be installed on it and, under normal conditions, the VM is turned off. When required (e.g. DR activation or maintenance/test operations), it can be turned on and, after the DB restore, used as needed. Two mount points to the Azure storage account containing the backup files will be defined on this VM. The backup repository on Azure will be based on Azure Files, and the VM will have permanent mount points to an Azure file share, which could be named as follows:
- /from_onprembkp, pointing to the Azure file share defined on the storage account to which the HANA backup files are copied daily from on-prem. This path will be used as the restore source during restore operations.
Azure Files can be mounted on the HANA machine using the CIFS protocol, more specifically SMB 3.0, which makes the communication robust and secure by accepting only encrypted connections. This choice has the following advantages:
- It uses tools already available on Linux (e.g. SLES 12 SP3 natively supports mount.cifs with SMB 3.x, see https://azure.microsoft.com/en-us/blog/on-premises-azure-files-access-on-linux-update-and-new-troubleshooter/ );
- The HANA VM has its backup repositories already in place, which speeds up restore operations;
- Security is ensured first because SMB 3.x accepts only encrypted connections, and second because the data transfer stays on the Azure backbone within the same region.
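As an illustration of the permanent mount described above, the following sketch builds an /etc/fstab line for an Azure file share mounted over CIFS with SMB 3.0. The storage account name, share name, credentials file path and permission modes are hypothetical placeholders; only the /from_onprembkp mount point comes from this guide.

```python
def azure_files_fstab_entry(storage_account: str, share: str, mount_point: str,
                            credentials_file: str, smb_version: str = "3.0") -> str:
    """Build one /etc/fstab line that mounts an Azure file share over CIFS."""
    source = f"//{storage_account}.file.core.windows.net/{share}"
    options = ",".join([
        "nofail",                            # do not block boot if the share is unreachable
        f"vers={smb_version}",               # SMB protocol version to negotiate
        f"credentials={credentials_file}",   # file holding the account name and key
        "dir_mode=0770",
        "file_mode=0660",
        "serverino",
    ])
    return f"{source} {mount_point} cifs {options} 0 0"


# Hypothetical names; adapt them to the real storage account and share.
print(azure_files_fstab_entry(
    storage_account="sapdrbackup",
    share="hanabackup",
    mount_point="/from_onprembkp",
    credentials_file="/etc/smbcredentials/sapdrbackup.cred",
))
```

Keeping the storage account key in a root-only credentials file, rather than inline in fstab, avoids exposing it to every user who can read /etc/fstab.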
During normal operation, the HANA machine on Azure will be off and the mounts to the backup paths will be persisted in fstab. The infrastructure will look similar to the following picture, which shows the copy flows due to ASR replication and the daily backup copy. The VMs replicated with ASR are missing from the picture because they exist only after a recovery plan is activated on ASR.
If the service has to be activated on Azure (e.g. in case of a fault at the on-prem site), some procedures must be carried out (following the instructions described later in this document). At the end of the activation, the infrastructure should look similar to the following picture:
As already described in the previous chapter, the main components of the proposed solution are the following:
- An existing Azure subscription;
- A VNET within the subscription with one or more subnets. The VNET should satisfy at least the following prerequisites:
- It is reachable from on premises through ExpressRoute;
- It has enough available IP addresses for the SAP environment to host;
- A Network Security Group (NSG) that will allow only the needed traffic;
- Azure Site Recovery, needed to replicate the VM disks from on premises to Azure;
- A Recovery Services vault, defined in the subscription, needed to activate the ASR functionality;
- Two Azure Storage Accounts to which the VM disk data will be replicated as blobs. A Standard Storage Account will be used for the log replication and a Premium Storage Account for the consolidated VM disk images (SAP workloads on Azure require Premium Storage);
- One or more Azure Availability Sets defined within the subscription: these Availability Sets must support Managed Disks and are required to group the VMs when they are activated on Azure; they provide a specific SLA (see https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability );
- At least one Proximity Placement Group (PPG): SAP workloads perform best on Azure when the servers are within a PPG;
- An on-premises VM (the ASR Configuration Server) that will be used to replicate the on-premises VMware VMs or physical machines to Azure. The installation and configuration procedure for this VM is described at the following links:
- VM configuration: https://docs.microsoft.com/en-us/azure/site-recovery/vmware-azure-deploy-configuration-server
- Preparation steps tutorial for enabling the VM sync: https://docs.microsoft.com/en-us/azure/site-recovery/vmware-azure-tutorial-prepare-on-premises
- An Azure Storage Account where the HANA backups will be stored daily. A Standard account is enough;
- A scheduled job that copies the data from the on-premises HANA backup repository to Azure Files. The job could be implemented as a PowerShell script running on the on-premises server that has the backup role. The script must use a secure protocol (e.g. HTTPS) to transfer the data;
- A VM on Azure with the SAP HANA role. The SAP HANA software will be installed on it and, during normal operation, it will be powered off to avoid Azure consumption costs. The VM must be configured following the SAP best practices:
- It must be an Azure SAP certified VM;
- It must use Premium Storage;
- Azure Write Accelerator must be enabled on all the disks used by the HANA log component.
For additional information about the SAP on Azure configuration and architecture, look at the following links:
The proposed solution requires transferring two sets of data from the on-premises datacenter to the Azure public cloud:
- VM replication using Azure Site Recovery (ASR). The replication requires an initial full synchronization to bring the VM disks on Azure to a steady state. Once the full replication is complete, only disk writes are replicated: the amount of data transferred from on premises to Azure then depends on the amount of data written to the VM disks.
- HANA DB backup copy to the Azure Storage Account. The amount of data depends on the size of the data and log backups generated daily.
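To get a first idea of the bandwidth these two flows need, the average rate can be estimated from the daily data volume and the time window available for the transfer. The sketch below is a rough estimate under assumed figures (300 GB of daily backups copied within 6 hours, 50 GB of daily disk writes replicated over 24 hours); real sizing should also account for protocol overhead, compression and peaks.

```python
def required_mbps(data_gb: float, window_hours: float) -> float:
    """Average bandwidth in megabits per second needed to move data_gb within window_hours."""
    megabits = data_gb * 1024 * 8          # GB -> MB -> megabits
    return megabits / (window_hours * 3600)


# Hypothetical daily volumes; replace them with values measured on premises.
backup_copy = required_mbps(data_gb=300, window_hours=6)
asr_delta = required_mbps(data_gb=50, window_hours=24)
print(f"backup copy: {backup_copy:.0f} Mbps, ASR delta replication: {asr_delta:.1f} Mbps")
```

The initial full ASR synchronization is a one-off transfer and can be estimated with the same function, using the total size of the replicated disks and the time allowed for the first sync.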
In summary, the proposed solution requires available network bandwidth from on premises to Azure: the bandwidth can be allocated on both the ExpressRoute and the Internet channels. Using the Internet channel is viable for the following main reasons:
- The endpoints used for the data replication (ASR and backup copy) are both public and therefore reachable through the Internet;
- All the connections are encrypted via HTTPS using the TLS 1.2 protocol and SHA-256 digital signatures;
- Storage Account authentication does not use a username/password combination; it uses a key-based mechanism instead.
Using ExpressRoute requires additional considerations:
- The ExpressRoute circuit should supply the required bandwidth;
- If a flat data rate is not used, replication-related costs could occur;
- To route traffic over the ExpressRoute circuit to public Azure PaaS services such as the storage account, the “Microsoft” peering must be enabled from the Azure Portal. This allows the routes for the Azure PaaS services, identified by their BGP communities, to be advertised over the ExpressRoute circuit.
The proposed solution requires transferring potentially confidential data from on premises to Azure. It is critical to consider two important security questions:
- How is data transferred to Azure (data in transit)?
- How is data secured at rest on Azure?
Let's discuss each point:
- Data in Transit
- The data transfer from on premises to Azure IaaS (Azure Blob/Azure Files) uses encrypted connections:
- Azure Blob: the connection is made over HTTPS using the TLS 1.2 protocol (https://docs.microsoft.com/it-it/azure/storage/common/storage-security-tls) with a SHA-256 digital signature;
- Azure Files: it uses the same protocols as Azure Blob, plus SMB 3.x when the connection is established through the CIFS protocol instead of HTTPS. This type of connectivity requires a CIFS client that supports SMB 3.x (e.g. Windows 8/Windows Server 2012 onward). The following matrix describes the SMB protocol negotiation for CIFS connectivity: the values in bold support encrypted connections;
- The tool used to transfer the data from on premises to Azure could be AzCopy; it uses a REST API and connects to Azure Storage over HTTPS;
- Access to Azure Storage is controlled through keys generated on the Storage Account: connections using a username and password are not allowed;
- Data At Rest
- The data is stored in an instance of Azure Files/Azure Blobs. It is protected by Azure Storage Service Encryption (SSE), which is enabled by default on all Azure managed disks and Azure Files. The default configuration uses Microsoft-managed keys, handled securely by the datacenter; optionally, the keys can be managed by the customer using Azure Key Vault. Further details are available at the following link: https://docs.microsoft.com/en-us/azure/storage/common/storage-service-encryption
In the proposed solution, all the data can be transferred from on premises to Azure using the AzCopy tool, which stores it in Azure Files over HTTPS.
The HANA VM on Azure reads the backup data from Azure Files using a CIFS connection over the SMB 3.0 protocol, which is supported on SLES 12 SP3. The data in transit managed by ASR also uses HTTPS.
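The daily copy to Azure Files could be driven by a small wrapper around AzCopy. The sketch below only builds the command line and does not run it; it assumes AzCopy v10 syntax (azcopy copy with --recursive) and a SAS token appended to the share URL for authorization. The local backup path, storage account name, share name and token are hypothetical placeholders.

```python
import shlex


def build_azcopy_command(local_dir: str, storage_account: str,
                         share: str, sas_token: str) -> list:
    """Build an AzCopy v10 command that uploads a local directory to an Azure file share."""
    # HTTPS endpoint of the file share; the SAS token authorizes the upload.
    destination = f"https://{storage_account}.file.core.windows.net/{share}?{sas_token}"
    return ["azcopy", "copy", local_dir, destination, "--recursive"]


# Hypothetical values; a real job would read the token from a protected store.
command = build_azcopy_command(
    local_dir="/hana/backup",
    storage_account="sapdrbackup",
    share="hanabackup",
    sas_token="sv=PLACEHOLDER",
)
print(" ".join(shlex.quote(part) for part in command))
```

A scheduler (cron on Linux, Task Scheduler on Windows) can then run the resulting command once a day, after the HANA backup has completed.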
The following chapter describes all the tasks and activities that need to be carried out to activate the SAP DR environment on Azure (aka the Recovery Plan). Specifically, it must be used when there is a real need to make the SAP service on Azure the primary one (for example during a real disaster, or if there is a business need to move the on-premises workload to Azure).
DR Recovery Plan
All the operations needed to enable the SAP service on Azure are described in the following table. It is important to highlight the following:
- The tasks in orange are manual operations;
- The tasks in green are automatic;
- The tasks at steps 1.a and 1.b can be executed in parallel;
- The tasks within each step are sequential.
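The ordering rules above (steps 1.a and 1.b in parallel, tasks within a step in sequence) can be sketched with a thread pool. The task functions below are empty, hypothetical placeholders and do not correspond to the actual tasks of the recovery plan table; only the step labels 1.a and 1.b come from this guide, and step 2 is assumed for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(name, tasks):
    """Run the tasks of one step strictly in order and report what was executed."""
    done = []
    for task in tasks:
        task()  # a real task would trigger ASR failover, start a VM, restore HANA, ...
        done.append(f"{name}: {task.__name__}")
    return done


# Placeholder tasks, for illustration only.
def task_one(): pass
def task_two(): pass
def task_three(): pass


def run_recovery_plan():
    # Steps 1.a and 1.b run side by side; each one is sequential internally.
    with ThreadPoolExecutor(max_workers=2) as pool:
        step_1a = pool.submit(run_step, "1.a", [task_one, task_two])
        step_1b = pool.submit(run_step, "1.b", [task_one, task_two])
        results = step_1a.result() + step_1b.result()
    # The next step starts only after both parallel steps have finished.
    results += run_step("2", [task_three])
    return results


print(run_recovery_plan())
```

Calling result() on both futures before moving on is what enforces the dependency: an exception in either parallel step is raised there and stops the plan before the next step starts.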
The above steps are represented in the following flow chart.
This ends the description of possible solution # 1.
Below are the links to the other parts of this guide:
SAP DR on Azure Part 3 – Possible Solution # 2