How to get started with the Disaster Recovery solution for Performance Cloud VMware (NSX-T)

 


Description

This guide helps you deploy your disaster recovery (or “DR”) solution for Performance Cloud VMware, based on VMware Cloud Director Availability. It explains how to set up the replication of virtual machines, how to run a test failover and how to perform a failover to recover from a disaster.

Definitions


Recovery Point Objective (RPO)

The RPO is the longest tolerable period of data loss.

Example:

With a one (1) hour RPO, the recovered virtual machine loses no more than one (1) hour of data. Shorter RPO intervals reduce data loss during recovery, but they consume more network bandwidth to keep the replicated virtual machines up to date.

However, this does not mean that virtual machines are replicated exactly every hour. Also, an RPO violation can occur if the configured RPO is too short for the size of the virtual machines, the change rate inside the virtual machines and the available network bandwidth.
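
As a rough illustration of how size, change rate and bandwidth interact with the RPO, the sketch below estimates how long one synchronization takes and compares it with the RPO. It is a simplified model and not the scheduler used by the product: the function names and figures are hypothetical, and it assumes back-to-back synchronizations in which each instance reflects the virtual machine at the moment its transfer started, so the newest instance can be up to two transfer durations old.

```python
def sync_minutes(changed_gb: float, bandwidth_mbps: float) -> float:
    """Minutes needed to transfer changed_gb gigabytes at bandwidth_mbps."""
    megabits = changed_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / bandwidth_mbps / 60


def rpo_is_realistic(changed_gb: float, bandwidth_mbps: float, rpo_minutes: float) -> bool:
    """Simplified check: two back-to-back transfers must fit within the RPO."""
    return 2 * sync_minutes(changed_gb, bandwidth_mbps) <= rpo_minutes


# Hypothetical figures: 10 GB of changed data per cycle over a 100 Mbps link.
print(round(sync_minutes(10, 100), 1))
print(rpo_is_realistic(10, 100, rpo_minutes=60))
print(rpo_is_realistic(10, 100, rpo_minutes=15))
```

With these figures, one transfer takes about 13 minutes, so a one-hour RPO looks realistic while a 15-minute RPO does not.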
 

For more information about the replication scheduler and how the replication policy works, please see the following articles: 

- https://docs.vmware.com/en/vSphere-Replication/8.6/com.vmware.vsphere.replication-admin.doc/GUID-84FAF645-1C65-413D-A89B-70DBA0990631.html

- https://docs.vmware.com/en/vSphere-Replication/8.6/com.vmware.vsphere.replication-admin.doc/GUID-07B5263A-8E10-42E7-B68B-325BBA910489.html

Requirements

  • For now, there is no offer or SKU that you can enable in Cumulus, your account management portal, to get this feature. Please contact your account manager for more information.

  • This guide assumes that you already have your production servers running in the primary site.

  • Required virtual networks, IP Sets, custom applications, and NAT & firewall rules must be preconfigured in your secondary organization to allow a faster recovery and to properly test recovery plans. Please follow the Getting Started guide for Performance Cloud VMware for guidance.

  • To configure email notifications, you must provide your own SMTP settings.
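
To keep track of what still has to be preconfigured in the secondary organization, a small comparison like the one below may help. It is only a sketch: it does not call any portal API, and the inventories and object names are hypothetical values that you would type in or export yourself.

```python
def missing_in_secondary(
    primary: dict[str, set[str]], secondary: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Return, per category, the names present in primary but not in secondary."""
    missing = {}
    for category, names in primary.items():
        absent = names - secondary.get(category, set())
        if absent:
            missing[category] = absent
    return missing


# Hypothetical inventories of both organizations.
primary = {
    "networks": {"net-app", "net-db"},
    "ip_sets": {"ipset-admins"},
    "nat_rules": {"dnat-web"},
}
secondary = {
    "networks": {"net-app"},
    "ip_sets": {"ipset-admins"},
}
print(missing_in_secondary(primary, secondary))
```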

Important Notes

  • For now, this disaster recovery (or “DR”) solution is only available in Canada for organizations in Performance Cloud VMware (NSX-T).

  • In your secondary site, a new WAN IP address will be used. Please plan your disaster recovery accordingly so that it also includes DNS and/or VPN changes if required.

  • It is strongly recommended to test your recovery plan at regular intervals.
     
  • You will receive a separate set of credentials to access the secondary organization.
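
Because the secondary site uses a new WAN IP address, it can be useful to verify after a DNS change that a public name resolves to the new address. The sketch below is one possible way to do this with the Python standard library; the host name and IP addresses are placeholders, and the `resolver` parameter exists only so the check can be tried without real DNS.

```python
import socket


def resolved_ips(hostname: str, resolver=socket.getaddrinfo) -> set[str]:
    """Return the set of IP addresses that hostname currently resolves to."""
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in resolver(hostname, None)}


def points_to(hostname: str, expected_ip: str, resolver=socket.getaddrinfo) -> bool:
    """True if expected_ip is among the addresses hostname resolves to."""
    return expected_ip in resolved_ips(hostname, resolver)


# Example call with placeholder values (requires working DNS):
# points_to("app.example.com", "203.0.113.10")
```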

Procedures


Review the peer sites

 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Peer Sites.


 You should see two (2) peer sites. In this example, VCAV-CAE2 will be the destination site for the replication task.



 

 

How to configure the replication of virtual machines

 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Incoming Replications. Then, click on the “New protection” icon.



  4. In the new window, enter the credentials of a user with the “Organization Administrator” role in the primary organization (in the production site), using this format: username@organization

    In this example: username@org-name-prod (“username” is a user with the “Organization Administrator” role for the organization named “org-name-prod”)

    Note: Credentials for the other organization are not saved. However, you will not be asked for them again as long as your session remains active on the portal.



  5. Select the source virtual machines to replicate and click on NEXT.

    Note: If you were not able to authenticate at the previous step, the NEXT button will remain greyed out.

    In this example, the vApp named “vApp-SW” is chosen.



  6. Select the destination virtual data center (or “VDC”) and the storage policy for the replication. The storage policy is applied to the whole virtual machine: a specific disk cannot be replicated to a different storage tier. Then, click on NEXT.

    In most cases, you will have few or no customization options.




  7. In the Settings section, you can enable the option to exclude some disks from the replication task.
    Then, click on NEXT.






  8. If the option to exclude disks was previously enabled, select the virtual machine(s) and deselect the disk(s) that you do not want to replicate. Then, click on NEXT.



  9. Review selected replication settings and click on FINISH.



    After this step, the replication will occur, and disks will be created in the destination site.
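
If you script around these credentials (for example, to pre-validate input in an internal runbook), the username@organization format from step 4 can be split as in the sketch below. The function name is hypothetical, and splitting on the last “@” is an assumption made here so that email-style user names would still work.

```python
def split_login(login: str) -> tuple[str, str]:
    """Split 'username@organization' into (username, organization)."""
    username, separator, organization = login.rpartition("@")
    if not separator or not username or not organization:
        raise ValueError(f"expected username@organization, got {login!r}")
    return username, organization


print(split_login("username@org-name-prod"))
```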




Monitor the replication status

 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. From the dashboard, you can review the replication status, the overall replication health and some charts.



  4. You can also check whether RPO violations occur frequently.
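
If you note the times of the available instances, a quick way to spot likely RPO violations is to look for gaps between consecutive instances that exceed the RPO. The sketch below illustrates the idea; the timestamps are made up, and the check ignores transfer time, so it is only an approximation of what the portal reports.

```python
from datetime import datetime, timedelta


def rpo_gaps(
    instance_times: list[datetime], rpo: timedelta
) -> list[tuple[datetime, datetime]]:
    """Return consecutive instance pairs that are further apart than the RPO."""
    ordered = sorted(instance_times)
    return [
        (older, newer)
        for older, newer in zip(ordered, ordered[1:])
        if newer - older > rpo
    ]


# Hypothetical instance times, checked against a one-hour RPO.
times = [
    datetime(2024, 1, 10, 12, 0),
    datetime(2024, 1, 10, 12, 30),
    datetime(2024, 1, 10, 14, 0),
    datetime(2024, 1, 10, 14, 45),
]
gaps = rpo_gaps(times, timedelta(hours=1))
print(len(gaps))
```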


 


Configure email notifications


If the setup of email notifications is already in place in the Performance Cloud VMware portal, steps 2 to 4 can be skipped.
 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on Administration, then, under the Settings section, click on Email. In the Email sub-section, click on EDIT.

     

      
  3. Fill in all required fields with your SMTP settings and click on Notification Settings.



  4. Enter the desired sender’s email address and click on SAVE.


     
  5. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  6. Click on Events and Notifications. From the Events and Notifications section, you can quickly enable email notifications for all event types.



  7. Optionally, you can then configure replication notifications based on RPO violation thresholds by clicking on EDIT.



    Once thresholds are updated, click on APPLY.


     
  8. Test your SMTP settings and update them if needed.
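
If the test from the portal fails, you may want to confirm your SMTP settings independently. The sketch below uses the Python standard library to send a test message; it assumes a server that accepts STARTTLS with a user name and password (commonly on port 587), and the host, port, addresses and credentials are placeholders for your own settings.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a short plain-text message for checking SMTP delivery."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "DR notifications: SMTP test"
    message.set_content("This message checks the SMTP settings used for DR notifications.")
    return message


def send_test_message(host: str, port: int, user: str, password: str,
                      message: EmailMessage) -> None:
    """Send the message over STARTTLS (assumed to be supported by the server)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(message)


# Example call with placeholder values:
# send_test_message("smtp.example.com", 587, "user", "password",
#                   build_test_message("dr@example.com", "ops@example.com"))
```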


     

How to configure the recovery settings

 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Incoming Replications. Then, select a vApp or a virtual machine (change the Grouping view if needed). Once the selection is made, click on ALL ACTIONS, then click on Recovery Settings.


     
  4. On the next screen, the target network can be enforced for the failover or the test failover.



  5. Settings for NICs or the guest customization can also be customized for failovers or test failovers if needed.


     
  6. Once you are done with recovery settings, click on APPLY.


 

 

 

Manage virtual machine’s instances

 

In some cases (such as security investigations, ransomware, or legal hold), you may want to keep a particular virtual machine’s instance available without having to stop the replication. In that case, you can store that specific instance.

 


Store a virtual machine’s instance

 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Incoming Replications. Then, click on the replication task.



  4. Identify and select the virtual machine’s instance time you want to store. Then, click on STORE.



  5. Accept by clicking on STORE again.




  6. The retention period for the selected virtual machine’s instance will then change to Permanent.


 



Stop storing a virtual machine’s instance
 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Incoming Replications. Then, click on the replication task.



  4. Identify and select the virtual machine’s instance time you don’t want to store anymore. Then, click on DON’T STORE.



  5. Accept by clicking on “DON’T STORE” again.


 

 

 

Configure a recovery plan


The following steps create a simple recovery plan.

For an advanced network, or to recover in different ways depending on the disaster, you can create multiple recovery plans. Recovery plans can also be cloned to speed up the creation of multiple recovery plans.
 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.

     
  3. On the next screen, click on Recovery Plans. Then, click on the “New recovery plan” icon.



  4. Name your plan and add a description. Then, click on OK.



  5. Start configuring the plan by adding a new step to the plan.



  6. Name your step. Optionally, add a wait time before the next step, or add a message to display for a manual validation before the next step. Then, click on NEXT.

    Example: start the domain controller and wait 60 seconds before starting the second step.



  7. Select the virtual machine(s) to recover in this first step and click on NEXT.



  8. Review chosen settings and click on FINISH.



Repeat steps 5 to 8 to add more steps to the recovery plan.


Example: start the database server as a second step and ask for a manual validation.





Example: start the remaining virtual machines as a third step, without validation or wait time.
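
Before building the plan in the portal, it can help to write the three example steps down as data and review the resulting order. The sketch below is only a planning aid: the `Step` class, the virtual machine names and the prompt text are hypothetical and are not part of the product.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    vms: list[str]
    wait_seconds: int = 0  # delay before the next step
    prompt: str = ""       # message requiring a manual validation


def describe(plan: list[Step]) -> list[str]:
    """Return a readable outline of the plan, in execution order."""
    lines = []
    for number, step in enumerate(plan, start=1):
        lines.append(f"{number}. {step.name}: start {', '.join(step.vms)}")
        if step.wait_seconds:
            lines.append(f"   wait {step.wait_seconds}s")
        if step.prompt:
            lines.append(f"   prompt: {step.prompt}")
    return lines


# Hypothetical plan matching the three examples in this section.
plan = [
    Step("Domain controller", ["dc01"], wait_seconds=60),
    Step("Database server", ["sql01"], prompt="Confirm the database is healthy"),
    Step("Remaining servers", ["app01", "web01"]),
]
print("\n".join(describe(plan)))
```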






Test a recovery plan

 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Recovery Plans. Then, click on the recovery plan to test and click on the Test button.



  4. Click on OK



  5. vApps and virtual machines are now created in the recovery site. Virtual machines start in the order defined by the steps of the recovery plan (including any configured wait times or prompts). Required networks are added to vApps and recovery settings are applied.

    If a prompt was configured, you must acknowledge the message to allow the plan to continue with the next steps.



  6. Once the recovery plan execution is completed, validate the network functionality and test server roles, accessible services and applications for remote users (including but not limited to Remote Desktop Services and Web Services).

  7. Adjust the recovery plan if needed. If applicable, add any missing configuration that would allow a faster recovery.

  8. Once tests are completed, do the test cleanup.



  9. vApps and virtual machines will now be deleted from the recovery site.
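
Part of the validation in step 6 can be automated with a basic reachability check. The sketch below tests whether TCP ports answer on a recovered server; it only shows that a port accepts connections, not that the application behind it works. The port numbers are the usual defaults for Remote Desktop Services (3389) and HTTPS (443), and the host address is a placeholder.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports, to be adapted to your own services.
CHECKS = {"Remote Desktop Services": 3389, "Web Services (HTTPS)": 443}


def check_services(host: str, checks: dict[str, int] = CHECKS) -> dict[str, bool]:
    """Return a reachability result per service name."""
    return {name: port_is_open(host, port) for name, port in checks.items()}


# Example call with a placeholder address:
# print(check_services("203.0.113.10"))
```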



     

Proceed with a failover


Use this option to recover from a disaster affecting the primary site.

A failover and a test failover are similar. The main differences are that, after a failover, there is no cleanup option and the replication is stopped.

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Recovery Plans. Then, click on the recovery plan to start and click on the Failover button.


     
  4. Click on OK

     

     
  5. vApps and virtual machines are now created in the recovery site. Virtual machines start in the order defined by the steps of the recovery plan (including any configured wait times or prompts). Required networks are added to vApps and recovery settings are applied.


    If a prompt was configured, you must acknowledge the message to allow the plan to continue its execution.



  6. Once the recovery plan execution is completed, the production servers now run in the secondary site.

  7. Validate the network functionality and test server roles, accessible services and applications for remote users (including but not limited to Remote Desktop Services and Web Services).

  8. If applicable, add any missing configuration to make all services available.

  9. Once the primary site is back online, you can also consider deleting the virtual machines in the primary location, as their data is outdated.


     

Failback to the production site


Once the primary site is back online, a failback to the primary site can be considered.
 

  1. Log in to the Performance Cloud VMware portal as a user with the “Organization Administrator” role on the secondary organization.

  2. Click on More, then click on Availability (VCAV-XXXX). The XXXX suffix varies according to the site in use.


  3. On the next screen, click on Incoming Replications and select the replication task. Then, click on ALL ACTIONS then click on Reverse.


     
  4. In the new window to reverse the replication, enter the credentials of a user with the “Organization Administrator” role in the primary organization (in the primary/production site), using this format: username@organization

    In this example: username@org-name-prod (“username” is a user with the “Organization Administrator” role for the organization named “org-name-prod”)

    Note: Credentials for the other organization are not saved. However, you will not be asked for them again as long as your session remains active on the portal.


     
  5. Click on REVERSE.



  6. The source and the destination for the replication task will now be reversed.

    If the reverse operation fails, verify that you have enough free disk space in the destination. It may also be impossible to replicate all virtual machines with the same storage policy if multiple disk tiers are in use. If applicable, skipped virtual machines can be added later using a different storage policy.

    You can also consider removing all replication jobs and all recovery plans, then creating them again to replicate virtual machines in the opposite direction (to the primary site).

  7. Proceed with a failover (refer to the previous section)

References

- https://docs.vmware.com/en/VMware-Cloud-Director-Availability/4.5/VMware-Cloud-Director-Availability-User-Guide/GUID-1827B289-289F-45C3-B42A-E2C788C888F2.html

- https://docs.vmware.com/en/vSphere-Replication/8.6/com.vmware.vsphere.replication-admin.doc/GUID-84FAF645-1C65-413D-A89B-70DBA0990631.html

- https://docs.vmware.com/en/vSphere-Replication/8.6/com.vmware.vsphere.replication-admin.doc/GUID-07B5263A-8E10-42E7-B68B-325BBA910489.html