Differences

This shows you the differences between two versions of the page.

--- collaborative_development_projects:rescuer [2014/11/25 00:24]
Zhipeng Huang
+++ collaborative_development_projects:rescuer [2015/04/10 07:19]
Zhipeng (Howard) Huang
@@ Line 2: / Line 2: @@
   * ''Proposed name for the project'': Rescuer
-  * ''Proposed name for the repository'': opnfv-dr
+  * ''Proposed name for the repository'': rescuer
-  * ''Project Categories'': (Requirement,Collaborative Development, Documentation)
+  * ''Project Categories'': Requirement
 ==== Project description: ====
-Implementations to handle disaster recovery (DR) need close interaction with OpenStack. They are currently out of scope of OPNFV, but important for any carrier network and other mission critical applications. Those DR implementations need specific support by the cloud OS (OpenStack) to do their task.
+Disaster Recovery (DR) is an important issue in NFV. For example, during burst hours on holidays, the shutdown or malfunction of a VIM instance, or even of a complete site, may cause severe service interruption or complete service termination if there is no strong infrastructure-level disaster recovery support. This project is therefore proposed to develop use cases, requirements, and upstream project blueprints, focusing on how to make the infrastructure DR-capable so that service continuity meets requirements defined in terms such as RPO (Recovery Point Objective) or RTO (Recovery Time Objective) when an extreme scenario strikes.
-In current OpenStack disaster recovery (DR) settings, there is no way for DR-Implementations, such as various DR Middlewares,  to see what DR state a VM is in. This would be especially problematic when operators want to have DR sites across several DCs that are not co-located. For example VM would be wrongfully terminated or activated in the DR process, which could lead to severe service disturbance. This project would provide solution to this problem by enabling OpenStack Nova several DR features. By adding these new features, Operators would be able to monitor the DR state of given DCs, and perform DR operation accordingly.
+=== ETSI NFV Requirements ===
+From the perspective of resiliency (with respect to disaster recovery and flexibility in resource usability), it is desirable to be able to locate the standby node in a topologically different site and maintain connectivity. In order to support disaster recovery for a certain critical functionality, the NFVI resources needed by the VNF should be located in different geographic locations; therefore, the implementation of NFV should allow a geographically redundant deployment.
+During a disaster, multiple VNFs in an NFVI-PoP may fail; in more severe cases, the entire NFVI-PoP or multiple NFVI-PoPs may fail. Accordingly, the recovery process for such extreme scenarios needs to take into account additional factors that are not present for the single VNF failure scenario. The restoration and continuity would be done at a WAN scope (compared to an HA recovery done at a LAN scope, described above). They could also transcend administrative and regulatory boundaries, and involve restoring service over a possibly different NFV-MANO environment. Depending on the severity of the situation, it is possible that virtually all telecommunications traffic/sessions terminating at the impacted NFVI-PoP may be cut off prematurely. Further, new traffic sessions intended for end users associated with the impacted NFVI-PoP may be blocked by the network operator depending on policy restrictions. As a result, there could be negative impacts on service availability and reliability which need to be mitigated. At the same time, all traffic/sessions that traverse the impacted NFVI-PoP intended for termination at other NFVI-PoPs need to be successfully re-routed around the disaster area. Accordingly:
+  * Network Operators should provide the Disaster Recovery requirements, and NFV-MANO should design and develop the Disaster Recovery policies such that they:
+        a. Include the designation of regional disaster recovery sites that have sufficient VNF resources and comply with any special regulations including geographical location.
+        b. Define prioritized lists of VNFs that are considered vital and need to be replaced as swiftly as possible. The prioritized lists should track the Service Availability levels from sub-clause 6.3. These critical VNFs need to be instantiated and placed in proper standby mode in the designated disaster recovery sites.
+        c. Install processes to activate and prepare the appropriate disaster recovery site to "take over" the impacted NFVI-PoP VNFs, including bringing the set of critical VNFs online, instantiating/activating additional standby redundant VNFs as needed, restoring data, and reconfiguring associated service chains at the designated disaster recovery site as soon as conditions permit.
+  * Network Operators should be able to modify the default Disaster Recovery policies defined by NFV-MANO, if needed.
+  * The designated disaster recovery sites need to have the latest state information on each of the NFVI-PoP locations in the regional area conveyed to them on a regular schedule. This enables the disaster recovery site to be prepared to the extent possible, when a disaster hits one of the sites. Appropriate information related to all VNFs at the failed NFVI-PoP is expected to be conveyed to the designated disaster recovery site at specified frequency intervals.
+  * After the disaster situation recedes, Network Operators should restore the impacted NFVI-PoP back to its original state as swiftly as possible, or deploy a new NFVI-PoP to replace the impacted NFVI-PoP based on the comprehensive assessment of the situation. All on-site Service Chains must be reconfigured by instantiating fresh VNFs at the original location. All redundant VNFs activated at the designated Disaster Recovery site to support the disaster condition must be de-linked from the on-site Service Chains by draining and re-directing traffic as needed to maintain service continuity. The redundant VNFs are then placed on standby mode per disaster recovery policy.
+=== DR In OpenStack ===
+Disaster Recovery (DR) for OpenStack is an umbrella topic that describes what needs to be done for applications and services (generally referred to as workload) running in an OpenStack cloud to survive a large scale disaster. Providing DR for a workload is a complex task involving infrastructure, software and an understanding of the workload. To enable recovery following a disaster, the administrator needs to execute a complex set of provisioning operations that will mimic the day-to-day setup in a different environment. Enabling DR for OpenStack hosted workloads requires enablement (APIs) in OpenStack components (e.g., Cinder) and tools which may be outside of OpenStack (e.g., scripts) to invoke, orchestrate and leverage the component specific APIs.
+{{:collaborative_development_projects:dr.png?300|}}
+Disaster Recovery should include support for:
+  * Capturing the metadata of the cloud management stack, relevant for the protected workloads/resources: either as point-in-time snapshots of the metadata, or as continuous replication of the metadata.
+  * Making available the VM images needed to run the hosted workload on the target cloud.
+  * Replication of the workload data using storage replication, application level replication, or backup/restore.
+We note that metadata changes are less frequent than application data changes, and different mechanisms can handle replication of different portions of the metadata and data (volumes, images, etc.).
+The approach is built around:
+  * Identify required enablement and missing features in OpenStack projects
+  * Create enablement in specific OpenStack projects
+  * Create orchestration scripts to demonstrate DR
+When resources to be protected are logically associated with a workload (or a set of inter-related workloads), both the replication and the recovery processes should be able to incorporate hooks to ensure consistency of the replicated data and metadata, as well as to enable customization (automated or manual) of the individual workload components at the recovery site. Heat can be used to represent such workloads, as well as to automate the above processes (when applicable).
 ==== Scope: ====
   * ''Describe the problem being solved by project''
-The project proposes to introduce new DR features to OpenStack that will be used by DR-MW to:
+The project aims to develop the requirements and use cases for NFVI and VIM to support Telco-grade DR implementation:
-  * ''Control DR state of VMs''
-  * ''Monitor progress''
+  * Requirements for VIM and NFVI to support Multisite DR, including:
-  * ''Deploy DR policies''
-  * ''..''
+    a. Active-active, active-hot standby, and active-cold standby designs
+    b. Replication of all configuration and metadata required by an application (Neutron, Cinder, Nova, etc.)
+    c. Ability to ensure consistency of the replicated data and metadata
+    d. Support for a wide range of data replication methods: storage-system-based replication; hypervisor-assisted replication (possibly between heterogeneous storage systems), for example using DRBD or QEMU-based replication; backup and restore methods; and pluggable application-level replication methods
-For OPNFV Release One, currently we specifically identify two sets of features we want to achieve:
+  * DR Use Cases to provide more requirements.
-  - Add VM DR State in Nova API and DB. Examples of the VM DR states are given as follows:
+  * Formulate blueprints (BPs) that reflect the requirements, and implement those BPs in the upstream community.
-    * ''DR Inactive：Disaster Recovery is either not configured or not supported''
+  * ..
-    * ''DR Complete:  Disaster Recovery has completed. All Standby VM related data transfer is terminated on Standby Site. Further actions would be based on DR policies.''
-    * ''..''
-  - Initial design of VM DR state machine taskflow
-    * ''Design the state transition workflow that would later be used for the purpose of automation.''
-Hence we propose this project to enable OpenStack to be DR-state-transparent for Telco operators.
-.
   * ''Specify any interface/API specification proposed''
-New OpenStack Nova DR API specification would be proposed.
+New DR-related API specifications would be proposed.
   * ''Identify a list of features and functionality that will be developed.''
-State model for DR states
+  - Requirements for infrastructure to support Telco-grade DR implementation
+  - Use cases for infrastructure-level support of Telco-grade DR implementation
-New DR APIs to reflect and facilitates the DR state
+  - Upstream feature development according to the developed requirements.
   * ''Identify what is in or out of scope. So during the development phase, it helps reduce discussion.''
-In scope: State model for DR states, new DR APIs in OpenStack Nova
+In scope: infrastructure-level support for Telco-grade DR implementation
-Out of scope: Specific DR functions that should be implemented via various DR middlewares.
+Out of scope: DR policy making, and DR decision making and planning at the upper level.
   * ''Describe how the project is extensible in future''
-By enriching Nova’s API and DB features, this project is extendable for future functions.
+This project is extendable for future functions.
 ==== Testability: ''(optional, Project Categories: Integration & Testing)'' ====
-TBD.
+N/A.
 ==== Documentation: ''(optional, Project Categories: Documentation)'' ====
@@ Line 71: / Line 96: @@
   * ''Identify similar projects that are underway or being proposed in OPNFV or upstream projects''
-None.
+OPNFV: Multisite, HA For VNF, Doctor.
   * ''Identify any open source upstream projects and release timeline.''
-OpenStack Nova would be the upstream project
+OpenStack
-It would be aligned with OpenStack release schedule (per cycle) and OPNFV schedule. Hence it would meet both March release mark of OpenStack and OPNFV
+It would be aligned with OpenStack release schedule (per cycle) and OPNFV schedule.
@@ Line 86: / Line 111: @@
   * ''Are there any external fora or standards development organization dependencies? If possible, list informative and normative reference specifications.''
-TBD
+ETSI NFV REL
   * ''If project is an integration and test, identify hardware dependency.''
-TBD
+None
 ==== Committers and Contributors: ====
-  * ''Name of and affiliation of the maintainer'':
-Bo Zhang, Huawei;
-Shen Wang, Huawei;
-Zhipeng(Howard) Huang, Huawei;
   * ''Names and affiliations of the committers'':
-Bo Zhang, Huawei;
-Shen Wang, Huawei;
+Guoguang Li, (liguoguang@huawei.com);
-Zhipeng(Howard) Huang, Huawei;
+Zhipeng(Howard) Huang, (huangzhipeng@huawei.com);
   * ''Any other contributors'':
 TBD
@@ Line 110: / Line 132: @@
   * ''Project release package as OPNFV or open source upstream projects''
-This project would be released as an installable package from the open source upstream project, and it could be easily integrated to OPNFV platform
+As upstream projects
   * ''Project deliverables with multiple dependencies across other project categories''
-N/A
+None
 ==== Proposed Release Schedule: ====
-This project is planned for the first release of OPNFV platform.
+This project is planned for the third release of OPNFV platform.
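
The RPO/RTO terms and the prioritized VNF lists mentioned in the revised project description can be made concrete with a small sketch. The Python below is purely illustrative and is not part of the Rescuer proposal or of any OpenStack API: the names `DrTarget`, `meets_objectives`, and `recovery_order`, and all numeric values, are hypothetical. It also assumes, as a simplification, that the worst-case data loss equals the interval between two replications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrTarget:
    """Hypothetical DR objectives for one workload, in seconds."""
    rpo_seconds: float  # Recovery Point Objective: maximum tolerable data-loss window
    rto_seconds: float  # Recovery Time Objective: maximum tolerable downtime


def meets_objectives(target, replication_interval_seconds, estimated_recovery_seconds):
    """Return True if a replication setup satisfies both RPO and RTO.

    Assumption: data written since the last replication is lost in a disaster,
    so the worst-case loss window equals the replication interval.
    """
    rpo_ok = replication_interval_seconds <= target.rpo_seconds
    rto_ok = estimated_recovery_seconds <= target.rto_seconds
    return rpo_ok and rto_ok


def recovery_order(vnfs):
    """Order (name, priority) pairs so the most vital VNFs are recovered first.

    Assumption: a lower priority number means a more critical VNF.
    """
    return [name for name, _priority in sorted(vnfs, key=lambda item: item[1])]


if __name__ == "__main__":
    target = DrTarget(rpo_seconds=300, rto_seconds=900)
    print(meets_objectives(target, 60, 600))
    print(recovery_order([("dns", 2), ("epc", 1), ("portal", 3)]))
```

A check like this only states the objectives; how replication and recovery are actually carried out is what the requirements and blueprints of the project are meant to define.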
