doctor:use_cases [2015/02/17 15:20] Gerald Kunzmann [1. Auto Healing]
doctor:use_cases [2015/07/01 07:56] (current) Ryota Mibu
===== Use Cases =====
NOTE: This page is abandoned; see the latest document at http://artifacts.opnfv.org/doctor/html/02-use_cases.html .

Before explaining the use cases for NFVI fault management and maintenance, it is necessary to understand how current telecom nodes, e.g., 3GPP mobile core nodes (MME, S/P-GW, etc.), are deployed. Due to stringent High Availability (HA) requirements, these nodes often come in an Active-Standby (ACT-SBY) configuration, a 1+1 redundancy scheme. The ACT and SBY nodes (Physical Network Functions (PNFs) in ETSI NFV terminology) are in a hot standby configuration. If the ACT node is unable to function properly due to a fault or any other reason, the SBY node is instantly made ACT, and service can be provided without any interruption.

The ACT-SBY configuration needs to be maintained: when an SBY node is made ACT, either the previously ACT node, after recovery, shall be made SBY, or a new SBY node needs to be configured.

The NFVI fault management and maintenance requirements aim at achieving the same HA when the PNFs mentioned above are virtualized, i.e., made VNFs, and put under the operation of the Management and Orchestration (MANO) framework defined by ETSI NFV [refer to MANO GS].

There are three use cases that show typical requirements and solutions for automated fault management and maintenance in NFV. The use cases assume that the VNFs are in an ACT-SBY configuration.

  - Auto Healing (triggered by a critical error)
  - Safe Switching (preventing service stop by handling warnings)
  - VM Retirement (managing service during H/W maintenance)
==== 1. Auto Healing ====

Auto healing is the process of switching to the SBY when the ACT VNF is affected by a fault, and instantiating/configuring a new SBY for the new ACT VNF. Instantiating/configuring a new SBY is similar to instantiating a new VNF and is therefore outside the scope of this project.
In Fig. 1, a system-wide view of the relevant functional blocks is presented. OpenStack is considered as the VIM implementation, which has interfaces to the Resource Pool (NFVI in ETSI NFV terminology) and to Users/Clients. The VNF implementation is represented as VMs with different colours. Users/Clients (VNFM or NFVO in ETSI NFV terminology) own/manage the respective VMs shown with the same colours.

{{ :requirements_projects:fig_1_fault.png |}}
The first requirement here is that OpenStack needs to detect faults (1. Fault Notification in Fig. 1) in the Resource Pool which affect the proper functioning of the VMs running on top of it. The relevant fault items should be configurable. OpenStack itself could be extended to detect such faults, or a third-party fault monitoring element could be used which then informs OpenStack about them. From an architectural point of view, however, such a third-party fault monitoring element would also be a component of the VIM.

Once such a fault is detected, OpenStack shall find out which VMs are affected by it. In the example in Fig. 1, VM-4 is affected by a fault in Hardware Server-3. Such a mapping shall be maintained in OpenStack, shown as the Server-VM info table in Fig. 1.

Once OpenStack knows which VMs are affected, it then finds out who is the owner/manager of those VMs (Step 2 in Fig. 1). In Fig. 1, through an Ownership info table, OpenStack knows that the manager of the red VM-4 is the red User/Client. OpenStack then notifies (3. Fault Notification in Fig. 1) the red User/Client about this fault, preferably with sufficient abstraction rather than detailed physical fault information.
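Steps 1-3 can be sketched as follows. This is a minimal illustration, not an actual OpenStack API: the table layouts, the fault payload, and the `notify_owner` function are all assumptions made for the example.

```python
# Hypothetical sketch of Steps 1-3 in Fig. 1: mapping a hardware fault to the
# affected VMs and notifying their owners. All names here are illustrative
# assumptions, not real OpenStack interfaces.

# Server-VM info table: which VMs run on which hardware server.
server_vm_info = {
    "hw-server-1": ["vm-1", "vm-2"],
    "hw-server-2": ["vm-3"],
    "hw-server-3": ["vm-4"],
}

# Ownership info table: which User/Client (VNFM/NFVO) manages each VM.
ownership_info = {
    "vm-1": "blue-client",
    "vm-2": "blue-client",
    "vm-3": "red-client",
    "vm-4": "red-client",
}

def notify_owner(owner, vm, summary):
    """Step 3: send an abstracted fault notification to the VM's owner."""
    print(f"notify {owner}: {vm} is affected ({summary})")

def on_fault_notification(server, fault_detail):
    """Steps 1-3: a fault on `server` is reported by the (possibly third-party)
    fault monitor; look up the affected VMs and notify each owner with an
    abstracted summary rather than the detailed physical fault information."""
    affected = server_vm_info.get(server, [])
    for vm in affected:
        owner = ownership_info[vm]
        notify_owner(owner, vm, summary="host fault")  # abstraction of fault_detail
    return affected

# Example from Fig. 1: a fault on Hardware Server-3 affects the red VM-4.
on_fault_notification("hw-server-3", fault_detail="ECC memory error")
```

The two dictionaries play the roles of the Server-VM info and Ownership info tables in Fig. 1; in a real VIM they would be backed by the compute database.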
The User/Client then switches to its SBY configuration, making the SBY VNF the ACT one, and further initiates a process to instantiate/configure a new SBY. However, switching to the SBY and creating a new SBY is a VNFM/NFVO-level operation and therefore outside the scope of this project.

Once the User/Client has switched to the SBY configuration, it notifies OpenStack (Step 4 "Instruction" in Fig. 1). OpenStack can then take the necessary actions (e.g. pre-determined by the involved network operator) to clean up the fault-affected VMs (Step 5 "Execute Instruction" in Fig. 1).

The key issue in this use case is that the VIM (OpenStack in this context) shall not take a standalone fault recovery action (e.g. migration of the affected VMs) before the ACT-SBY switching is complete, as that might violate the ACT-SBY configuration and render the VNF out of service.
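Steps 4-5 and the key constraint above can be sketched as a simple handler: the VIM records the pre-determined cleanup action when the fault is detected, but defers executing it until the owner's instruction arrives. Class and method names are hypothetical, chosen only for this illustration.

```python
# Illustrative sketch of Steps 4-5 in Fig. 1 and the key constraint: the VIM
# must not execute any recovery action on a fault-affected VM until the owner
# has completed the ACT-SBY switch-over and sent its instruction.

class VimFaultHandler:
    def __init__(self):
        self.pending = {}  # vm -> operator-predetermined cleanup action

    def mark_affected(self, vm, cleanup_action):
        """A fault was detected: record the cleanup action but do NOT run it,
        since acting before the switch-over could take the VNF out of service."""
        self.pending[vm] = cleanup_action

    def on_owner_instruction(self, vm):
        """Step 4: the owner reports the SBY is now ACT. Step 5: only now is
        the pre-determined cleanup executed on the affected VM."""
        action = self.pending.pop(vm, None)
        if action is None:
            return "no pending action"
        return action(vm)

handler = VimFaultHandler()
handler.mark_affected("vm-4", cleanup_action=lambda vm: f"terminated {vm}")
# Standalone recovery is deferred; nothing happens until the instruction arrives.
result = handler.on_owner_instruction("vm-4")
```

The point of the deferred `pending` map is exactly the constraint in the text: a premature migration or termination would break the ACT-SBY pair.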
==== 2. Safe Switching ====
The VIM user needs to know in advance that a virtual resource will fail due to an NFVI fault, in order to keep the service running. For example, a VIM user is responsible for a VNF that runs on two VMs in active-standby mode and provides an operator command to switch the active and standby nodes. The user can reduce the number of failed jobs by issuing this command before the failure occurs, rather than relying on the automatic switching triggered by the periodic health check between the nodes. Suppose the physical machine hosting the VM that runs the active node shows an abnormal temperature and is scheduled to halt within seconds: the user needs to know about the imminent unavailability of the VM immediately, so that the active and standby nodes can be switched safely.

The VIM needs to detect the NFVI failure and notify the owner about the unavailability of the affected virtual resources, as in the 'Auto Healing' case, except that a rich event inspector predicts the failure before it actually occurs.
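A minimal sketch of this predictive flow, under assumed names and thresholds: an event inspector flags an abnormal host temperature as a predicted failure, and the owner's switch command promotes the SBY before the host actually halts.

```python
# Hypothetical sketch of Safe Switching: a rich event inspector predicts a host
# failure (e.g. abnormal temperature), the VIM notifies the owner, and the
# owner kicks the ACT-SBY switch before the failure occurs. The threshold and
# all names are illustrative assumptions.

TEMP_LIMIT_C = 90.0  # assumed temperature above which the host is expected to halt

def inspect_host(server, temperature_c):
    """Rich event inspector: predict a failure from abnormal measurements."""
    if temperature_c > TEMP_LIMIT_C:
        return {"server": server, "event": "predicted-failure",
                "reason": f"abnormal temperature {temperature_c}C"}
    return None

def safe_switch(vnf_state):
    """Owner-side operator command: promote the SBY to ACT before the failure,
    instead of waiting for the periodic health check to trigger auto switching."""
    vnf_state["act"], vnf_state["sby"] = vnf_state["sby"], vnf_state["act"]
    return vnf_state

vnf = {"act": "vm-a", "sby": "vm-b"}
event = inspect_host("hw-server-1", temperature_c=97.5)
if event:  # the VIM notifies the owner, who issues the switch command at once
    vnf = safe_switch(vnf)
```

Compared with 'Auto Healing', only the trigger differs: the notification is sent on a predicted failure rather than an actual one.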
==== 3. VM Retirement ====
The VIM user needs to know that a virtual resource will become unavailable because of NFVI maintenance, in order to keep the service available. For example, a VIM user is responsible for a VNF that runs on a VM with dedicated resources and, because of real-time processing, does not allow its vCPUs to be paused; this means the VIM provider must not live-migrate the VM. The VIM user, however, has a way to replace the VNF without stopping the Network Service that uses it, e.g., by a forwarding graph switchover. Now the VIM provider marks the physical machine hosting the VM for maintenance mode in preparation for replacement. The VIM user needs to know about the future unavailability of the VM, so that the VM can be recreated carefully without stopping the Network Service.

The VIM needs to receive the NFVI maintenance instruction and notify the owner about the unavailability of the affected virtual resources. For example, a VIM user has created a VM through the VIM (red User in Fig. 2), and the VM is hosted on a physical machine in the NFVI (red VM in Fig. 2). The administrator (Administrator in Fig. 2) schedules the replacement and announces it. The VIM provider needs to receive this maintenance instruction from the administrator (Step 1 in Fig. 2) and mark the physical machine; afterwards, the VIM provider performs the same steps as in the 'Auto Healing' case, except with different semantics in the notifications.

{{ :requirements_projects:fig_2_fault.png |}}
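The retirement flow above can be sketched in the same style as the 'Auto Healing' example, with the notification semantics changed from "fault" to "planned maintenance". The table contents and function names are assumptions for illustration only.

```python
# Illustrative sketch of VM Retirement (Fig. 2): the administrator's
# maintenance instruction marks a physical machine, and the VIM notifies the
# owner of each hosted VM about the planned unavailability. Same steps as in
# 'Auto Healing', but with maintenance (not fault) semantics; all names are
# hypothetical.

server_vm_info = {"hw-server-1": ["red-vm"]}   # host -> hosted VMs
ownership_info = {"red-vm": "red-user"}        # VM -> owner (VNFM/NFVO)
host_state = {}                                # host -> administrative state

def on_maintenance_instruction(server):
    """Step 1: receive the administrator's instruction and mark the host,
    then notify each owner with 'planned-maintenance' semantics."""
    host_state[server] = "maintenance"
    notices = []
    for vm in server_vm_info.get(server, []):
        owner = ownership_info[vm]
        notices.append((owner, vm, "planned-maintenance"))
    return notices

# The administrator schedules replacement of hw-server-1; the red user is
# notified of the red VM's future unavailability and can recreate it safely.
notices = on_maintenance_instruction("hw-server-1")
```

Because the VM must not be live-migrated, the notification only warns the owner; the actual replacement (e.g. forwarding graph switchover and VM recreation) remains an owner-side operation.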