Before explaining the use cases for NFVI fault management and maintenance, it is necessary to understand how telecom nodes, e.g., 3GPP mobile core nodes (MME, S/P-GW, etc.), are deployed today. Due to stringent High Availability (HA) requirements, these nodes often come in an Active-Standby (ACT-SBY) configuration, which is a 1+1 redundancy scheme. The ACT and SBY nodes (aka Physical Network Functions (PNFs) in ETSI NFV terminology) are in a hot standby configuration. If the ACT node is unable to function properly due to a fault or any other reason, the SBY node is instantly made ACT, and service can be provided without any interruption.
The ACT-SBY configuration needs to be maintained. This means that when a SBY node is made ACT, either the previously ACT node shall be made SBY after recovery, or a new SBY node needs to be configured.
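The failover and re-pairing rule described above can be illustrated with a minimal sketch. The class and method names (`Node`, `ActStandbyPair`, `fail_over`, `restore_redundancy`) are hypothetical and chosen for illustration only; they do not come from any ETSI NFV or VIM interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical model of one node in a 1+1 redundancy pair."""
    name: str
    healthy: bool = True


class ActStandbyPair:
    """Illustrative ACT-SBY pair; not a real NFV API."""

    def __init__(self, active: Node, standby: Optional[Node]):
        self.active = active
        self.standby = standby

    def fail_over(self) -> Node:
        """Promote the SBY node to ACT and return the failed node."""
        if self.standby is None:
            raise RuntimeError("no standby node available")
        failed = self.active
        failed.healthy = False
        # After the switch the pair runs without redundancy.
        self.active, self.standby = self.standby, None
        return failed

    def restore_redundancy(self, node: Node) -> None:
        """Attach a recovered or newly configured node as the new SBY."""
        if not node.healthy:
            raise ValueError("standby candidate must be healthy")
        self.standby = node
```

After `fail_over()` the pair has no standby until `restore_redundancy()` is called, either with the recovered node or with a new one. This is the window that the fault management requirements aim to keep short.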
The NFVI fault management and maintenance requirements aim at realizing the same HA when the PNFs mentioned above are virtualized, i.e., made VNFs, and put under the operation of the Management and Orchestration (MANO) framework defined by ETSI NFV [refer to MANO GS].
The following three use cases show typical requirements and solutions for automated fault management and maintenance in NFV. The use cases assume that the VNFs are in an ACT-SBY configuration.
The VIM USER needs to know when a virtual resource HAS failed because of an NFVI fault, in order to maintain his/her service capability. E.g.: A VIM user is responsible for a VNF which runs on two VMs, both serving behind a load balancer. When a physical machine which hosts one of those VMs goes down, the user needs to know about the unavailability of the VM immediately, so that the user can recover the VNF capability by creating a new VM.
The VIM needs to detect the NFVI failure and notify the owner of the unavailability of the affected virtual resources. E.g.: There are three VIM users, each belonging to a different project, who have created VM(s) through the VIM (Fig.1). When a physical machine which hosts a VM owned by one of the VIM users (Red VM and User in Fig.1) goes down, the VIM needs to detect this by receiving a hardware fault notification from the Resource Pool (Step 1 in Fig.1), identify the fault-affected VM and its owner who wants to be informed (Step 2 in Fig.1), and then notify the owner of the failure (Step 3 in Fig.1). The failure notification should NOT include any information about the physical resource, and should NOT be sent to the other users (Blue and Green User in Fig.1).
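Steps 2 and 3 above (map the failed host to the affected VMs and their owners, then notify only those owners) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `VM` record, the `notify_host_fault` function, the `subscribers` set and the notification fields are all hypothetical and are not part of any real VIM API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class VM:
    """Hypothetical inventory record kept by the VIM."""
    vm_id: str
    owner: str
    host: str


def notify_host_fault(
    failed_host: str,
    inventory: Iterable[VM],
    subscribers: Set[str],
) -> Dict[str, List[dict]]:
    """Build per-owner notifications for the VMs on the failed host.

    Only owners who asked to be informed (subscribers) are notified,
    and the notification carries no physical resource information.
    """
    notifications: Dict[str, List[dict]] = {}
    for vm in inventory:
        if vm.host != failed_host:
            continue  # VM is not affected by this fault
        if vm.owner not in subscribers:
            continue  # owner did not ask to be informed
        # Deliberately omit vm.host: physical details stay hidden.
        message = {"vm_id": vm.vm_id, "state": "unavailable"}
        notifications.setdefault(vm.owner, []).append(message)
    return notifications
```

For example, with one VM per user and a fault on the host of the red VM, only the red user receives a notification, and that notification names the VM but not the host.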
The VIM USER needs to know when a virtual resource WILL fail because of an NFVI fault, in order to keep his/her service running. E.g.: A VIM user is responsible for a VNF which runs on two VMs in active-standby mode and has an operator command to switch the active and standby nodes. The user can reduce the number of job failures by issuing this command before the failure occurs, rather than relying on the automatic switchover triggered by the periodic health check between the nodes. The physical machine which hosts the VM running as the active node has an abnormal temperature and is scheduled to be halted in seconds. The user needs to know about the unavailability of the VM immediately, so that the user can switch the active and standby nodes safely.
The VIM needs to detect the NFVI failure and notify the owner of the unavailability of the affected virtual resources in the same way as in the ‘Auto Healing’ case, except that a rich event inspector predicts the failure before it actually occurs.
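The text does not specify how the event inspector predicts a failure. As one possible illustration only, the sketch below flags a host when its most recent temperature samples all exceed a limit. The function name, the threshold of 85.0 degrees Celsius and the window of three samples are assumptions made for this example, not values taken from the source.

```python
from typing import Dict, List


def predict_failing_hosts(
    readings: Dict[str, List[float]],
    limit_celsius: float = 85.0,  # assumed threshold, for illustration
    consecutive: int = 3,         # assumed window size, for illustration
) -> List[str]:
    """Return hosts whose last `consecutive` samples all exceed the limit."""
    failing = []
    for host, samples in readings.items():
        recent = samples[-consecutive:]
        # Require a full window so a single spike does not trigger a warning.
        if len(recent) == consecutive and all(t > limit_celsius for t in recent):
            failing.append(host)
    return sorted(failing)
```

Hosts returned by such a predictor could then be handled by the same identify-and-notify steps as an actual fault, giving the user time to switch the active and standby nodes before the host halts.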
The VIM USER needs to know when a virtual resource WILL be unavailable because of NFVI maintenance, in order to keep his/her service available. E.g.: A VIM user is responsible for a VNF that runs on a VM with dedicated resources and does not tolerate a pause of the vCPU because of real-time processing, which means that the VIM provider should not perform live migration of the VM. The VIM user, however, has a way to replace the VNF without stopping the Network Service that uses it, such as a forwarding graph switchover. The VIM provider then marks the physical machine which hosts the VM as being in maintenance mode for replacement. The VIM user needs to know about the future unavailability of the VM, so that the user can recreate the VM carefully without stopping the Network Service.
The VIM needs to receive the NFVI maintenance instruction and notify the owner of the unavailability of the affected virtual resources. E.g.: A VIM user has created a VM through the VIM (Red User in Fig.2). The VM is hosted on a physical machine in the NFVI (Red VM in Fig.2). Then, the administrator (Administrator in Fig.2) schedules a replacement and announces it. The VIM provider needs to receive that maintenance instruction from the administrator (Step 1 in Fig.2) and mark the physical machine. Afterwards, the VIM provider performs the same steps as in the ‘Auto Healing’ case, except that the notifications use different semantics.
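The maintenance flow above can be sketched in the same style: the VIM marks the host and then notifies the owners of the hosted VMs with an event type that differs from a fault. The `Host` record, the `schedule_maintenance` function and the notification fields (`event`, `not_before`) are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Host:
    """Hypothetical physical machine record kept by the VIM."""
    name: str
    vms: Dict[str, str]  # vm_id -> owner
    maintenance: bool = False


def schedule_maintenance(host: Host, not_before: str) -> List[dict]:
    """Mark the host for maintenance and build the owner notifications.

    The event type signals future unavailability rather than a failure,
    and the physical host name is not exposed to the owners.
    """
    host.maintenance = True
    return [
        {
            "owner": owner,
            "vm_id": vm_id,
            "event": "maintenance_scheduled",
            "not_before": not_before,
        }
        for vm_id, owner in sorted(host.vms.items())
    ]
```

The `not_before` field stands in for whatever schedule information the administrator provides; it gives the user a window in which to recreate the VM elsewhere before the physical machine is replaced.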