
Failure Prediction:

Project description:

A failure prediction system could be deployed to help an NFV system avoid unexpected failures in advance. The failure prediction topic has been studied by the ETSI NFV ISG, where some general requirements have been developed. These requirements should be the initial input for this topic in OPNFV. For example, the following requirements are listed in NFV GS Draft NFV-REL 001 (v1.0.0, 2014-11):

Failure prediction will also be one of the important issues in ETSI NFV Phase 2, so these requirements may be updated or elaborated during the lifecycle of this project. Any updates will be synchronized and merged into this project.

Collecting data is the first and most significant step of a failure prediction system. It requires different kinds of data (e.g., log files, real-time parameters of hardware and software, environment parameters, events, etc.) from various sources (e.g., NFVI, VIM, etc.). By analyzing the collected data, a failure predictor can notify us of failures in advance. Some upstream projects for data collection already exist (e.g., the Monasca and Ceilometer projects in OpenStack for system resource monitoring). However, they do not cover all the specific requirements of the OPNFV environment. Therefore, our first task is to investigate the gaps between the OPNFV requirements and what those upstream projects and other OpenStack components provide. Meanwhile, we will identify which kinds of data need to be collected. After that, we plan to deliver documents on the VIM northbound API, the implementation architecture and the plan. Finally, we will implement the failure prediction framework in detail.
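As a rough illustration of the data collection step, the following Python sketch puts data from different sources into one common record type and groups it per resource in time order. It is a minimal sketch under stated assumptions: the record format and the names `Sample`, `DataKind`, `Source` and `index_by_resource` are hypothetical and are not part of Ceilometer, Monasca or any OPNFV API; the data kinds and sources simply mirror the examples given in the paragraph above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class DataKind(Enum):
    """Kinds of data named in the project description (hypothetical enum)."""
    LOG = "log"
    REALTIME_PARAMETER = "realtime_parameter"  # hardware/software runtime values
    ENVIRONMENT_PARAMETER = "environment_parameter"
    EVENT = "event"


class Source(Enum):
    """Example data sources; illustrative, not exhaustive."""
    NFVI = "nfvi"
    VIM = "vim"


@dataclass(frozen=True)
class Sample:
    """One collected data point in a hypothetical common format."""
    timestamp: float   # seconds since the epoch
    source: Source
    kind: DataKind
    resource_id: str   # e.g. a host or VM identifier
    name: str          # metric, log or event name
    value: object      # a number for parameters, text for logs and events


def index_by_resource(samples: List[Sample]) -> Dict[str, List[Sample]]:
    """Group samples per resource and sort each group by time, so a
    predictor can examine each resource's history on its own."""
    grouped: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        grouped[s.resource_id].append(s)
    for history in grouped.values():
        history.sort(key=lambda s: s.timestamp)
    return dict(grouped)


# Usage with made-up values: two NFVI readings and one VIM event.
samples = [
    Sample(20.0, Source.NFVI, DataKind.REALTIME_PARAMETER, "host-1", "cpu.temperature", 71.5),
    Sample(10.0, Source.NFVI, DataKind.REALTIME_PARAMETER, "host-1", "cpu.temperature", 68.0),
    Sample(15.0, Source.VIM, DataKind.EVENT, "vm-7", "instance.reboot", "unexpected"),
]
by_res = index_by_resource(samples)
```

Using one record type regardless of origin is only one possible design choice. It lets later stages treat logs, events and numeric parameters uniformly, at the cost of a loosely typed `value` field.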

Data Category:

Ceilometer and Monasca can collect some metrics about physical and virtual resources, but they do not cover certain metrics about applications and the guest OS. We try to give some metrics as examples, but the following list is non-exhaustive:
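To make the gap analysis concrete, a minimal Python sketch follows. It assumes that each required metric is tagged with the layer it belongs to (physical resource, virtual resource, guest OS or application) and that the set of metrics already provided by existing collectors is known. The metric names and the `find_gaps` helper are hypothetical, chosen only to reflect the situation described above, where the physical and virtual layers are partly covered and the guest OS and application layers are not.

```python
from enum import Enum
from typing import Dict, List, Set


class Layer(Enum):
    PHYSICAL = "physical resource"
    VIRTUAL = "virtual resource"
    GUEST_OS = "guest OS"
    APPLICATION = "application"


def find_gaps(required: Dict[str, Layer], covered: Set[str]) -> Dict[Layer, List[str]]:
    """Return, per layer, the required metrics that no existing collector provides."""
    gaps: Dict[Layer, List[str]] = {}
    for metric, layer in sorted(required.items()):
        if metric not in covered:
            gaps.setdefault(layer, []).append(metric)
    return gaps


# Hypothetical metric names, for illustration only.
required = {
    "host.cpu.util": Layer.PHYSICAL,
    "vm.memory.usage": Layer.VIRTUAL,
    "guest.process.count": Layer.GUEST_OS,
    "app.request.latency": Layer.APPLICATION,
}
# Metrics assumed, in this sketch, to be available from Ceilometer/Monasca.
covered = {"host.cpu.util", "vm.memory.usage"}

gaps = find_gaps(required, covered)
```

In practice the `covered` set would have to be derived from the actual meters exposed by the upstream collectors; here it is simply written out by hand.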

Architecture:

The failure prediction system consists of a data collector, a failure predictor and a failure management module, as shown in the following figure.

The data collector consists of Ceilometer and Monasca, and can be extended with plugins for other open source data collectors, e.g. Zabbix, Nagios and Cacti. Using real-time analytics and machine learning techniques, the failure predictor analyzes the data gathered by the data collector to automatically determine whether a failure will happen. If a failure is predicted, the failure predictor sends failure notifications to the failure management module (e.g. the Doctor module), which can handle these notifications.
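The predictor-to-management flow can be sketched in Python as follows. This is a minimal sketch, not the project's method: a least-squares linear trend extrapolated against a fixed threshold stands in for the real-time analytics and machine learning techniques mentioned above. The names `FailurePredictor`, `FailureNotification`, `predict_crossing` and the `notify` hook are hypothetical, and the notification payload is not the Doctor API; `notify` is simply where an adapter to the failure management module would be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, metric value)


@dataclass
class FailureNotification:
    """Hypothetical payload for the failure management module (not the Doctor API)."""
    resource_id: str
    metric: str
    predicted_time: float  # when the metric is expected to reach the threshold


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of value = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        return 0.0, mean_v  # all samples at the same instant: no trend
    cov = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def predict_crossing(points: List[Point], threshold: float,
                     horizon: float) -> Optional[float]:
    """Extrapolate the trend of time-ordered `points` and return the time it
    reaches `threshold`, if that is within `horizon` seconds of the last
    sample; otherwise return None."""
    if len(points) < 2:
        return None
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None  # a flat or falling trend never reaches an upper threshold
    last_t = points[-1][0]
    t_cross = (threshold - intercept) / slope
    if t_cross <= last_t + horizon:
        return max(t_cross, last_t)  # trend already past threshold: report now
    return None


class FailurePredictor:
    """Notifies when a metric's trend is predicted to cross a threshold."""

    def __init__(self, threshold: float, horizon: float,
                 notify: Callable[[FailureNotification], None]) -> None:
        self.threshold = threshold
        self.horizon = horizon
        self.notify = notify  # hook towards the failure management module

    def observe(self, resource_id: str, metric: str, points: List[Point]) -> None:
        t = predict_crossing(points, self.threshold, self.horizon)
        if t is not None:
            self.notify(FailureNotification(resource_id, metric, t))


# Usage: collect notifications in a list instead of calling a real module.
sent: List[FailureNotification] = []
predictor = FailurePredictor(threshold=90.0, horizon=60.0, notify=sent.append)
predictor.observe("host-1", "cpu.temperature", [(0.0, 70.0), (10.0, 74.0), (20.0, 78.0)])
predictor.observe("host-2", "cpu.temperature", [(0.0, 70.0), (10.0, 70.0), (20.0, 70.0)])
```

Here only `host-1` triggers a notification, because its rising trend reaches 90 at roughly t = 50 s, inside the 60 s horizon. A real predictor would likely combine several metrics and learned models; keeping the `notify` hook separate means the prediction logic does not depend on any particular failure management module.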

In OPNFV release 2, we limit the scope of this project to the data collector.

Scope:

Describe the problem being solved by the project: As a requirements category project, it plans to solve the problem as follows:

Specify any interface/API specification proposed: Additional interface specifications:

Identify a list of features and functionality that will be developed:

Identify what is in or out of scope, to help reduce discussion during the development phase: In scope:

Out of scope:

Describe how the project is extensible in the future: The results of this project will be used as input for the next stage, e.g. Integration & Testing and Collaborative Development.

Testability: ''(optional, Project Categories: Integration & Testing)''

Specify testing and integration aspects such as interoperability, scalability and high availability.

Documentation: ''(optional, Project Categories: Documentation)''

Dependencies:

Identify similar projects underway or being proposed in OPNFV or upstream projects.

Identify any open source upstream projects and their release timelines.

Identify any specific development to be staged with respect to the upstream projects and releases.

Are there any external fora or standards development organization dependencies? If possible, list informative and normative reference specifications.

Key Project Facts

Project Creation Date:
Project Category:
Lifecycle State: Incubation
Primary Contact: linghui.zeng@huawei.com
Project Lead: linghui.zeng@huawei.com
Jira Project Name: Data Collection for Failure Prediction
Jira Project Prefix: PREDICTION
Mailing list tag: [prediction]

Committers and Contributors:

Names and affiliations of the committers:

Any other contributors: TBD

Planned deliverables

Describe the project release package as OPNFV or open source upstream projects.

If project deliverables have multiple dependencies across other project categories, describe the linkage of the deliverables.

Proposed Release Schedule:

When is the first release planned?

Will this align with the current release cadence?