Data Collection for Failure Prediction
A failure prediction system could be deployed to help the NFV system avoid unexpected failures in advance. The failure prediction topic has been studied by the ETSI NFV ISG, and some general requirements have been developed there. These requirements should be the initial input for this topic in OPNFV. For example, the following requirements are listed in NFV GS Draft NFV-REL 001 (v1.0.0, 2014-11):
Failure prediction will also be one of the important issues in ETSI NFV Phase 2. These requirements may therefore be updated or elaborated during the lifecycle of this project, and any updates will be synchronized and merged into this project.
Collecting data is the first and most significant step of a failure prediction system, which requires different kinds of data (e.g., log files, real-time parameters of hardware and software, environment parameters, events, etc.) from various sources (e.g., NFVI, VIM, etc.). By analyzing the collected data, a failure predictor can give advance notice of failures. Several upstream data collection projects already exist (e.g., the Monasca and Ceilometer projects in OpenStack for system resource monitoring). However, they do not cover all of the specific requirements of the OPNFV environment. Therefore, our first task is to investigate the gaps between those upstream projects, other OpenStack components, and the OPNFV requirements. In parallel, we will identify which kinds of data need to be collected. After that, we plan to deliver documents describing the VIM northbound API, the implementation architecture, and the implementation plan. Finally, we will implement the failure prediction framework.
Ceilometer and Monasca can collect some metrics about physical and virtual resources, but they do not cover certain application and guest OS metrics. We give some metrics as examples; the following list is non-exhaustive:
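The gap investigation described above amounts, at its simplest, to comparing the set of metrics the project requires with the union of metrics the upstream collectors already provide. The sketch below illustrates this under stated assumptions: every metric name and the coverage table are hypothetical placeholders, not the project's metric list and not real Ceilometer or Monasca meter names.

```python
# Hypothetical coverage table: which metrics each upstream collector provides.
# Names are illustrative only, not real Ceilometer/Monasca meter names.
UPSTREAM_COVERAGE = {
    "ceilometer": {"host.cpu.util", "host.memory.used", "vm.cpu.util"},
    "monasca": {"host.disk.used", "vm.memory.used"},
}

# Hypothetical set of metrics a failure predictor might need, including
# guest OS and application metrics that upstream does not cover.
REQUIRED_METRICS = {
    "host.cpu.util",
    "vm.cpu.util",
    "vm.memory.used",
    "guest.os.process_count",
    "app.request_latency",
}


def coverage_gap(required, coverage):
    """Return, sorted, the required metrics that no upstream collector provides."""
    covered = set().union(*coverage.values())
    return sorted(required - covered)


if __name__ == "__main__":
    print(coverage_gap(REQUIRED_METRICS, UPSTREAM_COVERAGE))
```

With the placeholder data above, the gap consists of the guest OS and application metrics, which mirrors the kind of shortfall described in the text.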
The data collector consists of Ceilometer and Monasca, which can be extended with plugins for other open source data collectors, e.g., Zabbix, Nagios, and Cacti. Using real-time analytics and machine learning techniques, the failure predictor analyzes the data gathered by the data collector to automatically determine whether a failure will happen. If a failure is predicted, the failure predictor sends failure notifications to the failure management module (e.g., the Doctor module), which can handle these notifications.
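The collector, predictor, and notification flow can be sketched as follows. This is a minimal illustration under explicit assumptions: the plugin interface, the `Sample` record, the metric names, and the notification callback are all hypothetical, and no real Ceilometer, Monasca, Zabbix, Nagios, Cacti, or Doctor API is used. The predictor is a plain moving-average threshold standing in for the real-time analytics and machine learning models mentioned above; it is not the project's prediction method.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterable, List


@dataclass
class Sample:
    source: str   # e.g. "nfvi" or "vim" (hypothetical labels)
    metric: str   # hypothetical metric name
    value: float


class DataCollector:
    """Aggregates samples from pluggable collectors (hypothetical interface)."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[], Iterable[Sample]]] = {}

    def register(self, name: str, poll: Callable[[], Iterable[Sample]]) -> None:
        # Each plugin would wrap Ceilometer, Monasca, Zabbix, Nagios, Cacti, etc.
        self._plugins[name] = poll

    def collect(self) -> List[Sample]:
        samples: List[Sample] = []
        for poll in self._plugins.values():
            samples.extend(poll())
        return samples


class FailurePredictor:
    """Stand-in predictor: flags a value exceeding `factor` times its recent mean."""

    def __init__(self, notify: Callable[[Sample], None],
                 window: int = 5, factor: float = 2.0) -> None:
        self._notify = notify  # hypothetical hook towards failure management (e.g. Doctor)
        self._window = window
        self._factor = factor
        self._history: Dict[str, deque] = {}

    def analyse(self, samples: Iterable[Sample]) -> None:
        for s in samples:
            key = f"{s.source}/{s.metric}"
            hist = self._history.setdefault(key, deque(maxlen=self._window))
            # Only judge once a full window of history is available.
            if len(hist) == self._window and s.value > self._factor * mean(hist):
                self._notify(s)
            hist.append(s.value)


def run_demo() -> List[Sample]:
    """Feed one fake plugin a steady series followed by a spike."""
    notifications: List[Sample] = []
    readings = iter([1.0, 1.1, 0.9, 1.0, 1.0, 5.0])
    collector = DataCollector()
    collector.register("fake", lambda: [Sample("nfvi", "disk.error_rate", next(readings))])
    predictor = FailurePredictor(notify=notifications.append)
    for _ in range(6):
        predictor.analyse(collector.collect())
    return notifications
```

Keeping collectors behind a registration callback reflects the extensibility described in the text: adding another open source collector means registering one more plugin, without changing the predictor or the notification path.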
In OPNFV release 2, we limit the scope of this project to the data collector.
Describe the problem being solved by the project: As a requirements category project, it plans to solve the problem as follows:
Specify any interface/API specification proposed: Additional interface specifications:
Identify a list of features and functionality that will be developed:
Identify what is in or out of scope, which helps reduce discussion during the development phase: In scope:
Out of scope
Describe how the project is extensible in the future: The results of this project will be used as input for the next stages, e.g., Integration & Testing and Collaborative Development.
Specify testing and integration, such as interoperability, scalability, and high availability.
Identify similar projects underway or being proposed in OPNFV or upstream projects.
Identify any open source upstream projects and their release timelines.
Identify any specific development to be staged with respect to the upstream projects and releases.
Are there any dependencies on external fora or standards development organizations? If possible, list any informative and normative reference specifications.
Project Creation Date:
Lifecycle State: Incubation
Primary Contact: email@example.com
Project Lead: firstname.lastname@example.org
Jira Project Name: Data Collection for Failure Prediction
Jira Project Prefix: PREDICTION
Mailing list tag: [prediction]
Names and affiliations of the committers:
Any other contributors: TBD
Describe the project release package as OPNFV or open source upstream projects.
If project deliverables have multiple dependencies across other project categories, describe the linkage of the deliverables.
When is the first release planned?
Will this align with the current release cadence?