Doctor Team Meetings
March 1, 2016
Feb 23, 2016
Feb 16, 2016
Feb 9, 2016
Feb 2, 2016
Jan 26, 2016
Jan 19, 2016
Jan 12, 2016
Jan 5, 2016
Dec 22, 2015
Dec 15, 2015
Dec 8, 2015
Dec 1, 2015
Nov 24, 2015
Nov 17, 2015
Nov 9-10, 2015
OPNFV design summit sessions
Nov 9th, 15:15-14:00 at Room #3:
Nov 10th, 11:00-12:00 at Room #2:
Nov 10th, 14:00-15:00 at Room #2:
Nov 3, 2015
Oct 28, 2015
Meeting Info:
Oct 28 (Wednesday) F2F meeting in OpenStack Summit Tokyo
When: 9-11am
Where: Community Lounge, 1F, #3 International Convention Center Pamir
Minutes:
Nova BPs
entities are "service" (representing nova-compute), "service" (VM, Instance) and physical machine
No notification on service-mark-down and vm-reset-state (reset state already exists in Nova)
Related BPs:
get valid VM state - may have discussion on Friday
notification when force-down or disabled - no session… - should be in Mitaka
notification when server state has been reset - we have to draft it and find an assignee
add new params to the reset server state API
versioned notification - 2:40pm-3:20pm on Thursday ← blocker for Balazs' BP
extend mark-service-down to optionally mark affected VMs down as part of that API (see the sketch after this list)
3 levels of computing failure:
if the nova-compute service goes down, the VNF will likely continue running without any service disruption. Thus, the VIM may not need to notify the VNFM, as the VNF instance is not affected
Although the VNFM may not need to be notified, we need to notify the NFVO.
In this specific case, there is a clear separation of consumers: VNFM and NFVO (/VIM administrator)
auto-reaction
should the VIM execute an automatic reaction based on a policy in user-specified server metadata, or not?
use cases should be discussed and described in the Doctor document
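For reference, a minimal sketch (not part of any BP) of driving the two existing Nova operations discussed above over the REST API: force-down for the compute service (microversion 2.11+) and os-resetState for an affected server. The endpoint URL and admin token are placeholder assumptions obtained out of band (e.g. from Keystone).

```python
# Minimal sketch (not part of any BP): marking a nova-compute service
# down and resetting an affected server's state via the Nova REST API.
import requests

NOVA_URL = "http://controller:8774/v2.1"     # assumed endpoint
HEADERS = {
    "X-Auth-Token": "ADMIN_TOKEN",           # assumed admin token
    "X-OpenStack-Nova-API-Version": "2.11",  # force-down needs >= 2.11
}

def force_service_down(host):
    """Tell Nova that the nova-compute service on `host` is down."""
    body = {"host": host, "binary": "nova-compute", "forced_down": True}
    requests.put(f"{NOVA_URL}/os-services/force-down",
                 json=body, headers=HEADERS).raise_for_status()

def reset_server_state(server_id, state="error"):
    """Reset the state of a server on the failed host (os-resetState)."""
    body = {"os-resetState": {"state": state}}
    requests.post(f"{NOVA_URL}/servers/{server_id}/action",
                  json=body, headers=HEADERS).raise_for_status()
```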
Neutron BP
Cinder BP
Alarming / Aodh etc.
Monasca, Congress, Vitrage
Doctor + Congress meeting, Wed 28 Oct 15:40-16:20 (room TBD)
OPNFV Summit
PoC Demo at docomo booth
Design summit topics (design summit running Mon-Tue, 9-10 Nov)
Oct 20, 2015
Oct 13, 2015
Oct 6, 2015
Sep 29, 2015
Sep 22, 2015
Sep 15, 2015
Sep 8, 2015
Sep 1, 2015
Aug 25, 2015
Aug 18, 2015
Aug 11, 2015
Aug 4, 2015
July 30, 2015
July 28, 2015
July 21, 2015
July 14, 2015
July 7, 2015
June 30, 2015
June 23, 2015
Agenda:
BP Status
Deliverable status
AoB
Minutes:
BP Status Nova
BP Status Ceilometer
Deliverable status
AoB
IRC Log:
22:00:08 r-mibu > #startmeeting doctor
22:00:40 GeraldK > #info Gerald
22:00:46 tojuvone > #info Tomi Juvonen
22:02:03 r-mibu > #startmeeting doctor
22:02:10 r-mibu > #endmeeting
22:04:05 r-mibu > #link https://etherpad.opnfv.org/p/doctor_meetings
22:04:31 cgoncalves > #info Carlos Goncalves
22:04:37 ildikov > #info Ildiko Vancsa
22:04:55 bertys > #info Bertrand Souville
22:07:13 r-mibu > #topic BP Status Nova
22:08:42 GeraldK > #info Roman (intel) is back and will continue implementation
22:09:35 GeraldK > #info Carlos is facing some issues with proposed patch (conflict in gerrit)
22:11:06 bertys > #link https://review.openstack.org/#/c/185849/
22:13:52 r-mibu > #topic BP Status Ceilometer
22:14:05 r-mibu > #link https://review.openstack.org/#/c/172893/
22:14:23 GeraldK > #info spec got approved by 3 core reviewers
22:14:43 GeraldK > #info 2 more developers from Intel working on it
22:15:00 GeraldK > #info in total 3 developers working on this spec
22:16:14 ildikov > #link https://review.openstack.org/192684
22:17:04 ildikov > #link https://review.openstack.org/192688
22:18:57 r-mibu > #topic Deliverable status
22:19:26 GeraldK > #link http://artifacts.opnfv.org/
22:19:27 r-mibu > #info tagged '2015.1.0'
22:22:00 GeraldK > #info working on open issues (review comments). pls join discussion in Gerrit.
22:26:37 GeraldK > #info proposal to discuss open comments in Doctor meeting
22:27:23 r-mibu > #topic AoB
22:27:24 GeraldK > #topic AOB
22:28:05 r-mibu > #info Ceilometer *virtual* mid-cycle
22:28:26 GeraldK > #info Ceilometer F2F midcycle event is canceled, there will be virtual discussion, e.g. IRC
22:28:35 r-mibu > #link https://etherpad.openstack.org/p/ceilometer-liberty-midcycle
22:29:16 ildikov > #link http://doodle.com/6vfksdu38wcwqqd3
22:29:17 GeraldK > #info dates are not yet fixed. doodle vote.
22:29:34 ildikov > #info Ceilometer has a virtual mid-cycle as opposed to face-to-face
22:32:49 r-mibu > #info OPNFV Hackfest
22:33:39 GeraldK > #action Carlos to prepare demo script for demo
22:37:19 GeraldK > #info Ildiko and Gerald propose to have a session on Doctor, where demo is part of, but also show status of BPs/specs and way forward
22:38:01 r-mibu > #topic req-doc discussion
22:38:13 r-mibu > #info fencing
22:39:01 r-mibu > #link http://artifacts.opnfv.org/doctor/html/03-architecture.html
22:41:14 cgoncalves > #link https://gerrit.opnfv.org/gerrit/#/c/882/
22:41:47 GeraldK > #info Fencing gap had been part of earlier version of Doctor document, but was removed
22:41:59 r-mibu > #info fencing is one of external system responsibilities (when the host mark down)
22:42:19 GeraldK > #info gerrit patch is proposing to add this feature to Doctor
22:43:49 GeraldK > #info Ildiko: there is discussion on whether Nova or external tool is responsible for fencing. should this be part of Doctor project?
22:46:25 GeraldK > #info Ryota: okay to not have gap on fencing, but mention this feature in the architecture section
22:47:23 r-mibu > #info agreed
22:47:39 GeraldK > #agree mention fencing in "general features", but do not have gap on it
22:47:58 r-mibu > #info Maintenance state
22:48:06 r-mibu > Change: ("going to maintenance" and "in maintenance")
22:50:11 r-mibu > #info DOCTOR-11
22:51:21 GeraldK > #info DOCTOR-11 has wider scope. offline review needed.
22:51:55 GeraldK > #agree agreement to have two maintenance states ("going-to-maintenance" and "in-maintenance")
22:52:06 r-mibu > #info User can stop maintenance
22:53:09 GeraldK > #info ...or user does not respond to maintenance request
22:54:34 GeraldK > #info error cases needed, e.g. resend maintenance request after timeout
22:54:49 GeraldK > #info how to handle cases where a user is sending NACK?
22:55:09 r-mibu > #info or having error/force policy
22:56:15 GeraldK > #info give responsibility back to Administrator
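A minimal sketch of the two agreed maintenance states and the error handling discussed in the log above (resend after timeout, NACK, handing responsibility back to the administrator). All names and the consumer-callback shape are illustrative assumptions, not an agreed Doctor API.

```python
# Illustrative sketch of the agreed maintenance states; not an agreed API.
import enum

class MaintState(enum.Enum):
    NORMAL = "normal"
    GOING_TO_MAINTENANCE = "going-to-maintenance"
    IN_MAINTENANCE = "in-maintenance"

def request_maintenance(host, ask_consumer, timeout=60, retries=1):
    """Move `host` towards maintenance; the consumer may ACK, NACK,
    or not respond at all."""
    state = MaintState.GOING_TO_MAINTENANCE
    for _ in range(retries + 1):
        reply = ask_consumer(host, timeout)  # "ack", "nack" or None
        if reply == "ack":
            return MaintState.IN_MAINTENANCE
        if reply == "nack":
            return MaintState.NORMAL  # responsibility back to the admin
        # no response: resend the maintenance request after timeout
    return state  # still unanswered: escalate via error/force policy
```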
June 16, 2015
June 9, 2015
June 2, 2015
May 26, 2015
May 19, 2015
May 12, 2015
Agenda:
Participants: Gerald Kunzmann, Ryota Mibu, Carlos Goncalves, Bryan Sullivan, Adi Molkho, Dan Druta, Michael Godley, Maryam Tahhan, Tomi Juvonen, Tommy Lindgren, Gurpreet Singh
Minutes:
May 5, 2015
April 28, 2015
Joint meeting with ETSI NFV REL team.
Agenda:
Identify Purpose of the call
NFV REL:
OPNFV Doctor:
Project Overview
Use cases
Collaboration methodology discussion
Wrap-up
Minutes:
Purpose
Ryota: know each other; see how to work together; further technology discussion needed at later stage
Markus Schoeller (NEC): no IPR declarations today, today only exchange of public information
policies on how to work together w.r.t. IPR etc. should be defined for later work
Gurpreet: high-level of Doctor project; fault-detection and management; what are use cases of Doctor?
NFV REL introduction (Markus Schoeller)
Project overview: see ETSI/NFVREL(14)000200
dedicated reliability project
Ryota: target size / number of applications?
Tommy: which work items focus on the VIM part? Indirectly addressed in monitoring and failure detection. Scalability per se has some impact on the VIM
Tommy: this means "monitoring and failure detection" would be the main crossing point with Doctor? So far yes, but in the next meeting new WIs may be created
NFV software upgrade mechanism (Stefan Arntzen - Huawei)
different from traditional upgrades: "old traffic" can still go to the "old software version", while new traffic/connections go to the new s/w version in parallel (enabled by virtualization); no hard switchover needed; the old system/version is still running and can be switched back to in case of issues with the new version
assumption is that this can be done stateless (otherwise it would be more complex)
Active monitoring for NFV (Gurpreet)
Alistair Scott: interested in passive monitoring; where are attachment points for passive monitoring? REL has not looked into passive monitoring for NFV
Gurpreet: identify use cases where current implementation has gaps
OPNFV Doctor
Stefan: plan to use OpenStack components?
Ryota: we are not only focusing on OpenStack, but on open source in general
Tommy: but OpenStack is the primary s/w used in OPNFV
Gurpreet: work flow for upstream community?
Ryota: define requirements, gap analysis, provide blueprints, but no coding in Doctor project
Next action:
April 21, 2015
Agenda:
Deliverable
Structure: uploaded to Gerrit and split into multiple files; need consensus from community
Propose a requirement project deliverable template based on Doctor's (WIP: Carlos, Ryota, Ildiko)
Review comments received so far
Blueprints
Minutes:
Status of BPs
Deliverable
We still have review comments which are not yet reflected in the doc
RST files have been split; the format would serve as a template for other requirement projects
how we can publish …
action(doctor): describe framework and inspector API
Logistics
Next meeting
April 14, 2015
Agenda:
Status of BPs
Doctor requirement deliverable
Minutes:
Status of BPs
Doctor requirement deliverable
API in between OpenStack and HW/NFVI monitoring module, e.g. Zabbix (Ryota)
Change "southbound API" to "API in between OpenStack and HW/NFVI monitoring module"
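A minimal sketch of what such an in-between API could look like on the monitoring side: a Zabbix-style alert script relaying a host-down event to an inspector endpoint. The URL and the event schema are assumptions for illustration, not a defined Doctor interface.

```python
# Hypothetical relay in the style of a Zabbix alert script.
import sys
import time
import requests

INSPECTOR_URL = "http://inspector.example.com/events"  # hypothetical

def relay(hostname, status):
    event = {
        "time": time.time(),
        "type": "compute.host.down" if status == "down" else "unknown",
        "details": {"hostname": hostname, "status": status},
    }
    requests.post(INSPECTOR_URL, json=event).raise_for_status()

if __name__ == "__main__":
    relay(sys.argv[1], sys.argv[2])  # e.g. invoked by a Zabbix action
```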
April 7, 2015
Agenda:
Input from Swfastpathmetrics team
Status of BPs
Minutes:
Input from Swfastpathmetrics team
Revisit the scope of the NIC to make sure that we can collect VF stats.
Can the NIC report VF/PF stats capabilities? Investigate: Maryam
I’ve been looking into this for the Intel® 82599 10 GbE Controller, and this might be possible through a level of indirection by checking which VFs are enabled. It’s not exactly what’s being asked, but if you knew a VF was enabled then you’d know which stats are also available.
BTW: stats can then be retrieved per VF for Niantic:
VF Good Packets Received Count
VF Good Packets Transmitted Count
VF Good Octets Received Count Low
VF Good Octets Received Count High
VF Good Octets Transmitted Count Low
VF Good Octets Transmitted Count High
VF Multicast Packets Received Count
But then error stats are still shared.
An open item Maryam is looking into: if we knew the queues assigned to a VF, could we use the Queue Packets Received Drop Count (QPRDC) to retrieve the dropped packets for a VF?
Maryam is in the process of writing a DPDK app that runs as a secondary process on the host and is capable of reading the stats, which can then be parsed by a script (a parsing sketch follows).
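A minimal sketch of the parsing-script side, assuming a hypothetical "VF &lt;n&gt; &lt;counter&gt;=&lt;value&gt;" output format from the DPDK secondary-process app; the real output format (and the binary name) is not defined yet.

```python
# Hypothetical parser for the secondary-process stats app's output.
import re
import subprocess

LINE = re.compile(r"VF\s+(\d+)\s+(\w+)=(\d+)")

def read_vf_stats(cmd=("vf-stats-dump",)):  # hypothetical binary name
    """Return {vf_id: {counter_name: value}} parsed from the app output."""
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    stats = {}
    for m in LINE.finditer(out):
        stats.setdefault(int(m.group(1)), {})[m.group(2)] = int(m.group(3))
    return stats
```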
No Southbound interface for Doctor defined yet.
Action: Ryota to draft the SB API of Doctor
Status of BPs
March 31, 2015
Agenda:
Status of requirement deliverable
Status and next steps of BPs (Tomi, Ryota)
Nova BP review
Input from Swfastpathmetrics team
Minutes:
Status of requirement deliverable: Distributed to OPNFV community
Discussion about leveraging OpenStack Zaqar as multi-tenant messaging system for real-time event notifications
HTTP vs SNMP
Input from Swfastpathmetrics team:
March 24, 2015
Participants: Ryota, Tomi, Bertrand, Gerald
Agenda:
Status and next steps of BPs (Tomi, Ryota)
Nova BP review
Input from Swfastpathmetrics team
Status of requirement deliverable
Document Review
Minutes:
Recap of last week's BP meeting (Thursday)
Ryota had presented; request was to use the template; more details needed, but general approach is okay
Bryan has raised the issue that more discussion on notification on the NB I/F is needed
Proposal of Doctor should be aligned with other projects
Ashiq: does anyone have experience in writing BPs → Tomi: using the template it was straightforward
proposal to find someone with experience. Ryota already has some experience.
Ryota: problem is who could review our BPs. we need to socialize with community. Ryota has some channels he can use for this.
it is not clear if we need TSC approval for submitting the BPs; at least we should align at OPNFV level
Ashiq: it is important people join the meetings, e.g. the BP meeting, the individual BP meetings, socialize the BPs in the OPNFV community (by discussing it on the mailing list)
Tomi: proposal to send mail to tech-discuss asking for any objections within few days, then upload
Ashiq: already upload the BPs and revise it in case needed
Status and next steps of BPs (Tomi, Ryota)
Nova BP review
Input from Swfastpathmetrics team
Status of requirement deliverable
draft distributed yesterday.
ACTION: perform Doctor internal review by end of this week.
A stable draft will be provided for a two-week OPNFV-wide review starting Monday, March 30th
Ashiq: architecture with 4 building blocks. The Inspector filters some fault information. The Notifier has a policy to filter which fault information to send or not.
why enable filtering in both Inspector and Notifier? Ryota: different kinds of filters. In OpenStack all alarms from the Controller will be emitted, thus we need a policy to filter.
Ashiq: the Inspector is filtering physical faults; the Notifier is filtering faults on the virtual resource level. Correct?
Document Review
Others:
March 17, 2015
March 5, 2015
Ad-hoc meeting for blueprint planning
Agenda:
Minutes:
March 3, 2015
Feb 24, 2015
Requirement project round table @ Prague Hackfest
Participants: Ryota (NEC), Gerald (DOCOMO), Bertrand (DOCOMO), Ashiq (DOCOMO), Tomi (Nokia), Tommy (Ericsson), Carlos (NEC), Gianluca Verin (Athonet), Daniele Munaretto (Athonet), Sharon (ConteXtream), Christopher (Dorado Software), Russell (Red Hat), Frank Baudin (Qosmos), Chaoyi (Huawei), Al Morton (AT&T), Xiaolong (Orange), (Oracle), Randy Levensalor (CableLabs) …
Slides can be found here: https://wiki.opnfv.org/_media/doctor/opnfv_doctor_prague_hackfest_20150224.n.pptx
Minutes:
Use case 1 "Fault management"
Main interest: northbound I/F
Reaction of VNFM is out of scope
VMs (compute resources) are the first focus; storage and network resources will follow at a later stage
Fault monitoring: a pluggable architecture is needed to catch different (critical) faults in the NFVI, enabling use of different monitoring tools. A Predictor (fault prediction) may also be one input.
4 functional blocks:
controller (e.g. Nova), monitor (e.g. Nagios, Zabbix), notifier (e.g. re-use Ceilometer), inspector (fault aggregation etc.); see the sketch at the end of this use case
VM state in resource map, e.g. "fault", "recovery", "maintenance" (more than just a heartbeat)
Question of whether other OpenStack components (e.g. Cinder, Glance, etc) can report events/faults
What is the timescale to receive such fault notifications? This would be helpful for the motivation in the blueprints. Telco nodes: less than 1s; switch to ACT-SBY as soon as possible.
Preference is event-based notification, not polling; should be configurable.
A telco use case would have a few hundred nodes, not thousands of nodes.
Demo 1 (using default Ceilometer) takes approximately 65 seconds to notify the fault (90 seconds total including spawning a new VM), while demo 2 only takes ≤ 1 second (26 seconds total)
Pacemaker is running at application layer; different scope.
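A minimal sketch of how the four functional blocks named above could be wired together (monitor → inspector → controller/notifier). All class and method names are illustrative assumptions, not a defined design.

```python
# Illustrative wiring of the four functional blocks.
class Controller:        # VIM controller, e.g. Nova
    def mark_host_down(self, host):
        print(f"marking {host} down")

class Notifier:          # sends alarms northbound, e.g. re-using Ceilometer
    def alarm(self, host, fault):
        print(f"alarm for consumer: {host} {fault}")

class Inspector:         # aggregates/filters faults, updates resource map
    def __init__(self, controller, notifier):
        self.controller, self.notifier = controller, notifier
    def handle(self, host, fault):
        if fault.get("severity") == "critical":
            self.controller.mark_host_down(host)
            self.notifier.alarm(host, fault)

class Monitor:           # adapter for e.g. Nagios or Zabbix
    def __init__(self, inspector):
        self.inspector = inspector
    def on_raw_fault(self, host, fault):
        self.inspector.handle(host, fault)
```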
Feb 23, 2015
Doctor/Fastpathmetrics/HA Cross Project Meeting @ Prague Hackfest
Goal:
Minutes:
Project Intro:
Doctor:
SW fastpath implementation → interfaces to DPDK
consume statistics and counters
information goes to VNF, it does not go to OpenStack (in that sense there is no overlap)
quick response. time-critical.
very short response time for fault detection and resolution
people from the HA project really work on HA today and have a lot of knowledge on it
Identify Overlap:
NB I/F
Doctor also requires fast reaction; the objective of HA is similar.
HA has more use cases and may send more information on the northbound I/F. VNFM should be informed about changes.
Doctor objective is to design a NB I/F.
Does HA already have flows available?
HA is focusing on application level. Reaction should be as fast as possible. Including the VNFM may slow down the progress.
In Doctor we will follow the path through VNFM.
In ETSI we have lifecycle mgmt, where the VNFM is responsible for the lifecycle
There is certain information the VNFM doesn't know about. In Doctor we call that entity the "consumer".
Proposal to do use case analysis for HA. Which use cases may require the VNFM to be involved? "Doctors" will have a look at HA use cases.
Which entity is to resolve race conditions? Some entity in the "north".
What about a shared fault collection/detection entity instead of collecting the same information 3 times?
Security issues are not addressed in Doctor. Currently assuming a single operator, where policies ensure who can talk to who.
In Doctor we do not look at application faults, only NFVI faults.
Huawei: we use Heat to do HA. If one VM dies and Heat finds the scaling group has fewer than 2 members, it will start a new VM. This may take more than 60s; we need something faster for HA. Heat doesn't find errors in the applications.
Failure detection time is an issue across all projects.
Which metrics of fastpath would Doctor be interested in? need to check in detail. Action Item to send metrics to Doctor.
Hypervisor may detect failure of VM and take action.
Doctor: if the VIM takes action on its own, it may conflict with the ACT-SBY configuration on the consumer side; this is why the consumer should be involved.
Which project would address the ping-pong issue that may arise?
We need a subscription mechanism including a filter (which alarms to be notified about); see the sketch below. The VM-PM-VNFM mapping can be recorded during instantiation.
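A minimal sketch of such a subscription mechanism with per-consumer alarm filters and a VM-to-VNFM mapping recorded at instantiation. All structures and endpoints are illustrative assumptions, not a defined Doctor API.

```python
# Illustrative subscription mechanism with alarm filters.
subscriptions = []   # (consumer_endpoint, alarm_filter) pairs
vm_to_vnfm = {}      # VM -> VNFM endpoint, recorded at instantiation

def subscribe(endpoint, alarm_filter):
    subscriptions.append((endpoint, alarm_filter))

def notify(alarm):
    """Deliver `alarm` only to consumers whose filter matches."""
    for endpoint, flt in subscriptions:
        if all(alarm.get(k) == v for k, v in flt.items()):
            print(f"POST {endpoint}: {alarm}")  # e.g. HTTP callback

# usage: a VNFM subscribes only to host-down alarms
vm_to_vnfm["vm-123"] = "http://vnfm.example.com"
subscribe(vm_to_vnfm["vm-123"] + "/alarms", {"type": "compute.host.down"})
notify({"type": "compute.host.down", "host": "compute-1"})
```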
Relationship between Doctor and Copper:
policy defines e.g. when VIM can expose its interface
When to inform about a fault, whom to inform, etc. is all a kind of policy.
Copper has both pro-active and reactive deployment of policies. In the reactive case, there may be a conflict when both Copper and Doctor receive the policies.
Wrapup:
Overlap in fault management
FastPath: monitors traffic metrics; Doctor will need some of the metrics in the VIM. Plan to do regular meetings.
HA: large project with wider scope than Doctor, different use cases; direct flow (to be faster). Task to check each other's NB I/F in order not to block each other.
Feb 17, 2015
Agenda:
Minutes:
Participants: Ryota Mibu, Khan Ashiq, Gerald Kunzmann, Carlos Goncalves, Susana, Thinh Nguyenphu, Tommy Lindgren, Bryan Sullivan, Bertrand Souville, Michael Godley, Manuel Rebellon, Uli Kleber
Hackfest
Document status
HA and fault prediction project and "Software FastPath Service Quality Metric" project
Feb 10, 2015
Agenda:
Minutes:
OPNFV should be careful with tools projects use and distribute as part of the platform due to their licensing
Framework should be modular enough to be pluggable with multiple monitoring solutions
Editors for each first deliverable section were assigned
Gap analysis to be further extended
Section editors should have an initial draft ready by Feb 18
Deliverable editors (Gerald and Ashiq) will have Feb 19-20 to compile everything together for the Prague Hackfest
Feb 6, 2015
Extra meeting for Implementation Planning
Agenda & Minutes:
Implementation Planning
Topic and agreement can be found in
Slides.
Feb 2, 2015
Agenda:
Timeline - Ryota
This proposal is closely related to data collection for failure prediction
Prague meetup time.
Implementation plan: review comments by Tomi in last week's minutes
Wiki Updates to follow BGS format - Ryota
Doctor team participation in the OpenStack Summit Vancouver - Carlos
Weekly meeting time - Ryota ?
Minutes:
Participants: Carlos Goncalves, Don Clarke, Ryota Mibu, Tomi Juvonen, Yifei Xue, Al Morton, Bertrand Souville, Gerald Kunzmann, Manuel Rebellon, Ojus K. Parikh, Ashiq Khan, Pasi, Paul French, Charlie Hale
Ryota presents a refreshed Timeline
Initial draft of requirement document should be ready before the Hackfest 23-24 Feb in Prague
Target architecture is OpenStack; Implementation plan is on how this will be realized in upstream projects, e.g. interfaces.
Predictor project:
Implementation plan:
for evacuation we should stay implementation independent, not OpenDaylight or Neutron (they may be used in the actual testbed, but we should restrict Doctor to interface definitions)
it is not intended to use Ceilometer, but a similar service.
it is necessary to be able to isolate a faulty machine, such that new VMs are not started on this machine.
different ways/workflows for recovery; we should start by implementing a few sample workflows
e.g. switch to the active hot-standby VM, then instantiate a new hot-standby instance (this is a Doctor requirement; sketched after this list)
evacuation (if time allows) vs active hot standby (immediate action)
VNFM is deciding about the best action (this is out of scope of Doctor; Doctor only specifies NB I/F)
we need to get into more details for this plan. discussion should go via email to make progress before next meeting
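A minimal sketch of one sample recovery workflow from the list above (promote the hot standby, then create a replacement), plus the evacuation alternative. Only the Nova evacuate action reflects a real API; the VNFM-side steps, endpoint, and token are placeholder assumptions.

```python
# Sketch of a sample ACT-SBY recovery workflow and Nova evacuation.
import requests

NOVA_URL = "http://controller:8774/v2.1"    # assumed endpoint
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN"}   # assumed admin token

def promote_to_active(vm):                  # placeholder: VNFM decision
    print(f"promoting {vm['id']} to active")

def spawn_new_standby(image):               # placeholder: VNFM decision
    print(f"spawning new standby from {image}")

def recover_act_sby(failed_vm, standby_vm):
    promote_to_active(standby_vm)
    spawn_new_standby(failed_vm["image"])

def evacuate(server_id, target_host=None):
    """Rebuild a server from a failed host on another host (Nova API)."""
    body = {"evacuate": {"onSharedStorage": False}}
    if target_host:
        body["evacuate"]["host"] = target_host
    requests.post(f"{NOVA_URL}/servers/{server_id}/action",
                  json=body, headers=HEADERS).raise_for_status()
```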
Hackfest
Take to the hackfest what we have, i.e. if we "only" have one implementation plan so far let's use this.
Doctor is planned for Tuesday. Also other requirement projects will be discussed on Tuesday.
Ryota did cleanup of Doctor Wiki page
Doctor team participation in the OpenStack Summit Vancouver?
Meeting time → via email
Jan 26, 2015
Agenda:
Minutes:
Timeline milestone planning
Soft schedule for the Fault Table; set one milestone at end of Jan
Requirement Document should be finished by Mar 15 ? - No
Set some milestones at the Prague Hackfest
TODO(Ryota): create wiki page
Discuss maintenance use case - Tommy
Implementation outside Nova - Tomi
Jan 19, 2015
Jan 12, 2015
Dec 22, 2014
Agenda:
work item updates
Fault table
GAP analysis template
Wiki pages
Minutes:
Dec 15, 2014
Dec 8, 2014
Agenda:
How we shape requirements
Day of the week and time of weekly meeting
Tools: etherpad, ML, IRC?
Project schedule, visualization of deliverables
Minutes:
How we shape requirements
Use case study first
Gap analysis should include existing monitoring tools like Nagios etc.
How do we format fault messages and VNFD elements for alarms?
Fault detection should be designed in a common/standard manner
Those could be implemented in existing monitoring tools, separate from OpenStack
What are "common" monitoring tools? There are different tools and configurations
Focus on H/W faults
Do we really need that kind of notification mechanism? Can we use errors from API polling, just get errors detected by the application, or rely on auto-healing by the VIM?
A real vEPC needs to know about faults that cannot be found by the application, such as abnormal temperature.
The VIM should not run auto-healing for some VNFs.
There are two cases/sequences defined in ETSI NFV MANO in which fault notifications are sent from the VIM to the VNFM and to the Orchestrator.
An alarming mechanism is good for reducing the number of requests from users polling virtual resource status (see the sketch after this list).
We shall categorize requirements and create a new table on the wiki page. (layer?)
→ A general view of the participants is to have the 'HW monitoring module' outside of OpenStack
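A minimal sketch of the alarm-subscription alternative to polling, using the Ceilometer/Aodh v2 alarm API with the "event" alarm type (which was only added upstream later). Endpoint, token, and the trait query are assumptions for illustration.

```python
# Sketch of subscribing to an event alarm instead of polling.
import requests

ALARM_URL = "http://controller:8042/v2/alarms"  # assumed Aodh endpoint
HEADERS = {"X-Auth-Token": "USER_TOKEN"}        # assumed user token

alarm = {
    "name": "vm-error-alarm",
    "type": "event",
    "event_rule": {
        "event_type": "compute.instance.update",
        "query": [{"field": "traits.state", "op": "eq", "value": "error"}],
    },
    # consumer endpoint called back when the event fires
    "alarm_actions": ["http://consumer.example.com/alarm"],
}
requests.post(ALARM_URL, json=alarm, headers=HEADERS).raise_for_status()
```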
TODOs
Open etherpad page for collaborative working (Ryota)
Collect use cases for different fault management scenarios (Ryota)
Provide Gap Analysis (Dinesh, Everyone)
Provide fault management scenario based on ETSI NFV Architecture (Ashiq)
List fault items to be detected (Ashiq, Everyone)
Day of the week and time of weekly meeting
Tools: etherpad, ML, IRC?
Project schedule, visualization of deliverables
Dec 1, 2014
Agenda:
Minutes:
Project proposal
There were two comments at project review in TSC meeting (Nov 26)
Large scope: TSC asked to narrow the project scope; creating it as a requirement project seems reasonable
Overlap with the HA project, so collaborate in project activities
Ashiq and Qiao had talked before this meeting and agreed that we would not eliminate the duplication at the proposal phase
Project proposal was fixed by some members
The project category was changed to requirement only
In the new revision of the project proposal, we removed detailed descriptions which don't suit a requirement project
Links to the original project proposal were replaced to point to the new page; the link to the old page that described further details can be found at the bottom of the new proposal page
We should not edit the proposal page after TSC approval, to keep evidence of what we planned at the beginning of the project
"Auto recovery" is missing; will continue the discussion by mail with clarification by Tomi
Nov 17, 2014
Agenda:
Scoping and Scheduling (which features to be realized in which time frame)
Resources available and necessary for this project
Technical aspects and relevance to upstream projects
How to socialize with upstream projects
Minutes: