This is an old revision of the document!

Doctor Team Meetings

Info

Agenda of the next meeting: https://etherpad.opnfv.org/p/doctor_meetings
Note: Feel free to add your topic(s).
When: Every Tuesday 6:00-7:00 PT (14:00-15:00 UTC) Your local time

Call Logistics: GoToMeeting
- Web Access
  - https://global.gotomeeting.com/join/409154429
- Phone
  - Meeting ID / Access Code: 409-154-429
  - Audio PIN: (Shown after joining the meeting)
  - Dial in number:
    - Canada: +1 (647) 497-9351
    - France: +33 (0) 182 880 459
    - United Kingdom: +44 (0) 330 221 0086
    - Australia: +61 2 8355 1024
    - Netherlands: +31 (0) 208 080 381
    - New Zealand: +64 (0) 4 974 7214
    - Denmark: +45 (0) 69 91 88 64
    - Italy: +39 0 553 98 95 67
    - United States: +1 (215) 383-1010
    - Austria: +43 (0) 7 2088 1403
    - Belgium: +32 (0) 38 08 1856
    - Sweden: +46 (0) 852 503 499
    - Germany: +49 (0) 692 5736 7210
    - Switzerland: +41 (0) 435 0167 09
    - Finland: +358 (0) 942 41 5780
    - Spain: +34 955 32 0845
    - Ireland: +353 (0) 14 845 978
    - Norway: +47 21 03 58 98

Past meetings

Weekly project meetings

Jan 19, 2015
- Agenda
  - Review of timeline of Doctor project
  - List of tasks
- Minutes
  - Review of timeline of Doctor project
    - https://wiki.opnfv.org/_media/requirements_projects/timeline_doctor_r1.png
    - Milestone for First Spec ready might be put on OPNFV hackfest in Prague
    - Submit blueprints to Kilo has been Postponed
    - Action: Ryota to revise timeline table
  - Gap Analysis.
    - Third gap has been added to the etherpad (Fencing instances of an unreachable host in Nova) https://etherpad.opnfv.org/p/doctor_gap_analysis
    - Please all review / add new gaps to the etherpad
  - Implementation Plan
    - Discussion last week on good candidate monitoring projects for Doctor project
    - Need also to work on implementation plan / architecture.
    - Action: Tomi (Nokia) to come up with high level architecture proposal
  - List of tasks
    - Task 1 (Ryota)
      - Timeline milestone planning
    - Task 2 (Gerald)
      - Fault Table
    - Task 3 # punt
      - Gap Analysis
    - Task 4 (Tomi)
      - High level arch
      - Out-side of Nova arch first, Nova
      - One view for this here: http://blog.russellbryant.net/2014/10/15/openstack-instance-ha-proposal/
      - Here is a picture for discussion to ipmlement outside of Nova: https://wiki.opnfv.org/_media/doctor-opnfv-proposal.pptx
    - Task 5 (Ryota)
      - Wiki update (meeting minuts and so on)
    - Task 6 # TBD
      - NB I/Fs
      - Messaging
      - Information elements
      - Impl. plan(Gathering faults)

Jan 12, 2015
- Agenda
- Minutes
  - Fault table: https://wiki.opnfv.org/doctor/faults
    - Action: check and revise / update / extend
    - Some faults are specific to a certain HW other are more general
    - We should try to come up with a high level description of common faults
    - Proposal not to go to such level of detail.
    - Keep one fault table and use the current fault table for study of the scope of Doctor
    - Are there other faults that cannot be detected by SNMP and Zabbix_agent?
    - We need a tool in Doctor that can retrieve such alarms. Should this tool be integrated with OpenStack or be independent? Should be kept open;
      - different tools may be used given clear interfaces are specified
      - Monasca: https://wiki.openstack.org/wiki/Monasca OpenStack project for monitoring of faults; still in its early days
  - ETSI meeting in Prague: proposal to meet there
    - https://etherpad.opnfv.org/p/HackfestFeb23
    - Action: update your participation on that EtherPad page
  - Doctor "working page": https://etherpad.opnfv.org/p/doctor_gap_analysis
    - Action: edit this page for ongoing work on the gap analysis
  - Doctor wiki page updated
    - Use cases moved to extra page: https://wiki.opnfv.org/doctor/use_cases
  - Timeplan:
    - Action: Ryota to prepare a timeplan/timeline
    - Timeplan can be checked in each week's meeting
      - seems we are behind initial time plan (see https://wiki.opnfv.org/_media/doctor_141208.pdf slide 11)
    - Reminder: some documents should be available by March
  - Next meeting: Jan 19th

Dec 22, 2014
- Agenda
  - work item updates
  - Fault table
  - GAP analysis template
  - Wiki pages
- Minutes
  - Work item updates
    - Fault table
      - Status: waiting Palani's initial commit
      - Tomi also made initial list of faults.
      - TODO(Tomi): Open new wiki page to share the fault list
    - GAP analysis template
      - Ryota explained GAP analysis template drafted on https://etherpad.opnfv.org/p/doctor_requirement
      - TODO(Ryota): Open new etherpad page for Gap Analysis with the template
    - Wiki pages
      - Our plan for wiki/doc structure seems to be OK, cause there was no question and objection in the past week.
      - TODO(Ryota): Update wiki pages
  - Fault notification at the Northbound I/F
    - Critical faults
      - It was agreed that we should characterize faults as critical ornon-critical when reporting to VNFs.
      - We must report all critical faults northbound. We may report some of the non-critical faults, need further study.
    - Fault aggregation
      - Discussed whether toaggregate different alarms and faults before notifying via northbound interfaceto VNFs.
      - General agreement that there should be some level of aggregation, butneed to figure out what events need to be aggregated.
      - Some suggested that VNFs should be notified only if the faults are urgent.
    - Notifying data center operations folks about hardware faults is something that seems to be out of scope for this project. Tomi: I think they need the information and there should not be a duplicate mechanism to detect faults to be able to make HW maintenance operations. Surely they will not need the notification that we would send to VNFM, but the actual alarm information we are gathering to make those notifications. Anyhow I agree that this is not in our scope and tools like Zabbix that we could use here can easily be configured then for this also in case HW owner is interested.
    - Why should warnings be sent to VNFs (such as cpu temp rising but notcritical yet)? VNFs might want to take action such as setup/sync hot standbyand this could take some time.
  - Are there open souce projects already to detect hypervisor or host OS faults?
    - OpenStackNova devs said it should be kept simple, providers need to monitor processes ontheir own.
    - But there appears to be some open source tools(SNMP polling or SNMP agents on host). Need to pull things together.
  - Next call will be on January 12th.

Dec 15, 2014
- Agenda
- Minutes
  - wiki/doc structure
    - Agreed to have three sections
      - UseCase (High-level description)
      - Requirement (Detail description, GAP Analysis)
      - Implementation(includes monitor tools and alternatives)
  - Faults table
    - will create table that explain stories for each fault
    - columns would be physical fault, how to detect, effected virtual resource and actions to recover
    - in three categories Compute, Network and Storage, will start on Compute first
    - also try to keep separate table/categories for critical and warning
    - TODO(Palani): provide fault table example
    - TODO(Gerald): create first version of fault table after getting table example
  - framework
    - how we handle combination of faults and feature H/W faults that is still open question
    - suggestion to have fault management "framework" that should be configurable to define faults by developers or operators
  - Gap analysis
    - We should have list of items so that we can avoid duplicated work
    - TODO(Ryota): Post first item to show example how we describe that could be template for GAP analysis
  - Monitoring
    - We should check monitoring tools as well: Nagios, Ganglia, Zabbix
  - Check TODOs from the last meeting
    - seems almost all items have done or started (but we could not check 'fault management scenario based on ETSI NFV Architecture' although there is silde on wiki)
  - Next meetings
    - Dec 22, 2014
    - Jan 12, 2015 # skip Jan 5th

Dec 8, 2014
- Agenda
  - How we shape requirements
  - Day of the week and time of weekly meeting
  - Tools: etherpad, ML, IRC?
  - Project schedule, visualiztion of deliverables
- Minutes
  - How we shape requirements
    - Use case study first
    - Gap Analysis should be included existing monitoring tools like Nagios etc.
    - How we format fault message and VNFD elements for alarms?
    - Fault detection should be designed within a common/standard manner
    - Those could be implement in existing monitoring tools separated from OpenStack
    - What is "common" monitoring tools, there are different tools and configurations
    - Focus on H/W faults
    - Do we really need that kind of notification mechanism? Can we use error from API polling, just get error detected by application or auto-healing by VIM?
      - Real vEPC needs to know fault that cannot be found by application like abnormal temperature.
      - VIM should not run auto-healing for some VNF.
      - There are two cases/sequences defined in ESTI NFV MANO that fault notification are send from VIM to VNFM and to Orchestrator.
      - Alarming mechanism is good to reduce the number of request from user who pooling virtual resource status.
    - We shall categorize requirements and create new table on wiki page. (layer?)
    - → A general view of the participants is to have the 'HW monitoring module' outside of OpenStack
    - TODOs
      - Open etherpad page for collaborative working (Ryota)
      - Collect use cases for different fault management scenarios (Ryota)
      - Set IRC (Carlos)
      - Provide Gap Analysis (Dinesh, Everyone)
      - Provide fault management scenario based on ETSI NFV Architecture (Ashiq)
      - List fault items to be detected (Ashiq, Everyone)
  - Day of the week and time of weekly meeting
    - Monday, 6:00-7:00 PT (14:00-15:00 UTC)
    - TODO(Ryota): create weekly meeting entry in GoToMeeting
  - Tools: etherpad, ML, IRC?
    - We will use opnfv-tech-discuss ML with "[doctor]" tag in a subject.
    - We will use "opnfv-doctor" IRC channel on chat.freenode.net .
    - TODO(Carlos): update wiki
  - Project schedule, visualiztion of deliverables
    - All team members are asked to check project proposal page and slides that are approved by TSC and show our schedule and deliverables.
    - Northbound I/F first specification by Dec 2014.

Dec 1, 2014
- Logistics
  - Time: 14:00-15:00 UTC Find your local time
  - Gotomeeting: https://global.gotomeeting.com/join/559378909
    - Dial +1 (646) 749-3122
    - Access Code: 559-378-909
    - Audio PIN: Shown after joining the meeting
    - Meeting ID: 559-378-909
- Agenda
  - Project Proposal Refinement
    - https://wiki.opnfv.org/doctor/project_proposal
    - Project Categories
    - Deliverables and Schedule - SR
- Minutes
  - Project proposal
    - There were two comments at project review in TSC meeting (Nov 26)
      - Large scope: TSC asked to narrow project scope, creating as requirement projects seems reasonable
      - Overlap with HA project, so collaborate in project activity
      - https://wiki.opnfv.org/requirements_projects/high_availability_for_opnfv
    - Ashiq and Qiao had talked before this meeting, and agreed that we would not eliminate duplication at proposal phase
    - Project proposal was fixed by some members
      - https://wiki.opnfv.org/doctor/project_proposal
      - The project categories was changed to requirement only
      - In new revision of project proposal, we removed detailed descriptions which don't suit requirement project
      - Links to original project proposal are replaced to point the new page, and the link to the old page that described further details can be found at the bottom of the new proposal page
      - We should not edit the proposal page after TSC approval to keep evidence what we planed at the beginning of the project
      - "Auto recovery" is missing, will continue discussion in mail with clarification by Tomi

Nov 17, 2014
- Agenda
  1. Scoping and Scheduling (what feature to be realized in what time frame)
  2. Resources available and necessary for this project
  3. Technical aspects and relevance to upstream projects
  4. How to socialize with upstream projects
- Logistics
  - Time: 14:00-15:00 UTC Find your local time
  - GoToMeeting https://global.gotomeeting.com/join/801642893
  - Phone bridge: +1 (408) 650-3112 (Access Code: 801-642-893, Audio PIN: Shown after joining the session)
- Minutes

Extra meetings

Feb 6, 2015
- Implementation Planning
  - Topic and agreement can be found in Slides.

Wiki

User Tools

Site Tools

Table of Contents

Doctor Team Meetings

Info

Past meetings

Weekly project meetings

Extra meetings

Page Tools