Ticket #266 (assigned defect)

Opened 16 months ago

Last modified 3 months ago

AMF: Wrong behavior of N-way active redundancy model

Reported by: marioa
Owned by: marioa
Priority: critical
Milestone: 3.0.0-GA
Component: AvSv
Version: 3.0.0-FC
Keywords:
Cc:
patch waiting for maintainer: no

Description

See background information:
http://list.opensaf.org/pipermail/users/2008-September/001492.html

According to the AMF spec, one characteristic of the N-way active redundancy model is: "At any given time, the Availability Management Framework should make sure that the redundancy level (the preferred number of active assignments) for each SI is guaranteed, if possible, while the maximum number of service units is not exceeded." (AMF Chapter 3.7.5.1 bullet 6).

The goal is unquestionable. The remaining question is whether it is possible to assign HA-state=ACTIVE to the service units on nodes nine and ten (see the background information in the mail thread).

According to the AMF spec (chapter 3.3.2.3), the "readiness state" of a component indicates whether the component is ready to take service instance assignments. When a component's readiness state is In-service, it is eligible for CSI assignments. The component's readiness state is In-service if its operational state is enabled and the readiness state of the SU containing it is In-service.

According to the AMF spec (chapter 3.3.1.4), the readiness state of an SU is In-service if all of the following hold:

  • its operational state and the operational state of its containing node are enabled
  • its administrative state and the administrative states of its containing service group, node and cluster are unlocked
  • its presence state is either instantiated or restarting
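
The two readiness rules quoted above can be written as simple predicates. The following is only a minimal sketch of the rules as paraphrased in this ticket, not AvSv code: the ServiceUnit class, its field names and the string values for the presence state are all hypothetical, and other readiness states are not modelled.

```python
from dataclasses import dataclass


@dataclass
class ServiceUnit:
    # All field names are hypothetical; they mirror the conditions quoted above.
    oper_enabled: bool            # operational state of the SU
    node_oper_enabled: bool       # operational state of the containing node
    admin_unlocked: bool          # administrative state of the SU
    sg_admin_unlocked: bool       # administrative state of the service group
    node_admin_unlocked: bool     # administrative state of the node
    cluster_admin_unlocked: bool  # administrative state of the cluster
    presence: str                 # e.g. "instantiated", "restarting"


def su_in_service(su: ServiceUnit) -> bool:
    """SU readiness rule as paraphrased from AMF chapter 3.3.1.4."""
    return (
        su.oper_enabled
        and su.node_oper_enabled
        and su.admin_unlocked
        and su.sg_admin_unlocked
        and su.node_admin_unlocked
        and su.cluster_admin_unlocked
        and su.presence in ("instantiated", "restarting")
    )


def component_in_service(comp_oper_enabled: bool, su: ServiceUnit) -> bool:
    """Component readiness rule as paraphrased from AMF chapter 3.3.2.3."""
    return comp_oper_enabled and su_in_service(su)
```

Note that nothing in these predicates refers to the state of the service group beyond its administrative state.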

The log records that we have show that the SUs on both node nine and node ten have readiness state In-service, which means there should be nothing to prevent AMF from assigning CSIs with HA-state=Active to the components of these SUs.
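
The expected behavior can be illustrated with a small sketch of how a shortfall in active assignments for one SI might be filled from the In-service SUs. This is an assumption about the intended outcome, not a description of the AvSv algorithm: the function name, its parameters and the SU names in the usage note are hypothetical, and the sketch ignores the limit on the number of service units.

```python
def pick_active_candidates(preferred_active, assigned_sus, in_service_sus):
    """Return the In-service SUs that could take an additional active
    assignment of one SI so that its preferred number is reached.

    preferred_active: preferred number of active assignments for the SI
    assigned_sus:     SUs that already hold an active assignment of the SI
    in_service_sus:   SUs whose readiness state is In-service
    """
    shortfall = max(0, preferred_active - len(assigned_sus))
    free = [su for su in in_service_sus if su not in assigned_sus]
    return free[:shortfall]
```

With hypothetical names, pick_active_candidates(3, ["SU1"], ["SU1", "SU9", "SU10"]) selects "SU9" and "SU10", which corresponds to the assignment this ticket expects for the SUs on nodes nine and ten.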

The problem has been explained (in the mail thread) as a consequence of the SG not being "in a stable state". We cannot find anything in the AMF specification stating that the SG has to be in a stable state before SUs can be assigned.

One possible view of the problem is that AMF detects that a csiSet(QUIESCED) operation has timed out on node 5 and tries to recover. During the recovery, CLEANUP is done successfully and then an attempt is made to INSTANTIATE the component. The INSTANTIATION fails, however, leaving the component in the INSTANTIATION-FAILED state. (The mistake is perhaps that the successful CLEANUP has not been reported internally to the SG, which would have allowed the SI-state of the SU to be removed.)
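
The suspected sequence can be sketched as follows. This is a model of the hypothesis stated above, not of the actual AvSv implementation: the function, the dictionary keys and the notify_sg_on_cleanup flag are hypothetical, and the handling of a failed CLEANUP is an assumption added only to make the sketch complete.

```python
def recover_after_quiesced_timeout(su, cleanup_ok, instantiate_ok,
                                   notify_sg_on_cleanup):
    """Model recovery after csiSet(QUIESCED) has timed out.

    su is a dict with the hypothetical keys "presence" and
    "si_assignments". It is updated in place and returned.
    """
    if not cleanup_ok:
        # Assumption: a failed CLEANUP ends the recovery attempt here.
        su["presence"] = "termination-failed"
        return su

    # CLEANUP succeeded: the component is no longer running.
    su["presence"] = "uninstantiated"
    if notify_sg_on_cleanup:
        # Hypothesised missing step: tell the SG that CLEANUP succeeded
        # so that the SI-state of this SU can be removed.
        su["si_assignments"] = []

    # Attempt to INSTANTIATE the component again.
    su["presence"] = "instantiated" if instantiate_ok else "instantiation-failed"
    return su
```

With notify_sg_on_cleanup set to False and instantiate_ok set to False, the SU ends up in the instantiation-failed state while still holding its old SI assignment, which is the situation suspected in this ticket.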

Change History

Changed 9 months ago by marioa

  • milestone changed from Release3 to 3.0.0-RC1

Milestone Release3 deleted

Changed 9 months ago by nagendra

  • owner changed from nagendra.kumar@… to nagendra
  • status changed from new to accepted

Changed 8 months ago by marioa

  • priority changed from major to critical
  • version changed from 2.0.0 to 3.0.0-FC

Changed 3 months ago by marioa

  • owner changed from nagendra to marioa
  • status changed from accepted to assigned
