Disclaimer: All the postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

 

‘Disk-level validation for LPM of NPIV LPARs’ is a key enhancement to ‘Live Partition Mobility (LPM) validation’ and is being offered in this year’s (2015) PowerVM 2.2.4 release. This article discusses it in detail.


‘Live Partition Mobility’ (LPM) has been one of the most celebrated features of PowerVM for a long time. It provides the ability to move an active partition (LPAR) from one physical IBM Power Systems server to another without any workload/application downtime. There are multiple enhancements in the PowerVM 2.2.4 release to improve the performance, resiliency, flexibility and validation of LPM. This IBM developerWorks announcement page ( https://www.ibm.com/developerworks/community/wikis/home?lang=en_us#!/wiki/Power%20Systems/page/Key%20functionality%20included%20in%20PowerVM%202.2.4 ) lists all the enhancements, including the ones for LPM.

This article focuses on the details of this new feature, which enables better NPIV storage validation.
The IBM Knowledge Center page with more information about it is located here: http://www-01.ibm.com/support/knowledgecenter/8247-21L/p8hc3/p8hc3_npivorlunval.htm

NOTE: Throughout this article I’ll use the term ‘NPIV LPAR’ to refer to a Logical Partition (LPAR) whose storage has been provisioned through a virtual FC adapter.

Prior to this feature, validation for LPM of an NPIV LPAR was limited to checking ‘FC zoning’, i.e. whether the destination server had connectivity to the same storage ports as the source server. With this feature in place, validation can additionally check ‘LUN masking’, i.e. whether the same disks are presented to the migrating partition on the destination server as on the source server. In this article I’ll try to de-jargonize the feature using a few well-explained figures.

Let’s start with a detailed explanation of the NPIV partition setup and how LPM validation of NPIV partitions traditionally worked. Then, I’ll explain the validation enhancement introduced in this release.

 

How is a typical ‘NPIV LPAR’ configured, and what should a Server/Storage Admin care about?

Figure: NPIV LPAR setup in IBM PowerVM

A virtual FC adapter needs to be created on the LPAR for access to storage using NPIV, i.e. vfc-c in the figure above.
The virtual FC adapter has a WWPN, i.e. wwpn_c1, that needs to be “zoned” to the storage target port WWPN, i.e. wwpn_T1. Zoning is done on the FC switch.
Then the Logical Units (aka disks), i.e. LU G and LU H, that need to be assigned to the NPIV LPAR are “masked” to the “Host / Host group” on the storage box. In this case LUs G and H are mapped to the host with WWPN wwpn_c1.
This configuration is sufficient to bring up the NPIV LPAR and start using the disks assigned to it (a minimal sketch of the VIOS-side commands follows below).
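To complement the figure, here is a minimal sketch of the VIOS side of such a setup, as seen from the VIOS (padmin) command line. The adapter names vfchost0 and fcs0 are placeholders; the virtual FC server adapter itself is created from the HMC:

$ lsnports
$ vfcmap -vadapter vfchost0 -fcp fcs0
$ lsmap -vadapter vfchost0 -npiv

lsnports lists the NPIV-capable physical FC ports, vfcmap maps the virtual FC server adapter (paired with the client’s vfc-c) to a physical FC port, and ‘lsmap -npiv’ verifies the mapping and shows the client adapter status.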

 

What are the configuration considerations for enabling LPM of an NPIV LPAR?

Figure: NPIV setup for LPM

Unlike what was shown in the previous figure, there is NOT one WWPN (i.e. wwpn_c1) but a pair of WWPNs (i.e. wwpn_c1 and wwpn_c2) that belong to a virtual FC adapter. Though a virtual FC adapter uses only one of these WWPNs during normal operations, the second one is required for LPM and is used on the destination server. As shown in this figure, it is important to also “zone” (switch) and “mask” (storage) the alternate WWPN, i.e. wwpn_c2, for a successful LPM operation.
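As a quick way to confirm the pair, the virtual FC client adapters and their WWPNs can be listed from the HMC command line. This is only a sketch; managed_system is a placeholder and the exact field names may vary with the HMC level:

$ lshwres -r virtualio --rsubtype fc --level lpar -m managed_system -F lpar_name,slot_num,wwpns

Both WWPNs of each virtual FC client adapter are reported, and it is this pair (wwpn_c1 and wwpn_c2 in the figure) that must be zoned and masked identically.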

 

What is the FC zoning check performed as part of the existing NPIV LPAR “LPM validation”?

Figure: NPIV LPM FC zoning check

The existing LPM validation checks FC zoning, i.e. whether the alternate WWPN (wwpn_c2, used by the migrating LPAR on the destination server) has access to the same storage target ports (i.e. wwpn_T1) as the primary WWPN (wwpn_c1, used by the migrating LPAR on the source server). It was limited to checking FC zoning and did not validate whether the same LUNs are masked to both of the client WWPNs. In this case, since the same target port wwpn_T1 is zoned to both WWPNs of the client, LPM validation succeeds.

 

What is the need for the new “NPIV disk validation” introduced in this release of VIOS 2.2.4?

Figure: NPIV LPM disk-level validation

The FC zoning check can only go so far as to ensure connectivity of the destination server to the same target ports of the storage server. Without a check to ensure that the same disks are masked to the alternate WWPN of the migrating client LPAR, there could be a disaster waiting unnoticed, unless the storage admin has done a very good job!
If there is a misconfiguration in LUN masking, the migrated LPAR could face a multitude of problems:
a) losing access to its rootvg disks
b) applications going down due to disk I/O errors
c) worse still, corrupting the storage disks of other servers.

This feature adds the ability to check the disks masked to the primary and alternate client WWPNs.
In the figure above, NPIV disk validation would identify the difference in LUN masking between the client’s WWPNs (i.e. LU G and LU H are masked to wwpn_c1, but LU D and LU H are masked to wwpn_c2). The administrator would need to rectify the LU masking on the storage for LPM to succeed.


By increasing the strictness of LPM validation, this enhancement goes a long way toward making Live Partition Mobility foolproof!

 

How do I use/enable this feature?

Disk/LUN-level validation for LPM of an NPIV client is disabled by default in VIOS 2.2.4.
Three new attributes have been added to the ‘vioslpm0’ pseudo device of the VIOS to manage disk-level validation (a sketch of how to set them follows the list):

  1. src_lun_val
    This attribute controls the behaviour of LUN-level validation on the source VIOS. The supported values for this attribute are:
    $ lsdev -dev vioslpm0 -range src_lun_val
    on
    off
    It is disabled (i.e. set to off) by default. It needs to be explicitly enabled on the source VIOS (i.e. set to on) so that LPM validation collects the disk-level details of the migrating NPIV LPAR.
  2. dest_lun_val
    This attribute controls the behaviour of LUN-level validation on the destination VIOS. The supported values for this attribute are:
    $ lsdev -dev vioslpm0 -range dest_lun_val
    on
    off
    restart_off
    lpm_off
    It is set to ‘restart_off’ by default, which means that LUN-level validation is disabled for Remote Restart (and Suspend/Resume (SR)) operations but is enabled for LPM operations.
    Set this attribute to ‘lpm_off’ to turn off LUN-level validation for LPM while still performing LUN validation for RR and SR operations.
    It can also be set to ‘on’ to enable LUN-level validation for all operations, or ‘off’ to disable LUN-level validation for all operations.
  3. max_val_cmds
    This attribute controls the number of I/O commands allocated (100 by default) for LUN validation and dictates the number of threads created by the validation process. Faster LUN validation results are obtained by multi-threading the entire operation. The supported values are in the range 1 to 256:
    $ lsdev -dev vioslpm0 -range max_val_cmds
    1…256 (+1)
    It is generally not necessary to change this attribute from its default value of 100, unless you see a big gain in validation time by increasing it.
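As an illustration, the attributes can be changed with the chdev command from the VIOS (padmin) command line. This is a minimal sketch, assuming the pseudo device is named vioslpm0 as in the listing below; the values dest_lun_val=on and max_val_cmds=200 are only examples, not recommendations:

$ chdev -dev vioslpm0 -attr src_lun_val=on
$ chdev -dev vioslpm0 -attr dest_lun_val=on
$ chdev -dev vioslpm0 -attr max_val_cmds=200

Here src_lun_val=on enables collection of disk details on the source VIOS, dest_lun_val=on enables LUN-level validation for all operations on the destination VIOS, and max_val_cmds=200 merely illustrates raising the number of validation commands. The current values can be confirmed with ‘lsdev -dev vioslpm0 -attr’ as shown below.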

Below is a snippet of the attribute listing of the ‘vioslpm0’ pseudo device; the three new attributes (dest_lun_val, max_val_cmds and src_lun_val) appear among the existing ones:

$ lsdev -dev vioslpm0 -attr
attribute       value       description                                                          user_settable

auto_tunnel     1           Automatic creation of security tunnels                               True
cfg_msp_lpm_ops 8           Configured number of concurrent LPM operations for this MSP          True
concurrency_lvl 4           Concurrency level                                                    True
dest_lun_val    restart_off Enable or disable NPIV disk validation for remote restart            True
lpm_msnap_succ  1           Create mini-snap for successful migrations                           True
max_lpm_vasi    1           Maximum number of VASI adapters used for LPM operations              False
max_val_cmds    100         Change the number of commands allocated for NPIV LPM disk validation True
max_vasi_ops    8           Maximum number of concurrent LPM operations per VASI                 False
src_lun_val     off         Enable or disable NPIV disk validation for LPM                       True
tcp_port_high   0           TCP highest ephemeral port                                           True
tcp_port_low    0           TCP lowest ephemeral port                                            True

 

The IBM Knowledge Center page which describes these and other attributes of the vioslpm0 pseudo device used for LPM is located here: http://www-01.ibm.com/support/knowledgecenter/8247-21L/p8hc3/p8hc3_vioslpmpseudo.htm
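For context, LPM validation (including this disk-level check, once enabled) is typically driven from the HMC. Here is a minimal sketch of a validation-only run from the HMC command line, with placeholder managed-system and partition names:

$ migrlpar -o v -m source_system -t destination_system -p npiv_lpar

The ‘-o v’ flag requests validation only; the actual migration is performed with ‘-o m’ once validation succeeds.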

 

What does the validation error message look like if disk validation for an LPM operation fails?
How do I infer the actual problem from the message?

Pasted below is a snippet of sample validation failure output.
You could get this kind of message in the “VIOS_DETAILED_ERROR” section of the validation output if different sets of LUNs were masked to the two WWPNs of the migrating LPAR.

.. .. .. output truncated .. .. ..
List of Logical Units found additional on destination ( i.e. masked on storage
target port = 0x202600a0b86e9998 with client’s alternate wwpn =
0xc0507607757c0028, but NOT masked with client’s source wwpn = 0xc0507607757c0029 ) :
Logical Unit 1 : descriptor type = 3, value = 600A0B80006E9B760000E55D5140F301.
Logical Unit 2 : descriptor type = 3, value = 600A0B80006E99980000E6A95140F2EE.

List of Logical Units found additional on destination ( i.e. masked on storage
target port = 0x202700a0b86e9998 with client’s alternate wwpn =
0xc0507607757c0028, but NOT masked with client’s source wwpn = 0xc0507607757c0029 ) :
Logical Unit 1 : descriptor type = 3, value = 600A0B80006E9B760000E55D5140F301.
Logical Unit 2 : descriptor type = 3, value = 600A0B80006E99980000E6A95140F2EE.

Logical Units masked for the client on the WWPN of source = 0xc0507607757c0029 and
destination = 0xc0507607757c0028 did not match. Please correct the LU masking from
Storage and retry.
.. .. .. output truncated .. .. ..

In the above example snippet, the migrating LPAR’s source WWPN is 0xc0507607757c0029 and the alternate WWPN to be used on the destination server is 0xc0507607757c0028.
There are two target ports on the storage box to which this NPIV LPAR’s WWPNs are zoned: 0x202600a0b86e9998 and 0x202700a0b86e9998.
In this example the alternate WWPN of the client is masked to two additional Logical Units on the storage. The LU validation process found the difference and reported the mismatch. The additional logical units are reported once for each target port to which the migrating LPAR’s WWPNs are zoned.
It is sufficient for the user to note the Logical Unit descriptor type/value reported in the failure and make the required LU masking correction on the storage.

This feature is a “power”ful aid to Power customers and should go a long way toward making the mobility of NPIV LPARs foolproof.

 

3 Responses to Disk level validation for LPM of NPIV LPAR

  1. Laurent Oliva says:

    LUN validation is a great improvement. I recently asked Big Blue to give customers the opportunity to bypass “port validation” in order to focus only on LUN validation.

    This is because some IT automation tools may not map the same FE ports for the WWN and the alternative WWN…

    I hope it will be shipped quickly 🙂

  2. heik_al says:

    great addition to “lpm validation process” thanks for sharing this info

What do you think?
