Upgrading SAN firmware

The main prerequisite for this work is that every SAN-connected host has been tested to confirm that losing a Fibre Channel port will not cause the host to fail, i.e. its traffic fails over to the other fabric.

If this condition is met, it is safe to portdisable all server-connected ports on the switch being upgraded before disabling the switch itself. The link below shows the status of every SAN-connected host and whether it has been tested individually.

Note that fabric redundancy is lost during the update. Only one fabric is active while the other is being upgraded, so a failure of the working fabric will result in data loss. Even if every host has been proven to switch paths to the second fabric, it is best to do this work at a quiet time.

SAN connected servers
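The prerequisite above amounts to a simple pre-flight check: before a fabric is taken down, every host must still have a tested path through the other fabric. Below is a minimal Python sketch. It assumes a hypothetical inventory; the Host record and the host names are placeholders and do not come from the real server list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical inventory record for one SAN-connected host."""
    name: str
    fabrics: frozenset      # fabrics with at least one active path, e.g. {1, 2}
    failover_tested: bool   # has path failover been proven for this host?


def hosts_at_risk(hosts, fabric_down):
    """Names of hosts that would lose storage, or whose failover is unproven,
    if `fabric_down` is taken out of service."""
    at_risk = []
    for host in hosts:
        surviving = host.fabrics - {fabric_down}
        if not surviving or not host.failover_tested:
            at_risk.append(host.name)
    return sorted(at_risk)


# Placeholder inventory for illustration only.
inventory = [
    Host("hostA", frozenset({1, 2}), True),
    Host("hostB", frozenset({1}), True),      # single-attached to fabric 1
    Host("hostC", frozenset({1, 2}), False),  # dual-attached, not yet tested
]

print(hosts_at_risk(inventory, fabric_down=1))  # ['hostB', 'hostC']
print(hosts_at_risk(inventory, fabric_down=2))  # ['hostC']
```

The inventory could be filled in from the SAN connected servers page. Anything the check returns should be resolved before any ports are disabled.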

Current firmware levels

Switch (IP address)         Model          Primary (current)   Secondary   Target
fcsw1-01 (172.27.206.215)   SilkWorm4100   v5.1.0b             v5.1.0b     v6.4.2b
fcsw2-01 (172.27.206.216)   SilkWorm4100   v5.1.0b             v5.1.0b     v6.4.2b
fcsw1-02 (172.27.208.153)   SilkWorm4024   v5.0.5              v5.0.5      v6.4.2b
fcsw2-02 (172.27.208.152)   SilkWorm4024   v5.0.5              v5.0.5      v6.4.2b
fcsw1-03 (172.27.208.172)   SilkWorm4024   v6.2.2c             v6.2.2c     v6.4.2b
fcsw2-03 (172.27.208.173)   SilkWorm4024   v6.2.2c             v6.2.2c     v6.4.2b
fcsw1-04 (172.27.208.174)   Brocade5100    v6.4.0b             v6.4.0b     v6.4.2b
fcsw2-04 (172.27.208.175)   Brocade5100    v6.4.0b             v6.4.0b     v6.4.2b

Brocade firmware location

Brocade firmware can be found at satellite02:/firmware/BrocadeFirmware. The archive is not web- or FTP-enabled, so you will need to use sftp. It is read-only for all users.

Firmware release path

Release v6.4.0b supports the 4100 and 4024 hardware, and the 5100 switches are already at v6.4.0b. v6.4.2b is the latest release in the stream we are on (6.4.x).

6.6.x is the latest release supported on the 4100, 4024 and 5100 switches, so upgrading all switches to v6.4.0b and then to v6.4.2b is supported. HP's release notes for 6.4.2b state: "If applicable, HP recommends that you upgrade to Fabric OS 6.4.2b as soon as possible to take advantage of the latest fixes and features."

Recommended upgrade path is 5.0.1d → 5.2.3 → 5.3.2c → 6.0.1a → 6.1.2b → 6.2.2e → 6.3.2e → 6.4.2b
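To see how many steps each switch still needs, compare the path against the current firmware in the table above. Below is a rough Python sketch. It assumes that FOS version strings sort as (major, minor, maintenance, patch letter). It simply lists every step on the path that is newer than the switch's current version. It does not decide whether a step within the same release family (e.g. 6.2.2c to 6.2.2e) can be skipped; that is a question for the release notes.

```python
import re

# Recommended upgrade path, copied from the text above.
UPGRADE_PATH = ["5.0.1d", "5.2.3", "5.3.2c", "6.0.1a",
                "6.1.2b", "6.2.2e", "6.3.2e", "6.4.2b"]

# Current primary firmware per switch, from the firmware table.
CURRENT = {
    "fcsw1-01": "v5.1.0b", "fcsw2-01": "v5.1.0b",
    "fcsw1-02": "v5.0.5",  "fcsw2-02": "v5.0.5",
    "fcsw1-03": "v6.2.2c", "fcsw2-03": "v6.2.2c",
    "fcsw1-04": "v6.4.0b", "fcsw2-04": "v6.4.0b",
}

_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)([a-z]?)$")


def parse_version(text):
    """Turn 'v6.4.2b' into (6, 4, 2, 'b') so versions compare in order.
    A missing patch letter ('') sorts before 'a'."""
    m = _VERSION_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised firmware version: {text!r}")
    major, minor, maint, patch = m.groups()
    return int(major), int(minor), int(maint), patch


def remaining_hops(current):
    """Every step on UPGRADE_PATH that is newer than `current`."""
    cur = parse_version(current)
    return [step for step in UPGRADE_PATH if parse_version(step) > cur]


for switch, version in CURRENT.items():
    print(f"{switch} {version}: {' -> '.join(remaining_hops(version))}")
```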

Switch update sequence

     fcsw1-03         fcsw1-02 <-->  fcsw1-04
            |         |
DC04        |         |
----------------------------------     
DC02        |         |
            |         |
              fcsw1-01  

Taking sw02 offline will break database connectivity for sw1 in DC02 (app1 needs storage on array02, which is attached to sw4), for sw3 (app2 traffic passes through sw03, sw01 and sw02 to reach array02) and for sw2 itself (app2 needs storage on array02 off sw4). Apart from sw3, upgrading any switch will disrupt the whole fabric.
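This reasoning is a reachability question on the current chain (sw3 - sw1 - sw2 - sw4, with array02 attached only to sw4). Below is a minimal Python sketch that takes one switch offline at a time and lists which switches' hosts can no longer reach array02 on that fabric. It assumes the links are as drawn in the diagram above. It models storage reachability only, not fabric rebuilds or principal switch elections.

```python
from collections import deque

# Current fabric topology from the diagram: sw3 - sw1 - sw2 - sw4 (a chain).
LINKS = [("sw3", "sw1"), ("sw1", "sw2"), ("sw2", "sw4")]
STORAGE_SWITCH = "sw4"  # array02 is attached only to sw4


def reachable(start, links, down):
    """Switches reachable from `start` by BFS, ignoring the offline switch `down`."""
    adj = {}
    for a, b in links:
        if down in (a, b):
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cut_off_from_storage(down):
    """Switches whose hosts lose array02 on this fabric while `down` is offline.
    Hosts on the offline switch itself are always cut off."""
    switches = {s for link in LINKS for s in link}
    if down == STORAGE_SWITCH:
        return sorted(switches)
    alive = reachable(STORAGE_SWITCH, LINKS, down)
    return sorted(s for s in switches if s not in alive)


for sw in ["sw1", "sw2", "sw3", "sw4"]:
    print(sw, "offline ->", cut_off_from_storage(sw))
```

With sw2 offline, the sketch reports sw1, sw2 and sw3 as cut off, which matches the text. sw3 is the only switch whose loss affects just its own hosts.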

sw02 has the oldest firmware and should probably be done first, but it also carries the greatest risk of disruption. To do this safely, we should disable all server ports on the connected switches in this fabric. That way we control the order in which paths are lost and can check that each host's storage connectivity is maintained through the alternate fabric.

sw01 has the next-oldest firmware and is also the fabric principal switch. Upgrading it may disrupt the backup window, because all the tape drives are connected to fabric 1 on this switch.

sw03 is already on a 6.x release; upgrading it only takes two app02 database servers offline.

sw04 has the most current firmware, but taking it offline affects ALL SAN-connected servers, because all storage is currently on array02, which is connected only to sw04. The MSA and dbr01 are also connected to sw04. The MSA is not an active/active storage device, so failing one fabric may interrupt traffic. It is probably best to stop Oracle on dbr01 as a precaution.

Proposed sequence: sw02 → 6.x, sw01 → 6.x, sw03 → 6.4.2b, sw04 → 6.4.2b, sw02 → 6.4.2b, sw01 → 6.4.2b.

This sequence applies to each fabric. I propose working on alternate fabrics on successive days, so that any issues have time to surface before moving on.
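One reading of "an alternate fabric each day" is one switch upgrade per day, with each step run on fabric 1 and then on fabric 2 before the next step begins. Below is a small Python sketch of that schedule. It assumes that fcsw1-* is fabric 1 and fcsw2-* is fabric 2, as the switch names suggest. It also assumes "6.x" stands for whichever interim 6.x release is chosen. Both the one-upgrade-per-day pacing and the name mapping are assumptions, not part of the plan above.

```python
# Proposed per-fabric sequence from the text; "6.x" is an interim 6.x release.
SEQUENCE = [("sw02", "6.x"), ("sw01", "6.x"), ("sw03", "6.4.2b"),
            ("sw04", "6.4.2b"), ("sw02", "6.4.2b"), ("sw01", "6.4.2b")]
FABRICS = (1, 2)


def schedule(sequence, fabrics):
    """One upgrade per day (assumed), alternating fabrics: each step runs on
    fabric 1, then on fabric 2, before the next step starts."""
    plan = []
    day = 1
    for switch, target in sequence:
        for fabric in fabrics:
            # Assumed naming: sw02 on fabric 1 is fcsw1-02, on fabric 2 fcsw2-02.
            host = f"fcsw{fabric}-{switch[-2:]}"
            plan.append((day, fabric, host, target))
            day += 1
    return plan


for day, fabric, host, target in schedule(SEQUENCE, FABRICS):
    print(f"day {day:2d}: fabric {fabric} {host} -> {target}")
```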

Firmware update process

The current firmware on switch1 (v5.1.0b) requires FTP to fetch a new firmware file. As the switch is on the .xxx network, we can use SAN Loader on srv-sma01 to provide an FTP server.

Switch2 (v5.0.5) has the earliest release, and the first part of the firmwaredownload dialogue gives no useful information beyond warning that the switch will be reset. It will probably require FTP, so a temporary FTP server needs to be set up. satellite02 has the required free disk space, so it is the candidate for this.

The firmware on switch3 (v6.2.2c) and switch4 (v6.4.0b) supports downloading via scp, so no FTP server needs to be set up for them.
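The transfer choice above can be written down as a simple rule keyed on the running firmware. Below is a Python sketch. It assumes the rule "5.x needs FTP, 6.x can use scp", which is drawn from these notes rather than from Brocade documentation. The FTP server names are the ones proposed above.

```python
# Rule of thumb from these notes (an assumption, not a Brocade specification):
# 5.x firmware needs an FTP server, 6.x firmware can pull the image over scp.
FTP_SERVERS = {
    "switch1": "srv-sma01 (SAN Loader)",
    "switch2": "satellite02 (temporary FTP server)",
}

SWITCHES = {
    "switch1": "v5.1.0b",
    "switch2": "v5.0.5",
    "switch3": "v6.2.2c",
    "switch4": "v6.4.0b",
}


def transfer_method(version):
    """'ftp' for 5.x firmware, 'scp' for 6.x or later."""
    major = int(version.lstrip("v").split(".")[0])
    return "ftp" if major < 6 else "scp"


def download_plan(switches):
    """Map each switch to (method, FTP server or None)."""
    plan = {}
    for name, version in switches.items():
        method = transfer_method(version)
        server = FTP_SERVERS.get(name) if method == "ftp" else None
        plan[name] = (method, server)
    return plan


for name, (method, server) in download_plan(SWITCHES).items():
    print(name, method, server or "")
```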

Example session

fcsw1-01:admin> firmwaredownload
Server Name or IP Address: 1.1.1.1
FTP User Name:
File Name:
FTP Password:
You can run firmwaredownloadstatus to get the status
of this command.

This command will cause the switch to reset and will
require that existing telnet, secure telnet or SSH
sessions be restarted.

Do you want to continue [Y]: n
fcsw1-01:admin>

Proposed topology change

The SAN layout is currently very linear: the switches are connected in a chain that loops in and out of DC02 (see the diagram above). A failure of any single switch or fibre will therefore break the data flow and may segment the fabric.

One constraint is that there are only two fibre interconnects between DC02 and DC04. The partially meshed design below is resilient to the failure of any one link and limits traffic to at most one hop, except between sw3 and sw1 (in DC02). It does this while still using only the two existing links between DC02 and DC04. Failure of sw01 or sw04 will still cause disruption, as these switches have the storage attached to them. One additional SFP is required for sw2 on each fabric to link sw2 to sw4; a spare SFP slot is available on sw2.

 sw1----sw4    sw1 is in DC02, 
  |    / |     sw2,3 & 4 are in DC04
  |   /  |
  |  /   |
  | /    |
  |/     | 
 sw2----sw3
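The claims about the two layouts (single-link resilience, hop counts and the two DC02-DC04 links) can be checked mechanically. Below is a Python sketch, with the links transcribed by hand from the two diagrams. The transcription is an assumption worth re-checking against the actual patching.

```python
from collections import deque
from itertools import combinations

# Links transcribed from the two diagrams (one fabric shown).
CURRENT = [("sw3", "sw1"), ("sw1", "sw2"), ("sw2", "sw4")]
PROPOSED = [("sw1", "sw4"), ("sw1", "sw2"), ("sw2", "sw4"),
            ("sw4", "sw3"), ("sw2", "sw3")]
DC02 = {"sw1"}  # sw2, sw3 and sw4 are in DC04


def switches(links):
    return {s for link in links for s in link}


def hops(links, src):
    """BFS from src: number of inter-switch links to each reachable switch."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def survives_any_single_link_failure(links):
    """True if the fabric stays in one piece whichever single link fails."""
    nodes = switches(links)
    start = min(nodes)
    for broken in links:
        remaining = [link for link in links if link != broken]
        if set(hops(remaining, start)) != nodes:
            return False
    return True


def pairs_over_one_hop(links):
    """Switch pairs that are more than one hop apart (or not connected)."""
    nodes = sorted(switches(links))
    far = []
    for a, b in combinations(nodes, 2):
        dist = hops(links, a)
        if b not in dist or dist[b] > 1:
            far.append((a, b))
    return far


def inter_dc_links(links):
    """Links with exactly one end in DC02, i.e. DC02-DC04 interconnects."""
    return [link for link in links if (link[0] in DC02) != (link[1] in DC02)]


for name, links in [("current", CURRENT), ("proposed", PROPOSED)]:
    print(name,
          "| survives one link failure:", survives_any_single_link_failure(links),
          "| pairs >1 hop:", pairs_over_one_hop(links),
          "| DC02-DC04 links:", len(inter_dc_links(links)))
```

For the proposed mesh, the sketch finds no single link whose failure splits the fabric and only the sw1-sw3 pair at two hops, using two DC02-DC04 links. The current chain fails the resilience check.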

rb/sanconfig-upgrade.txt · Last modified: 19/12/2018 16:25 by andrew