Friday, 20 May 2011

Juniper EX Switch firmware upgrade process

Juniper EX switches come with two separate flash partitions: a root partition (the default boot partition) and a copy of that root in another piece of flash memory. Each partition contains a copy of the boot software and the configuration. We've deployed a number of EX clusters, and from time to time we notice that the secondary (non-active) partition doesn't get upgraded along with the active one. This obviously causes problems if the active partition is corrupted and won't boot. Manually booting the secondary to get the switch up doesn't help if it's in a cluster, because that node is then marked as NotPrsnt (Not Present), since it won't be running the same software as the other nodes.

So, luckily, Juniper have come to the rescue: the latest 10.4 firmware does the following (source: www.juniper.net):


Resilient dual-root partitioning, introduced on Juniper Networks EX Series Ethernet Switches in Junos operating system (Junos OS) Release 10.4R3, provides additional resiliency to switches in the following ways:
  • Allows the switch to boot transparently from the second root partition if the system fails to boot from the primary root partition.
  • Provides separation of the root Junos OS file system from the /var file system. If corruption occurs in the /var file system (a higher probability than in the root file system due to the greater frequency in /var of reads and writes), the root file system is insulated from the corruption.
Great news, this. So let's upgrade our firmware and get this great feature. Here is a copy of today's firmware version:


We need to get hold of the latest firmware from Juniper's website, so we download that. But there was also an issue with our Jloader: it was too old. We will need to delete the old loader and upgrade that as well as the firmware (Jloader Upgrade Link). The good news is we can install both and then reboot once, which cuts down on reboot time. We've downloaded the jloader and Junos images (10.4R3 was the recommended release at the time of writing).

First let's just check we can ping the FTP server holding the firmware images. We're using FileZilla server for this and we've created a new user called 'junos'. You can get hold of FileZilla server here


Looks good. Right, let's go with the upgrade.

Jloader first. The FTP load command uses the syntax 'request system software add ftp://10.10.15.23/jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz'. This is basically saying we're using FTP to get the image and that the image is located on the server with IP address 10.10.15.23. Without a username and password in the form ftp://username:password@, the Junos parser will use the default username of 'anonymous' with no password. Some FTP servers come with built-in anonymous support; FileZilla needs you to create a user called 'anonymous' with the password checkbox unchecked. Here is the process:
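As an aside to the transfer itself, the two URL forms described above (anonymous, and ftp://username:password@) can be sketched in a few lines of Python. This is only an illustration of how the string is put together: the function name build_ftp_url and the example password are made up for this sketch, and how Junos treats special characters in credentials is not covered by this post, so the helper simply refuses them rather than guessing.

```python
def build_ftp_url(host, filename, username=None, password=None):
    """Build the ftp:// URL that follows 'request system software add'.

    Hypothetical helper for illustration only. With no username the URL
    carries no credentials, which the post says falls back to 'anonymous'.
    """
    # Refuse characters that would change how the URL is split into parts.
    for label, value in (("username", username), ("password", password)):
        if value and any(ch in value for ch in "@:/"):
            raise ValueError(f"{label} contains a character that would break the URL")

    if username is None:
        return f"ftp://{host}/{filename}"

    credentials = username if not password else f"{username}:{password}"
    return f"ftp://{credentials}@{host}/{filename}"


if __name__ == "__main__":
    image = "jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz"
    # Anonymous form, as used in this post.
    print("request system software add " + build_ftp_url("10.10.15.23", image))
    # Credential form; 'example-password' is a placeholder, not a real password.
    print("request system software add "
          + build_ftp_url("10.10.15.23", image, "junos", "example-password"))
```

The anonymous form reproduces the command quoted above; the second form only shows where the credentials would sit.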


Each node in the cluster will be upgraded in turn until it finishes and returns you to the prompt:


The FileZilla management console shows you the whole process as the file is pulled back to the cluster master node. Here is a brief screenshot of the download:


Right, that's the jloader upgrade applied (but not yet active until we reboot). To save time we're now going to upgrade the firmware so that we only do one reboot. Here is the process, and remember, this is a 4-node cluster, as shown by the fpc0,1,2,3. If you have a different number of nodes then your output will be different.


So, just like the man said 'A reboot is required to install the software'. Let us oblige...
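The order of operations up to this point (loader first, then the Junos image, then a single reboot) can be written down as a short command list. The sketch below just generates those three CLI lines in Python so the sequence is explicit. The function name upgrade_commands and the Junos image file name are placeholders invented for this example; the post does not give the real image name, and any extra options your cluster might need are not shown.

```python
def upgrade_commands(server, loader_image, junos_image):
    """Return the CLI lines for a loader + Junos upgrade with one reboot.

    Hypothetical helper for illustration; it only formats strings and
    assumes anonymous FTP as described in this post.
    """
    base = f"ftp://{server}/"
    return [
        # Step 1: stage the new loader (takes effect after the reboot).
        f"request system software add {base}{loader_image}",
        # Step 2: stage the new Junos image on top.
        f"request system software add {base}{junos_image}",
        # Step 3: one reboot activates both.
        "request system reboot",
    ]


if __name__ == "__main__":
    for line in upgrade_commands(
        "10.10.15.23",
        "jloader-ex-3242-11.3I20110326_0802_hmerge-signed.tgz",
        "JUNOS-IMAGE-PLACEHOLDER.tgz",  # placeholder, not a real file name
    ):
        print(line)
```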


It took about 5 minutes to come around again. I logged in and checked the firmware versions and loader.


That's OK. Now I checked the state of the partitions:


Looks like I have two partitions there, active/backup... all looking pretty sweet. Let's get some more information on the state of those partitions... detail? Nah, that's what they would expect you to do. Let's look at the snapshot...


We're upgraded, we've got two healthy partitions... just need to wait for a failure now to see it automatically fix itself, but I won't wish for that. I think one question we're all asking is: how do I find out if there has been a partition failure if it fixes itself?

Well, you've got console logs, syslog and SNMP... take your pick. From the management port you will see:



    WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE

You can of course always look at the chassis alarms.

    user@switch> show chassis alarms
    1 alarms currently active
    Alarm time               Class  Description
    2011-02-17 05:48:49 PST  Minor  Host 0 Boot from backup root
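If you collect this output from your switches with a script, spotting the alarm is a simple text match. Here is a minimal Python sketch, assuming the alarm description contains the text "Boot from backup root" as in the output above; the function name booted_from_backup is made up for this example, and how you fetch the output (SSH, console capture, and so on) is left out.

```python
def booted_from_backup(alarm_output: str) -> bool:
    """Return True if 'show chassis alarms' output mentions the backup-root alarm.

    Hypothetical helper: it only does a case-insensitive substring match
    on each line of text that you have already captured.
    """
    return any(
        "boot from backup root" in line.lower()
        for line in alarm_output.splitlines()
    )


# Sample text copied from the alarm output shown in this post.
SAMPLE = """\
1 alarms currently active
Alarm time               Class  Description
2011-02-17 05:48:49 PST  Minor  Host 0 Boot from backup root
"""

if __name__ == "__main__":
    if booted_from_backup(SAMPLE):
        print("Switch is running from the backup root partition")
```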




Thank you for reading and may all of your upgrades be as sweet.
