Mass LUN Reset

Introduction:

The organization’s data center suffered a power loss that impacted multiple components and systems. After power is restored, you observe issues in your SDDC platform such as, but not limited to, the following:

  1. ESXi hosts keep dropping out of vCenter Server
  2. Multiple VMFS datastores are inaccessible or have performance issues
  3. Host reboots have been attempted without resolving the problem
  4. The scale of the environment makes it difficult to isolate the hosts that need a reboot

Solution:

A quick and dirty way to recover from this situation is to perform a mass LUN reset. Since the environment is already broken, this cannot make the situation any worse.

Run the script below on all powered-on ESXi hosts in the environment. It is preferable to run the script simultaneously on the hosts in the same cluster.

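# Collect the names of devices that have a partition 1 (for example naa.xxx:1),
# skipping the vml aliases, and strip the ":1" partition suffix.
naaIds=$(ls /vmfs/devices/disks/ | grep ':1$' | grep -v vml | awk -F: '{print $1}')
for i in $naaIds
do
    echo "Processing NaaID: $i"
    # Issue a LUN reset against the device.
    vmkfstools -L lunreset "/vmfs/devices/disks/$i"
    echo "Done with NaaID: $i"
done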
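
Running the script on every host in a cluster at about the same time can be sketched as a small SSH fan-out from an admin workstation. This is only a sketch under assumptions that are not part of the original post: SSH is enabled on each ESXi host, key-based root login works, and the script above has been saved to a local file. The function name run_on_hosts and the file names hosts.txt and lunreset.sh are hypothetical.

```shell
# Sketch: run a saved copy of the reset script on several hosts in parallel.
# Assumptions: SSH is enabled on each ESXi host and key-based root login works.
run_on_hosts() {
    hosts_file="$1"   # text file with one ESXi host name or IP per line
    script="$2"       # local copy of the reset script
    log_dir="$3"      # existing directory for per-host output
    while read -r host; do
        [ -z "$host" ] && continue
        # Feed the script to the remote shell over stdin; '&' starts the
        # hosts in parallel instead of one after another.
        ssh "root@$host" 'sh -s' < "$script" > "$log_dir/$host.log" 2>&1 &
    done < "$hosts_file"
    wait   # block until every background ssh has finished
}
```

A call such as run_on_hosts hosts.txt lunreset.sh ./logs would leave one log file per host, which makes it easier to spot hosts where the reset did not complete. Keeping one hosts file per cluster matches the advice to reset the hosts of a cluster together.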

In most cases this will make the ESXi hosts stable. You will still need to reboot the VMs impacted by the situation. Windows VMs that are not hung might recover on their own. However, most *nix VMs will need a reboot, as their file systems will be in a read-only state.
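
To decide which Linux VMs need a reboot, one option is to check inside each guest for disk file systems that are mounted read-only. The sketch below is an illustration, not part of the original procedure: the function name find_readonly_mounts is hypothetical, and the list of file system types it checks (ext2/3/4, xfs, btrfs) is an assumption that may need adjusting for your guests.

```shell
# Sketch: print mount points of common disk file systems that are mounted read-only.
# Reads /proc/mounts by default; a different file can be passed for testing.
find_readonly_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
    awk '$3 ~ /^(ext[234]|xfs|btrfs)$/ {
        n = split($4, opts, ",")
        for (k = 1; k <= n; k++) {
            # Match the exact option "ro", not strings like errors=remount-ro.
            if (opts[k] == "ro") { print $2; break }
        }
    }' "$mounts_file"
}
```

Running find_readonly_mounts with no argument inside a guest prints nothing when the checked file systems are writable. Any mount point it prints is a candidate for a file system check and a reboot, unless that file system is intentionally mounted read-only.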