As you will probably have noted, we are gradually re-establishing the service. We have had some good luck this afternoon. There was carnage in the fibre channel switch infrastructure for the various databases, BUT just (exactly) enough was left to get the LFC/FTS and CASTOR databases up. We literally lost 1 out of every 2 F/C switches and 3 out of every 4 power supplies to the switches, so we are running on single F/C switches with single power supplies. If we blow anything (and there is clearly a chance of that in the next 24 hours) we will be scraping around in the scrap heap. Nevertheless the databases are running and seem undamaged.
So - we now have the LFC/FTS services up, though we are still hunting problems with transfer failures. We have pieced together the LPD room switch stack and are now delivering network to the CASTOR head nodes. The Fabric team are just "banging on" the breakers on the APC units to the CASTOR head nodes, but we will leave these down until the morning. Restart of the disk servers is expected shortly. The CASTOR team will check out CASTOR first thing in the morning. Of course we have to keep our fingers crossed that we don't get any failures overnight on the kit recently powered up - it would not be a surprise if we did.
There are now very few unknowns between us and restarting service.
Regards
Andrew