Dear all:
Some of you may have noticed that the BDII on the Glasgow ARC-CEs
sometimes disappears. This is caused by random crashes of the ARC
infoprovider. After discussing it with the NorduGrid ARC team, we
understand that ARC does not handle some of the messages returned by
Condor correctly, which is why the infoprovider crashes seemingly at
random.
To fix this bug, one line in /usr/share/arc/Condor.pm needs to be
modified. In ARC version 5.0.0 it is line 550, and in ARC version
4.2.0-1 it is line 545. The line

    $lrms_jobs{$id}{nodes} = "";

needs to be changed to:

    $lrms_jobs{$id}{nodes} = [];
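
Since the line number differs between ARC versions, it may be easier to search for the assignment itself than to count lines. Below is a minimal Python sketch of that idea. The helper names find_unpatched and apply_fix are only illustrative and are not part of ARC. They work on the file contents as a string, so reading Condor.pm and writing it back (after making a backup) is left to you.

```python
import re

# Matches the unpatched assignment quoted above, allowing for
# differences in indentation and in the spacing around "=".
UNPATCHED = re.compile(
    r'^([ \t]*\$lrms_jobs\{\$id\}\{nodes\}[ \t]*=[ \t]*)"";',
    re.MULTILINE,
)


def find_unpatched(text):
    """Return the 1-based numbers of lines that still assign "" to nodes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if UNPATCHED.match(line)
    ]


def apply_fix(text):
    """Return the text with the empty string replaced by [] on those lines."""
    return UNPATCHED.sub(r"\1[];", text)


# Small in-memory demonstration; a real run would read Condor.pm instead.
sample = 'my $id;\n    $lrms_jobs{$id}{nodes} = "";\n'
print(find_unpatched(sample))
print(apply_fix(sample), end="")
```

If find_unpatched returns an empty list for your copy of Condor.pm, the file either already has the fix or is written differently in your ARC version, in which case please check it by hand.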
If you see "Can't use an undefined value as an ARRAY reference at
/usr/share/arc/ARC0mod.pm line 135." in infoprovider.log, you are
affected.

Our site is heavily affected by this bug: the infoprovider on our
ARC-CEs was crashing many times a day. We applied this change yesterday
morning, and during the past 24 hours, with the site fully loaded, the
infoprovider has not crashed a single time on any of the 4 ARC-CEs,
which makes me confident that the change fixes the bug. However, since
the crash happens randomly and the situation may differ between sites,
I leave it to you to decide whether to apply this fix.
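
If you would like to check how often a CE is hit before deciding, the following Python sketch picks out the error message quoted above from the lines of infoprovider.log. The function name crash_lines is only illustrative. It matches on the message text and the module name rather than the exact line number, on the assumption that the line number in ARC0mod.pm may differ between ARC versions.

```python
# Fixed part of the error message quoted in this mail.
MARKER = "Can't use an undefined value as an ARRAY reference"


def crash_lines(log_lines):
    """Return the log lines that report the ARC0mod.pm ARRAY reference error."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if MARKER in line and "ARC0mod.pm" in line
    ]


# Example use, with the path to infoprovider.log on your CE filled in:
#
#     with open(path_to_infoprovider_log) as handle:
#         hits = crash_lines(handle)
#     print(len(hits), "crash message(s) found")
```

An empty result only shows that the message is not in the log file you looked at; rotated logs would need to be checked separately.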
Cheers,
Gang