Hi Kashif,
At Glasgow we have just rebuilt the ARC-CE from scratch, because we are
also changing the OS from SL6.4 to CentOS6.6. Several issues come to
mind:
1. Concerning the WMS problem: in /usr/share/arc/submit-condor-job, make
sure that on line 83 the pattern begins with '.*', i.e. there is a '.'
before '*_condor_stdout$':
if expr match "$joboption_stdout" '.*_condor_stdout$' > /dev/null; then
Otherwise single-core jobs from the WMS will run into problems. I think
the ARC team will include this fix in the next release.
2. Ensure that $lrms_jobs{$id}{nodes} = [] in Condor.pm to avoid the
infoprovider crash, as described in my previous mail. The ARC team will
include this bug fix in the next release.
3. If you want to publish the fairshare between VOs, you need to hack
/usr/share/arc/glue-generator.pl. At Glasgow we just added 3 lines after the
"GlueCECapability: CPUScalingReferenceSI00=$CPUSCALINGREFERENCESI00" line:
GlueCECapability: Share=atlas:80
GlueCECapability: Share=lhcb:10
GlueCECapability: Share=other:10
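
To illustrate item 1, here is a minimal sketch of how the two patterns
behave. It uses the POSIX form "expr STRING : REGEX", which GNU expr treats
the same as "expr match STRING REGEX". The function names and the sample
path are made up for illustration, and the pattern without the '.' is only
my assumption of what the unfixed line looks like.

```shell
#!/bin/sh
# Sketch only: function names and the sample path are hypothetical.

# Pattern with the leading '.', as in the fixed submit-condor-job line.
check_fixed() {
    if expr "$1" : '.*_condor_stdout$' > /dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

# Pattern without the '.' (assumed form of the unfixed line). A '*' at the
# start of a basic regular expression is a literal asterisk, so this only
# matches strings that really begin with '*'.
check_unfixed() {
    if expr "$1" : '*_condor_stdout$' > /dev/null; then
        echo "match"
    else
        echo "no match"
    fi
}

sample="/some/dir/job_condor_stdout"
echo "fixed:   $(check_fixed "$sample")"
echo "unfixed: $(check_unfixed "$sample")"
```

With the unfixed pattern the "then" branch is never taken for a normal
stdout path, which would be consistent with the WMS symptom.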
Cheers, Gang
On 26/06/2015 14:50, RAUL H C LOPES wrote:
> Hi Kashif,
>
> I've got 3 Arc-CEs in production. All on 5.0. The only problem was
> that bug blocking submissions from WMS.
> Solved.
>
> Thanks, raul
>
> On 26/06/15 14:28, Kashif Mohammad wrote:
>> Hi
>>
>> On a related note, I am planning to upgrade from ARC 4.2 to ARC 5.0.
>> Is there anything I should be aware of? I have looked at the
>> release notes and it looks quite straightforward.
>>
>> Thanks
>>
>> Kashif
>>
>>> -----Original Message-----
>>> From: Testbed Support for GridPP member institutes [mailto:TB-
>>> [log in to unmask]] On Behalf Of qing
>>> Sent: 26 June 2015 12:02
>>> To: [log in to unmask]
>>> Subject: Bug in /usr/share/arc/Condor.pm lead to ARC infoprovider crash
>>>
>>> Dear all:
>>>
>>> Some of you might have noticed that the BDII on the Glasgow ARC-CEs
>>> sometimes disappears; this is due to random crashes of the ARC
>>> infoprovider.
>>>
>>> After discussing with the NorduGrid ARC team, we understand that ARC
>>> does not handle some of the messages returned from Condor well, which
>>> makes the infoprovider crash at random.
>>>
>>> To fix this bug, one line in /usr/share/arc/Condor.pm needs to be
>>> modified. For ARC version 5.0.0 it is line 550, and for ARC version
>>> 4.2.0-1 it is line 545.
>>>
>>> $lrms_jobs{$id}{nodes} = "";
>>>
>>> needs to be changed to:
>>>
>>> $lrms_jobs{$id}{nodes} = [];
>>>
>>> If you see "Can't use an undefined value as an ARRAY reference at
>>> /usr/share/arc/ARC0mod.pm line 135." in infoprovider.log, you are
>>> affected. Our site is heavily affected by this bug: the infoprovider
>>> on our ARC-CEs was crashing many times a day. We applied this change
>>> yesterday morning, and during the past 24 hours, with the site fully
>>> loaded, the infoprovider has not crashed a single time on any of the
>>> 4 ARC-CEs, which convinces me that the change fixes the bug. However,
>>> since the crash happens randomly, the situation may differ between
>>> sites, so I leave it to you to decide whether to apply this fix.
>>>
>>> Cheers, Gang
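
A quick way to check whether a CE is affected, following the description
in the quoted mail above, might look like the sketch below. It counts the
error lines in infoprovider.log and shows the current "nodes" assignment
in Condor.pm. The default log location and the function names are
assumptions for illustration; pass your own log path as the first argument.

```shell
#!/bin/sh
# Sketch only: the default log path is an assumption; adjust it for your CE.
LOG="${1:-/var/log/arc/infoprovider.log}"

# Count log lines that contain the error message quoted in the mail.
count_array_ref_errors() {
    grep -c -F "Can't use an undefined value as an ARRAY reference" "$1" || true
}

# Print (with line numbers) where the nodes field is assigned, so you can
# see whether it is still set to "" or already to [].
show_nodes_assignment() {
    grep -n -F '$lrms_jobs{$id}{nodes}' "$1" || true
}

if [ -r "$LOG" ]; then
    echo "error lines in $LOG: $(count_array_ref_errors "$LOG")"
fi
if [ -r /usr/share/arc/Condor.pm ]; then
    show_nodes_assignment /usr/share/arc/Condor.pm
fi
```

A non-zero count together with an assignment of "" would suggest the fix
has not been applied yet.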