Hi,
I've only seen the infoprovider crash at Brunel once. It was an isolated
case, so I assumed I had made some mistake. I wonder if RAL has seen it.
The glue-generator.pl problem is an old one. I assumed that we're all
patching it.
raul
On 26/06/15 15:21, qing wrote:
> Hi, Kashif:
>
> At Glasgow we just rebuilt the ARC-CE from scratch because we are
> also changing the OS from SL6.4 to CentOS6.6. Several issues come
> to mind:
>
> 1. Concerning the WMS problem: in /usr/share/arc/submit-condor-job,
> you need to ensure that in line 83 there is a '.' before '_condor_stdout$':
>
>     if expr match "$joboption_stdout" '.*_condor_stdout$' > /dev/null; then
>
> Otherwise single-core jobs from WMS will run into problems. I think
> the ARC team will put this fix in the next release.
>
> 2. Make sure Condor.pm sets $lrms_jobs{$id}{nodes} = [] to avoid the
> infoprovider crash, as described in my previous message. The ARC team
> will put this bug fix in the next release.
>
> 3. If you want to publish fairshare between VOs, you need to hack
> /usr/share/arc/glue-generator.pl. At Glasgow we just added 3 lines
> after the line
> "GlueCECapability: CPUScalingReferenceSI00=$CPUSCALINGREFERENCESI00":
>
> GlueCECapability: Share=atlas:80
> GlueCECapability: Share=lhcb:10
> GlueCECapability: Share=other:10
>
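
For item 3, the insertion could be scripted roughly as follows. This is only a sketch: it assumes the anchor line appears in glue-generator.pl exactly as quoted, the function name add_shares is made up for illustration, and the share values are just the Glasgow example.

```shell
#!/bin/sh
# Sketch: read glue-generator.pl text on stdin and write it to stdout with
# the three Share lines appended after the CPUScalingReferenceSI00 line.
# add_shares is a hypothetical helper name, not part of ARC.
add_shares() {
    sed '/GlueCECapability: CPUScalingReferenceSI00=/a\
GlueCECapability: Share=atlas:80\
GlueCECapability: Share=lhcb:10\
GlueCECapability: Share=other:10'
}

# Demo on a two-line stand-in for the real file.
add_shares <<'EOF'
GlueCECapability: CPUScalingReferenceSI00=$CPUSCALINGREFERENCESI00
(hypothetical next line of the file)
EOF
```

To use it on a real node, one would run it against a backup copy, for example `add_shares < glue-generator.pl.orig > glue-generator.pl`, and compare the two files before trusting the result.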
> Cheers,Gang
>
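
For item 1, a small sketch of why the '.' matters. It assumes the unpatched line reads '*_condor_stdout$' (that is, only the '.' is missing), which the thread does not state explicitly. The function names and the sample path are hypothetical. It uses the POSIX form `expr STRING : REGEX`, which GNU expr treats the same as `expr match STRING REGEX`.

```shell
#!/bin/sh
# The regex given to expr is a basic regular expression that is
# implicitly anchored at the start of the string.

# Patched form: '.*' lets any leading path precede the suffix.
matches_fixed() {
    expr "$1" : '.*_condor_stdout$' > /dev/null
}

# Assumed unpatched form: a leading '*' has nothing to repeat, so it is
# taken as a literal asterisk and ordinary paths do not match.
matches_unpatched() {
    expr "$1" : '*_condor_stdout$' > /dev/null
}

name="/tmp/session/job_condor_stdout"   # hypothetical stdout path

if matches_fixed "$name"; then
    echo "fixed pattern: match"
else
    echo "fixed pattern: no match"
fi

if matches_unpatched "$name"; then
    echo "unpatched pattern: match"
else
    echo "unpatched pattern: no match"
fi
```

Under that assumption the test in submit-condor-job would fail for every ordinary stdout path, so the branch it guards is never taken.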
> On 26/06/2015 14:50, RAUL H C LOPES wrote:
>> Hi Kashif,
>>
>> I've got 3 Arc-CEs in production. All on 5.0. The only problem was
>> that bug blocking submissions from WMS.
>> Solved.
>>
>> Thanks, raul
>>
>> On 26/06/15 14:28, Kashif Mohammad wrote:
>>> Hi
>>>
>>> On a related note, I am planning to upgrade from ARC 4.2 to ARC
>>> 5.0. Is there anything I should be aware of? I have looked
>>> at the release notes and it looks quite straightforward.
>>>
>>> Thanks
>>>
>>> Kashif
>>>
>>>> -----Original Message-----
>>>> From: Testbed Support for GridPP member institutes [mailto:TB-
>>>> [log in to unmask]] On Behalf Of qing
>>>> Sent: 26 June 2015 12:02
>>>> To: [log in to unmask]
>>>> Subject: Bug in /usr/share/arc/Condor.pm lead to ARC infoprovider
>>>> crash
>>>>
>>>> Dear all:
>>>>
>>>> Some of you might have noticed that the BDII on the Glasgow ARC-CEs
>>>> sometimes disappeared. This was due to random crashes of the ARC
>>>> infoprovider.
>>>>
>>>> After discussion with the NorduGrid ARC team, we understand that ARC
>>>> does not handle some of the messages returned by Condor well, which
>>>> is why the infoprovider crashes look random.
>>>>
>>>> To fix this bug, a line in /usr/share/arc/Condor.pm needs to be
>>>> modified.
>>>> For ARC version 5.0.0 it's line 550, and for ARC version 4.2.0-1,
>>>> it's line 545.
>>>>
>>>> $lrms_jobs{$id}{nodes} = "";
>>>>
>>>> needs to be changed to:
>>>>
>>>> $lrms_jobs{$id}{nodes} = [];
>>>>
>>>> If you see "Can't use an undefined value as an ARRAY reference at
>>>> /usr/share/arc/ARC0mod.pm line 135." in infoprovider.log, you are
>>>> affected. Our site is heavily affected by this bug: the infoprovider
>>>> on our ARC-CEs crashed many times a day. We applied this change
>>>> yesterday morning, and during the past 24 hours, with the site fully
>>>> loaded, the infoprovider has not crashed once on any of the
>>>> 4 ARC-CEs, which convinces me that the change fixed the bug. However,
>>>> since the crash happens randomly, the situation may differ
>>>> between sites, so I leave it to you to decide whether to apply this
>>>> bug fix.
>>>>
>>>> Cheers,Gang
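
The check and the one-line change from the original message could be sketched as two small filters. The function names are hypothetical, and the thread does not say where infoprovider.log lives, so both read from stdin rather than from a fixed path.

```shell
#!/bin/sh
# Succeeds if the log text on stdin contains the crash signature quoted
# in the message. is_affected is a hypothetical helper name.
is_affected() {
    grep -F -q "Can't use an undefined value as an ARRAY reference"
}

# Rewrites the assignment from an empty string to an empty array
# reference, leaving every other line untouched.
patch_nodes() {
    sed 's/\$lrms_jobs{\$id}{nodes} = "";/$lrms_jobs{$id}{nodes} = [];/'
}

# Demo on a single sample line rather than the real Condor.pm.
sample='    $lrms_jobs{$id}{nodes} = "";'
printf '%s\n' "$sample" | patch_nodes
```

On a real CE one might first confirm the line with `sed -n '550p' /usr/share/arc/Condor.pm` (545 on ARC 4.2.0-1), keep a backup copy, and then run `patch_nodes < Condor.pm.orig > Condor.pm`.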