On 09/07/2011 04:27 PM, Chris Brew wrote:
> Hi,
>
> I have run:
>
> glite-ce-service-info -L 2 heplnx206.pp.rl.ac.uk
>
Hi Chris
Did you also try a submission ?
> Against the CEs while they are blacklisted with no errors and other WMSes
> have continued to submit jobs with no error.
So is the CE blacklisted only for the submissions done by a specific WMS
(while everything works properly for jobs submitted by other WMSes) ?
Cheers, Massimo
>
> I have occasionally at other times seen submission blocked because the
> number of FTP connections is above the threshold of 30, but that's always
> transitory, whereas the WMS will block us until I reboot. (The load on the
> hardware is still low when that's over threshold, so is it possible to
> increase it?)
>
> One suggestion I've had is to create extra indexes in the mysql DB but
> that's outside my area of competence. Indeed the mysql daemon is using a
> fair amount of CPU and has a good few connections open.
>
> Yours,
> Chris.
>
>> -----Original Message-----
>> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
>> On Behalf Of Rodney Walker
>> Sent: 07 September 2011 14:55
>> To: [log in to unmask]
>> Subject: Re: [LCG-ROLLOUT] CreamCEs keep getting blacklisted by WMS
>>
>> Hi,
>> From a lay perspective (I do not know what blacklisting means for the
>> WMS): did you try, e.g.
>> $ glite-ce-allowed-submission lcg-lrz-ce2.grid.lrz.de
>> Job Submission to this CREAM CE is disabled
>>
>> for your CE, and does it report disabled? The reason mine is now disabled
>> is shown by
>>
>> /opt/glite/bin/glite_cream_load_monitor --show
>> Threshold for Swap Usage: 95 => Detected value for Swap Usage: 100.00%
>>
>> And indeed a reboot will fix it, but so will a restart. OTOH there are
>> other thresholds listed, which might affect you.
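
To see quickly which of the listed thresholds are over their limit, the
output of the --show command above can be filtered. This is only a rough
sketch: the line format is assumed from the single sample line quoted
above, the function name exceeded_thresholds is made up for illustration,
and it assumes every threshold is an upper limit (a detected value above
the limit is bad), which may not hold for every metric.

```python
import re

# Assumed line format, taken from the one sample line quoted in this thread:
#   Threshold for <name>: <limit> => Detected value for <name>: <value>[%]
LINE_RE = re.compile(
    r"Threshold for (?P<name>.+?):\s*(?P<limit>[\d.]+)\s*=>\s*"
    r"Detected value for (?P=name):\s*(?P<value>[\d.]+)%?"
)


def exceeded_thresholds(monitor_output):
    """Return (name, limit, value) for each line whose value is above its limit."""
    hits = []
    for line in monitor_output.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # ignore lines that do not follow the assumed format
        limit = float(match.group("limit"))
        value = float(match.group("value"))
        if value > limit:  # assumption: all thresholds are upper limits
            hits.append((match.group("name"), limit, value))
    return hits


sample = "Threshold for Swap Usage: 95 => Detected value for Swap Usage: 100.00%"
print(exceeded_thresholds(sample))
```

In practice the text would come from saving the monitor output to a file
or piping it in, rather than from a hard-coded sample string.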
>>
>> Cheers,
>> Rod.
>>
>>
>> On 09/07/2011 03:43 PM, Massimo Sgaravatto - INFN Padova wrote:
>>> On Wed, 7 Sep 2011, Chris Brew wrote:
>>>
>>>> Hi Massimo,
>>>>
>>>> I'm getting it from the UMD repository so:
>>>>
>>>> [root@heplnx206 ~]# rpm -qa | grep cream
>>>> glite-ce-cream-1.13.2-1.sl5
>>>> glite-ce-cream-utils-1.1.0-3.sl5
>>>> glite-ce-yaim-cream-ce-4.2.0-3.sl5
>>>> emi-cream-ce-1.0.0-1.sl5
>>>
>>> Ok, I thought that you might be affected by this bug:
>>>
>>> https://savannah.cern.ch/bugs/?82567
>>>
>>> but this shouldn't be the case since your version of CREAM is recent
>>> enough.
>>>
>>> When the CE is blacklisted can you try a glite-ce-job-submit towards
>>> that CE ? Can you also check the glite-ce-cream.log* ?
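
For the log check, something along these lines could pull out the
suspicious lines from all the glite-ce-cream.log* files in one go. It is
a sketch under assumptions: the log directory is not given in this thread
and must be supplied, the level tokens ERROR and FATAL are a guess at how
problems are marked in the log, find_problem_lines is a made-up name, and
only plain-text (not compressed) files are handled.

```python
import glob
import os

# Assumption: problem lines contain one of these level tokens.
LEVELS = ("ERROR", "FATAL")


def find_problem_lines(log_dir, pattern="glite-ce-cream.log*", levels=LEVELS):
    """Yield (file name, line number, text) for lines mentioning one of the levels."""
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # errors="replace" keeps the scan going if a file has odd bytes in it
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(level in line for level in levels):
                    yield os.path.basename(path), number, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage: python scan_cream_logs.py <directory holding glite-ce-cream.log*>
    for name, number, text in find_problem_lines(sys.argv[1]):
        print(f"{name}:{number}: {text}")
```

Comparing the timestamps of the matching lines with the time the
blacklisting started would be the obvious next step.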
>>>
>>>
>>>>
>>>>
>>>> Though I have already "backported" the trustmanager fix from emi.
>>>
>>>
>>> Ok. I guess you know that updating the trustmanager rpm is not enough
>>> (also the relevant jar within ce-cream needs to be updated)
>>>
>>>
>>> Cheers, Massimo
>>>
>>>>
>>>> Thanks,
>>>> Chris.
>>>>
>>>> On 07/09/2011 14:33, "Massimo Sgaravatto - INFN Padova"
>>>> <[log in to unmask]> wrote:
>>>>
>>>>> Hi Chris
>>>>>
>>>>> Maybe I have an idea
>>>>> But could you please tell me first what is the version of the
>>>>> glite-ce-* rpms ?
>>>>>
>>>>> Cheers, Massimo
>>>>>
>>>>>
>>>>> On Wed, 7 Sep 2011, Chris Brew wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> After replacing our LCG-CEs with CreamCEs, we keep having problems
>>>>>> where our CreamCEs get blacklisted by the WMSs that the VO SAM tests
>>>>>> use. It has happened to all the CEs at some point or other - they seem
>>>>>> to run fine for a few days to a week then hit this.
>>>>>>
>>>>>> It is not transitory: the SAM tests start failing and continue to fail
>>>>>> until we intervene. Restarting the gLite services does not appear to
>>>>>> fix it, but rebooting does.
>>>>>>
>>>>>> Other WMSs, including the ones used by the NGI_UK ops SAM tests,
>>>>>> continue to work fine with the CreamCEs.
>>>>>>
>>>>>> It does not appear to be load related, as the boxes seem to have plenty
>>>>>> of free memory and do not appear to be under heavy load when it happens.
>>>>>>
>>>>>> We've increased the innodb_buffer_pool_size and reduced the purge
>>>>>> times for both the Cream and Blah components, which does not appear to
>>>>>> have fixed the issue.
>>>>>>
>>>>>> We're using the UMD release with Argus authentication.
>>>>>>
>>>>>> Any ideas what else I should be trying?
>>>>>>
>>>>>> Thanks,
>>>>>> Chris.
>>>>>>
>>>>>
>>>>> \|||/
>>>>> -----------0oo----( o o )----oo0-------------------
>>>>> (_)
>>>>> INFN Sezione di Padova
>>>>> Via Marzolo, 8
>>>>> 35131 Padova - Italy E-mail: massimo.sgaravatto [at] pd.infn.it
>>>>> Tel: ++39 0499677360 Skype: massimo.sgaravatto
>>>>> Fax: ++39 0498275952
>>>>
>>>
>>
>>
>> --
>> Tel. +49 89 289 14152