Hello Maarten,
On 04.06.2009 at 0:27, Maarten Litmaath wrote:
> Can you try setting some parameters to _less_ aggressive values,
> according to the advice given here:
>
> https://twiki.cern.ch/twiki/bin/view/EGEE/LcgCE
>
> Or go even further:
>
> stateage 600
> tout 600
> tick 600
>
> This way the batch system query load should go down by a lot.
> Of course it means the WMS nodes will be updated a lot less often,
> but that should not hurt normal jobs. The idea is that the CE is
> better protected when the update frequency is lower.
With the version of globus-gma used on these CEs (1.0.12) it is possible
to configure adaptive poll intervals with the 'statefact' configuration
parameter (explained to some extent on a wiki page).
It causes globus-gma to poll short jobs more frequently than long
ones, which decreases the load on the batch system and at the same time
improves the rate of job status updates on the WMS side.
For this to work one has to set 'tick' to a relatively small value
(60) while keeping 'stateage' high (600 or even 1200).
On the other hand, a high 'tout' value may have a negative effect on the
duration of a poll cycle (which already seems to be too long in our
case).
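As a rough sketch, written in the same form as the values quoted above,
the 'tick' and 'stateage' settings would look something like this
('statefact' must be set as well, but I leave its value out here since
the right one depends on the site):
    tick 60
    stateage 600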
The LCG CE in question does not get stuck because of defunct globus-gma
processes (they are just a sign that something is going wrong). It gets
stuck because a single poll cycle takes hours to complete.
--
Cheers,
Andrey Kiryanov.