Hi Maarten,
On Wed, 3 Jun 2009 14:46:06 +0200
Maarten Litmaath <[log in to unmask]> wrote:
> Hallo Andreas,
>
> > I installed the recent version an hour ago:
> >
> > root@grid-ce2: [~] rpm -qf /opt/globus/sbin/globus-gma
> > globus-gma-1.0.12-lcg.noarch
> >
> > and restarted 'globus-gma'.
> >
> > The picture remains:
> >
> > root@grid-ce2: [~] ps auxw | egrep globus-gma | wc -l
> > 76
> > root@grid-ce2: [~] ps auxw | egrep globus-gma | grep defunct | wc -l
> > 71
>
> That probably is OK. The defunct processes are cleaned up at the end
> of each main cycle. The problem with earlier versions was that some
> processes were never cleaned up, so the list could grow steadily.
Our experience says that the CE is basically stuck if there is such a high
number of defunct globus-gma processes. According to the logs, some
hanging processes are getting killed every minute (the interval can be set
in the config). Users observe a large number of jobs in the status
"scheduled" in WMS, but they never arrive in the batch system. The only
thing that helps is restarting globus-gma.
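To tell whether the defunct processes really are reaped at the end of each
cycle or keep piling up, one could sample the counts over time instead of
taking a single snapshot. Below is a minimal POSIX shell sketch. The
function names `count_gma` and `monitor_gma` and the 60-second default
interval are illustrative assumptions, not part of globus-gma. Unlike
`ps auxw | egrep globus-gma | wc -l`, it does not count the matching
process itself.

```sh
#!/bin/sh
# Hypothetical helper: read `ps auxw`-style lines on stdin and print
# "<total globus-gma processes> <defunct ones>".
# The !/awk/ test skips the awk process itself, whose command line
# also contains the string "globus-gma".
count_gma() {
  awk '/globus-gma/ && !/awk/ { total++; if (/defunct/) zombies++ }
       END { printf "%d %d\n", total + 0, zombies + 0 }'
}

# Hypothetical monitor: print a timestamped count every N seconds
# (default 60, an assumed value chosen to match the ~1 minute cycle).
# A count that drops back after each cycle points to normal reaping;
# a count that keeps growing points to leaked processes.
monitor_gma() {
  while :; do
    printf '%s ' "$(date '+%H:%M:%S')"
    ps auxw | count_gma
    sleep "${1:-60}"
  done
}
```

Usage, e.g. on the CE: `monitor_gma 60 > /tmp/gma-counts.log &`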
We are a bit puzzled...
Cheers, Christoph (who just restarted globus-gma on one of the CEs)