JISCMail - LCG-ROLLOUT Archives

Hello Christoph,

> > > root@grid-ce2: [~] rpm -qf /opt/globus/sbin/globus-gma
> > > globus-gma-1.0.12-lcg.noarch
> > > 
> > > and restarted 'globus-gma'.
> > > 
> > > The picture remains:
> > > 
> > > root@grid-ce2: [~] ps auxw | egrep globus-gma | wc -l
> > > 76
> > > root@grid-ce2: [~] ps auxw | egrep globus-gma | grep defunct | wc -l
> > > 71
> > 
> > That probably is OK.  The defunct processes are cleaned up at the end
> > of each main cycle.  The problem with earlier versions was that some
> > processes were never cleaned up, so the list could grow steadily.
> 
> Our experience is that the CE is basically stuck when there is such a
> high number of defunct globus-gma processes. According to the logs, some
> hanging processes are killed every minute (the interval can be set in
> the config). Users observe a large number of jobs in the status
> "scheduled" in the WMS, but they do not arrive in the batch system. The
> only thing that helps is restarting globus-gma.
> 
> We are a bit puzzled...
> 
> Cheers, Christoph (who just restarted globus-gma on one of the CEs)

Did you modify any other parameters besides the debug flag?

Does the problem occur for all users or a subset, e.g. "power" users?
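
To help answer the per-user question, here is a minimal sketch that breaks
the defunct count down by owning user. It is not part of globus-gma; the
function name count_defunct_by_user is made up for illustration, and it
assumes that the procps 'ps' on the CE reports the command name of these
processes as "globus-gma". Unlike 'ps auxw | egrep globus-gma', it does not
count the egrep process itself.

```shell
#!/bin/sh
# Count defunct (zombie) globus-gma processes per owning user.
# Input on stdin: "USER STAT COMMAND" lines, as printed by
#   ps -eo user=,stat=,comm=
# Assumption: the command name shows up as "globus-gma".
count_defunct_by_user() {
    # STAT starting with "Z" marks a zombie; tally by user, largest first.
    awk '$2 ~ /^Z/ && $3 == "globus-gma" { n[$1]++ }
         END { for (u in n) print n[u], u }' | sort -rn
}

# Usage on the CE:
#   ps -eo user=,stat=,comm= | count_defunct_by_user
```

If one or two accounts dominate the output, that would point to a subset of
users rather than a general problem.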