Hi Maarten,
Thanks for the prompt reply. I will apply the patch on the CE later and
monitor the status for a few more days. To save time while fixing the
problem, I force-killed all the defunct processes earlier.
I will keep you posted.
Thanks,
BR,
J
Maarten Litmaath wrote:
> Hi Jason,
>
>> On one of the CEs in our region, we found around 32k defunct
>> globus-gma processes, which cause continuous job submission failures. Any
>> idea how we can reduce the number of defunct processes? Or is this a
>> known issue with a patch already issued?
>>
>> thanks
>>
>>
>> # ps axuww |grep globus-gma | wc -l
>> 31878
>>
>> # ps axuww |grep globus-gma | head -5
>> cms175 300 0.0 0.0 0 0 ? Z Feb04 0:00 [globus-gma] <defunct>
>> cms175 301 0.0 0.0 0 0 ? Z Feb09 0:00 [globus-gma] <defunct>
>> cms175 302 0.0 0.0 0 0 ? Z Feb01 0:00 [globus-gma] <defunct>
>> cms175 303 0.0 0.0 0 0 ? Z Feb05 0:00 [globus-gma] <defunct>
>> cms150 304 0.0 0.0 0 0 ? Z Feb09 0:00 [globus-gma] <defunct>
>
> This was recently discussed on LCG-ROLLOUT. I include the summary here.
>
> Andrey Kiryanov wrote:
>
> > Hi Torsten,
> >
> > Torsten Harenberg wrote:
> >
> >>> Yes, it is a known issue :-(
> >>> The patch suggested in the following ticket
> >>> https://gus.fzk.de/ws/ticket_info.php?ticket=42981 should fix the
> >>> problem.
> >>
> >>
> >> thanks for the quick reply. I installed the patch.
> >
> >
> > Despite this, it means that your CE suffers from a high load and job poll
> > processes hang. There may be various reasons for this.
> > Please check if you have "Killing hung process" lines in
> > /opt/globus/var/log/globus-gma.log
> > If they are there, try adding a 'tout 120' line to
> > /opt/globus/etc/globus-gma.conf and see if it helps.
>
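
For monitoring over the next few days, a rough sketch of the count is
below. It is not part of the patch. The helper name count_zombies is made
up for illustration, and it assumes a ps that accepts "-eo stat=,comm="
(as procps does). Unlike "ps axuww | grep globus-gma | wc -l", it matches
only processes in state Z, so the grep process itself and any live
globus-gma processes are not counted.

```shell
#!/bin/sh
# Hypothetical helper: count zombie (state Z) processes for one command name.
# Reads "STAT COMMAND ..." lines on stdin, e.g. from: ps -eo stat=,comm=
count_zombies() {
    # $1 is the process state, $2 the command name; n + 0 prints 0 when
    # nothing matched.
    awk -v name="$1" '$1 ~ /^Z/ && $2 == name { n++ } END { print n + 0 }'
}

# Example use on the CE (paths as quoted in the thread above):
#   ps -eo stat=,comm= | count_zombies globus-gma
#   grep -c "Killing hung process" /opt/globus/var/log/globus-gma.log
```

If the first number keeps growing after the patch, the second one shows
whether hung poll processes are still being killed.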
--
Jason Shih
ASGC/OPS
Tel: +886-2-2789-8311
Fax: +886-2-2783-7653