Hi Jason,
> On one of the CEs in our region, we found around 32k defunct
> globus-gma processes, which cause continuous job submission failures. Any
> idea how we can reduce the number of defunct processes? Or is this a
> known issue for which a patch has already been issued?
>
> thanks
>
>
> # ps axuww |grep globus-gma | wc -l
> 31878
>
> # ps axuww |grep globus-gma | head -5
> cms175 300 0.0 0.0 0 0 ? Z Feb04 0:00 [globus-gma] <defunct>
> cms175 301 0.0 0.0 0 0 ? Z Feb09 0:00 [globus-gma] <defunct>
> cms175 302 0.0 0.0 0 0 ? Z Feb01 0:00 [globus-gma] <defunct>
> cms175 303 0.0 0.0 0 0 ? Z Feb05 0:00 [globus-gma] <defunct>
> cms150 304 0.0 0.0 0 0 ? Z Feb09 0:00 [globus-gma] <defunct>
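
A side note on the counting: a zombie entry only goes away once its
parent reaps it (or the parent itself exits), so it can help to see
which parent PIDs own the defunct processes. Also, piping ps through
grep may count the grep process itself. Below is a rough sketch; the
function name count_zombies_by_parent is made up for illustration, and
it assumes a procps-style ps that accepts -eo ppid=,stat=,comm=.

```shell
# Sketch: count zombie (state Z) processes per parent PID.
# Reads "PPID STAT COMMAND" lines on stdin, e.g. from
#   ps -eo ppid=,stat=,comm=
# $1 is the command name to look for (hypothetical helper, not a Globus tool).
count_zombies_by_parent() {
    awk -v name="$1" '
        # STAT starts with Z for defunct processes; match the command name literally
        $2 ~ /^Z/ && index($3, name) == 1 { n[$1]++ }
        END { for (p in n) print p, n[p] }
    ' | sort -k2,2nr    # parents with the most zombies first
}

# Usage on a live system (not run here):
#   ps -eo ppid=,stat=,comm= | count_zombies_by_parent globus-gma
```

Each output line is a parent PID followed by the number of defunct
children it holds.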
This was recently discussed on LCG-ROLLOUT. I include the summary here.
Andrey Kiryanov wrote:
> Hi Torsten,
>
> Torsten Harenberg wrote:
>
>>> yes, it is a known issue :-(
>>> the patch suggested in the following ticket
>>> https://gus.fzk.de/ws/ticket_info.php?ticket=42981 should fix the
>>> problem
>>
>>
>> thanks for the quick reply. I installed the patch.
>
>
> Despite this, it means that your CE suffers from high load and that job
> poll processes hang. There may be various reasons for this.
> Please check if you have "Killing hung process" lines in
> /opt/globus/var/log/globus-gma.log
> If they are there, try to add a 'tout 120' line in
> /opt/globus/etc/globus-gma.conf and see if it helps.
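
The check Andrey describes could be scripted roughly as follows. This
is only a sketch: the two paths and the 'tout 120' line are taken from
his message, while the function name apply_gma_timeout and the backup
step are my own additions. It also assumes the config file ends with a
newline. Whether globus-gma must be restarted to pick up the change is
not covered in the thread.

```shell
# Sketch: look for "Killing hung process" in the log and, if found,
# append 'tout 120' to the config unless a tout line is already there.
# $1 = log file, $2 = config file (hypothetical helper, not a Globus tool).
apply_gma_timeout() {
    gma_log=$1
    gma_conf=$2

    # grep -c prints the number of matching lines (0 if none)
    gma_hits=$(grep -c 'Killing hung process' "$gma_log" 2>/dev/null)
    gma_hits=${gma_hits:-0}

    if [ "$gma_hits" -gt 0 ] && ! grep -q '^tout[[:space:]]' "$gma_conf"; then
        cp "$gma_conf" "$gma_conf.bak"      # keep a copy before editing
        echo 'tout 120' >> "$gma_conf"
    fi

    echo "$gma_hits"
}

# Usage with the paths from the message above (not run here):
#   apply_gma_timeout /opt/globus/var/log/globus-gma.log \
#                     /opt/globus/etc/globus-gma.conf
```

The function prints the number of matching log lines, so a result of 0
means the timeout setting was left alone.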