Elena Korolkova wrote:
> Hello
>
> We are struggling with a problem: jobs, including SAM test jobs, are
> taking forever to run through.
> When they do run through, they are largely successful, but there is
> clearly some problem.
>
> We've pretty much ruled out a bad worker node.
>
> However, we were having a problem with globus-gma: many "WARN:
> Killing hung poll process" messages in
> globus-gma.log and lots of <defunct> processes on the CE.
>
> All the steps suggested in
> https://gus.fzk.de/ws/ticket_info.php?ticket=42981 have been done.
> We still have warning messages and <defunct> processes. All the
> processes now belong to one atlas user Birmingham.
>
> Any ideas are greatly appreciated
>
> Cheers
> Elena
Our solution to this (or at least to something very similar) is clunky
and evidently not ideal: a cron job runs every 15 minutes on each CE and
restarts most of globus if there are any defunct processes.
If anyone has a better solution, we would be interested in hearing it!
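
For anyone wanting to try the same workaround, a minimal sketch of such a watchdog in Python follows. It is not our actual script: the restart command, the function names, and the dry-run default are all assumptions for illustration, so substitute whatever restarts the relevant globus services on your CE. It works because a zombie is only reaped once its parent collects it or the parent itself exits, so restarting the parent service clears them.

```python
"""Watchdog sketch: restart a service when defunct (zombie) processes appear."""
import subprocess
import sys

# Hypothetical restart command (an assumption, not taken from the thread);
# replace with whatever restarts globus on your CE.
RESTART_CMD = ["/sbin/service", "globus-gma", "restart"]


def find_defunct(ps_output):
    """Return (pid, ppid, command) for each zombie in the output of
    `ps -eo pid,ppid,stat,comm`."""
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        if stat.startswith("Z"):  # 'Z' marks a zombie (defunct) process
            zombies.append((int(pid), int(ppid), comm.strip()))
    return zombies


def check_and_restart(restart_cmd=RESTART_CMD, dry_run=True):
    """List zombies; run restart_cmd if any exist and dry_run is False."""
    ps_output = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    zombies = find_defunct(ps_output)
    if zombies:
        print(f"{len(zombies)} defunct process(es) found", file=sys.stderr)
        if not dry_run:
            subprocess.run(restart_cmd, check=False)
    return zombies
```

A small wrapper script that calls check_and_restart(dry_run=False) could then be run from cron with a "*/15 * * * *" schedule. Running it in the default dry-run mode first shows which processes would trigger a restart.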
--
David Ambrose-Griffith
IPPP, Department of Physics, Durham University,
Science Laboratories, South Road, Durham, DH1 3LE
Direct Dial: +44 (0)191 3343704
Office: +44 (0)191 334 3811