Hi Andrey,
we are not so much worried about the zombies themselves. But when we see the number of zombies increasing, the CE is almost stuck, and not only for the ILC user in question.
It seems that we need to drain the CE in order to get it working properly again.
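A minimal sketch for tracking that trend, assuming `ps aux`-style columns as in the listing quoted below. `count_zombies` is a hypothetical helper for illustration, not part of any CE tooling:

```python
from collections import Counter


def count_zombies(ps_output: str) -> Counter:
    """Count zombie (state Z) processes per user in `ps aux`-style text.

    Assumed column order: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    """
    zombies = Counter()
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and malformed lines
        user, stat = fields[0], fields[7]
        if stat.startswith("Z"):
            zombies[user] += 1
    return zombies


# The two lines from the quoted listing, used as sample input.
sample = """\
root 3947 0.6 0.7 35256 29788 ? Ss 14:58 0:13 globus-gma: polling jobs
41753 10740 48.2 0.0 0 0 ? Z 15:26 4:26 [globus-gma] <defunct>
"""

print(dict(count_zombies(sample)))
```

Run periodically against live `ps aux` output, a growing count for one UID would point at the account to look at first.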
Cheers, Christoph
On Thu, 4 Jun 2009 19:04:45 +0400
Andrey Kiryanov <[log in to unmask]> wrote:
> Hi Andreas,
>
> Andreas Gellrich wrote:
> > Here I see a defunct process coming up ...
> > root@grid-ce2: [~] pp gma
> > root 3947 0.6 0.7 35256 29788 ? Ss 14:58 0:13 globus-gma: polling jobs
> > 41753 10740 48.2 0.0 0 0 ? Z 15:26 4:26 [globus-gma] <defunct>
>
> Right, 41753 is the UID of the ilcprd003 account, which is probably in bad
> shape. It may be that there are too many directories inside and the
> filesystem goes nuts, maybe something else, but the consequence is as
> follows: a jobmanager (lcgpbs in your case) cannot tell the status of a
> job in a reasonable time (currently 5 minutes). Hence the forked poll
> process hangs and gets killed by the globus-gma parent process, which
> creates a zombie in the process table. These zombies pile up till the end
> of the poll cycle, where all of them get removed in one shot.
> Unlike in movies, zombies themselves are nice - they do not consume
> system resources and only occupy a few bytes of memory in the process
> table, but they are a red light for a sysadmin. Other subsystems of your
> CE (job-manager, gass-cache, etc.) suffer from the bad account as well, but
> they do it silently.
> The only proper way of fixing this is to fix the account in question. If
> you plan to run such a high number of jobs per single account in the
> future, you may consider changing the filesystem from ext3 to xfs as it
> handles deep directory structures (such as gass-cache) more effectively,
> but I'm not sure if it will help to handle 12000 jobs. GT2 architecture
> doesn't scale that far unfortunately.
> --
> Cheers,
> Andrey Kiryanov.
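The zombie lifecycle described above (a forked child is killed by its parent and stays in the process table until the parent reaps it) can be reproduced with a small sketch. This is not globus-gma code; it assumes Linux with `/proc` mounted, and `process_state` and `demo` are hypothetical names:

```python
import os
import signal
import time


def process_state(pid: int) -> str:
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state is the first field after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo() -> tuple:
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for a poll that never returns.
        time.sleep(3600)
        os._exit(0)

    # Parent: give up on the hung child and kill it, without reaping it yet.
    os.kill(pid, signal.SIGKILL)

    # Wait (up to 5 s) for the kernel to mark the child as a zombie.
    deadline = time.monotonic() + 5
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)

    # Reaping with waitpid() removes the entry from the process table.
    os.waitpid(pid, 0)
    gone = not os.path.exists(f"/proc/{pid}")
    return state, gone


if __name__ == "__main__":
    print(demo())
```

Between the `kill` and the `waitpid` the child shows up as `<defunct>` in `ps`, which matches the entries that pile up until the end of a poll cycle.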
--
+-----------------------------------+
| Christoph Wissing DESY - CMS |
| E-Mail: [log in to unmask] |
| Phone: +49(0)40/8998-4122 |
+-----------------------------------+