On Mon, 13 Nov 2006, Gonzalo Merino wrote:
> Valery Mitsyn wrote:
> > On Mon, 13 Nov 2006, Gonzalo Merino wrote:
> >
> > Yes, it is. At our site, JINR-LCG2, I've found at least two such jobs.
> > Some processes in these jobs run in their own session (pgrp), and
> > Torque does not track them.
> >
>
> This is quite a bizarre situation, indeed.
>
> So, these jobs are calling setsid() at some point? This was not visible from
> the ps output that you sent to this thread, was it? Is it easy for you to get
> a complete ps output for a machine with one of these ill jobs, where the PGID
> information can be seen?
I believe a new session was created by "/bin/sh --login ...",
through a call to setpgrp/setsid.
Er, I killed the jobs and forgot to save the full state.
Next time I'll be more careful with the post-mortem analysis.
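
A minimal sketch of the suspected mechanism, assuming a POSIX system
with Python available. It is only an illustration, not the glidein's
actual code, and the function name session_ids_after_setsid is made up
for this example. A forked child calls setsid() and ends up with a
session ID and PGID equal to its own PID, so anything that follows the
parent's session or process group no longer sees it:

```python
import os


def session_ids_after_setsid():
    """Fork a child that calls setsid(); return (parent_ids, child_ids).

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: leave the parent's session, then report the new IDs.
        status = 1
        try:
            os.close(read_fd)
            os.setsid()  # new session, new process group, no controlling tty
            report = f"{os.getpid()} {os.getpgrp()} {os.getsid(0)}"
            os.write(write_fd, report.encode())
            status = 0
        finally:
            os._exit(status)

    # Parent: read the child's report and reap it.
    os.close(write_fd)
    data = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)
    os.waitpid(pid, 0)

    child_pid, child_pgid, child_sid = (int(x) for x in data.split())
    parent = {"pid": os.getpid(), "pgid": os.getpgrp(), "sid": os.getsid(0)}
    child = {"pid": child_pid, "pgid": child_pgid, "sid": child_sid}
    return parent, child


if __name__ == "__main__":
    parent, child = session_ids_after_setsid()
    print("parent:", parent)
    print("child: ", child)
```

On a worker node, the same IDs can be compared between the job's
top-level process and any suspect process via os.getpgid(pid) and
os.getsid(pid).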
>
> Sanjay: if the Condor glideins are indeed detaching from the PGID in the
> way Valery describes, we would really like to understand this behaviour
> in detail, since it is somewhat disturbing. We would be interested
> in receiving some of these jobs in a "controlled manner" at our site (PIC) so
> that we can look into them while they run and see how the Torque
> version we run reacts to them.
Same here, at lcgce01.jinr.ru (JINR-LCG2).
>
> thanks a lot,
> Gonzalo
>
>
--
Best regards,
Valery Mitsyn