JISCMail - LCG-ROLLOUT Archives

Hi Leif,

I'd like to resume this discussion, because I think that draining the 
queues on a big cluster is not an optimal solution. On the other hand, I 
don't want to write recipes on the wiki that might mess up jobs.

I've looked through a bit of documentation about how to stop MPI jobs and 
the answer is always the same: it depends on your application.

SIGSTOP cannot be caught, so your application should be able to deal with 
SIGTSTP, which can be.
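
To make the difference concrete, here is a minimal sketch in plain sh. The 
handler name on_tstp and the flag tstp_seen are made up for illustration; a 
real MPI application would do its own cleanup in the handler.

```sh
#!/bin/sh
# Sketch: SIGTSTP can be trapped, so a process gets a chance to tidy up.
# SIGSTOP cannot be trapped, so no handler like this would ever run for it.

tstp_seen=0

# Hypothetical handler: a real application would flush buffers or
# tell its peers that it is about to pause.
on_tstp() {
    tstp_seen=1
}

trap on_tstp TSTP

# Deliver SIGTSTP to this shell. Because of the trap, the handler runs
# and the shell keeps going instead of being stopped.
send_tstp_to_self() {
    kill -TSTP $$
}
```

After calling send_tstp_to_self the flag tstp_seen should be 1, which shows 
the signal reached the handler rather than stopping the process outright.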

Then I found in an LSF guide that this is actually what bstop does: it 
sends SIGSTOP to normal jobs and SIGTSTP to parallel jobs.

So would it be good if I added the following part to the wiki page?

<MPI part>

If MPI jobs are running on your cluster, use

       qsig -s TSTP `qselect -q whatever -s R`

</MPI part>

or is it still too crude?
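
A slightly less crude variant might look like the sketch below. The function 
name signal_running_jobs is made up; it only wraps the same qselect and qsig 
calls, and skips qsig when the queue has no running jobs.

```sh
#!/bin/sh
# Sketch: send one signal to every running job in one queue.
# usage: signal_running_jobs SIGNAL QUEUE
signal_running_jobs() {
    sig=$1
    queue=$2

    # qselect prints one job id per line; -s R keeps only running jobs.
    job_ids=$(qselect -q "$queue" -s R)

    if [ -z "$job_ids" ]; then
        echo "no running jobs in queue $queue" >&2
        return 0
    fi

    # Word splitting on $job_ids is intentional: one argument per job id.
    qsig -s "$sig" $job_ids
}
```

The idea would be to call signal_running_jobs TSTP whatever before the 
intervention and signal_running_jobs CONT whatever afterwards. This assumes 
the batch system still lists the stopped jobs in state R, since it was only 
asked to deliver a signal; I have not checked that on every setup.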

thanks

cheers
alessandra


On Tue, 28 Jun 2005, Leif Nixon wrote:

> Alessandra Forti // EOJ writes:
>
>> is that because they might not be stopped at the same time?
>
> You might hit all sorts of time-outs. (A node wakes up, looks at its
> watch: "Hey, it was an *hour* since I sent that message, and I haven't
> got an answer yet!")
>
> --
> Leif Nixon                       -            Systems expert
> ------------------------------------------------------------
> National Supercomputer Centre    -      Linkoping University
> ------------------------------------------------------------
>

-- 
********************************************
* Dr Alessandra Forti			   *
* Technical Coordinator - NorthGrid Tier2  *
* http://www.hep.man.ac.uk/u/aforti	   *
********************************************