Hi Leif,
I'd like to resume this discussion, because I think that draining the
queues on a big cluster is not an optimal solution. On the other hand, I
don't want to write recipes on the wiki that might mess up jobs.
I've looked at a bit of documentation about how to stop MPI jobs and the
answer is always the same: it depends on your application.
SIGSTOP cannot be caught, so your application should be able to deal with
SIGTSTP.
Then I found in an LSF guide that this is actually what bstop does: it
sends SIGSTOP to normal jobs and SIGTSTP to parallel jobs.
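To make the difference concrete, here is a minimal shell sketch of a process that traps SIGTSTP, which is not possible with SIGSTOP. The names on_tstp and checkpointed are made up for illustration; a real MPI application would do its own cleanup in the handler and then stop itself.

```shell
#!/bin/sh
# Sketch only: shows that SIGTSTP can be trapped (SIGSTOP cannot).
# on_tstp and checkpointed are hypothetical names for illustration.
checkpointed=0

on_tstp() {
    # A real application would flush its state or warn its peers here,
    # and then stop itself, for example with: kill -STOP $$
    checkpointed=1
}

# Install the handler for SIGTSTP.
trap on_tstp TSTP

# Simulate the batch system delivering SIGTSTP to this process.
kill -TSTP $$

echo "checkpointed=$checkpointed"
```

An application that does not install such a handler is simply stopped by SIGTSTP, so how well this works still depends on the application.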
So would it be good if I added the following part to the wiki page:
<MPI part>
If MPI jobs are running on your cluster, use
qsig -s TSTP $(qselect -q whatever -s R)
</MPI part>
or is it still too crude?
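A slightly more careful version could look like the sketch below. It assumes the same qselect and qsig commands as the recipe above; suspend_queue is a hypothetical helper name, and the only thing it adds is a check that qselect actually returned some jobs before qsig is called.

```shell
#!/bin/sh
# Sketch only: send SIGTSTP to every running job in one queue.
# suspend_queue is a hypothetical helper; qselect and qsig are the
# batch system commands used in the recipe above.
suspend_queue() {
    queue=$1
    # List the running jobs in the queue, one job ID per line.
    jobs=$(qselect -q "$queue" -s R)
    if [ -z "$jobs" ]; then
        echo "no running jobs in queue $queue"
        return 0
    fi
    # Word splitting on $jobs is intentional: one argument per job ID.
    qsig -s TSTP $jobs
}
```

It would be called as "suspend_queue whatever", where the queue name is a placeholder.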
thanks
cheers
alessandra
On Tue, 28 Jun 2005, Leif Nixon wrote:
> Alessandra Forti // EOJ writes:
>
>> is that because they might not be stopped at the same time?
>
> You might hit all sorts of time-outs. (A node wakes up, looks at its
> watch: "Hey, it was an *hour* since I sent that message, and I haven't
> got an answer yet!")
>
> --
> Leif Nixon - Systems expert
> ------------------------------------------------------------
> National Supercomputer Centre - Linkoping University
> ------------------------------------------------------------
>
--
********************************************
* Dr Alessandra Forti *
* Technical Coordinator - NorthGrid Tier2 *
* http://www.hep.man.ac.uk/u/aforti *
********************************************