From my tests it appears that the reserved node(s) get the jobs first,
and then all the other nodes.
When a (reserved) node is down or put offline (with "pbsnodes -o
<node_name>"; cleared with "pbsnodes -c <same_node_name>"), the jobs go
to the other nodes.
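For example, to drain the reserved node and bring it back later (node
name taken from the reservation below; "pbsnodes -l", which a standard
Torque pbsnodes should provide, lists the down/offline nodes):

pbsnodes -o eio99.pp.weizmann.ac.il    # mark the node offline, new jobs go elsewhere
pbsnodes -l                            # check that it now shows up as offline
pbsnodes -c eio99.pp.weizmann.ac.il    # clear the flag so the node takes jobs again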
Something like this:
SRCFG[dteam] PERIOD=INFINITY HOSTLIST=eio99.pp.weizmann.ac.il CLASSLIST=dteam
SRCFG[atlaslhcb] PERIOD=INFINITY HOSTLIST=eio23.pp.weizmann.ac.il CLASSLIST=alice,cms,dteam,sixt,zeus,see,L,M,S,X
(The second reservation effectively means: do not send atlas or lhcb jobs
to node eio23.pp.weizmann.ac.il. Both a comma (,) and a colon (:) seem to
work as item separators.)
Emanouil's solution is the most elegant.
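For instance, the dteam reservation above could also be written one
attribute per line in that style (I have not tested this exact form, but
Maui merges SRCFG lines that share the same reservation name, as in the
quoted example below):

SRCFG[dteam] PERIOD=INFINITY
SRCFG[dteam] HOSTLIST=eio99.pp.weizmann.ac.il
SRCFG[dteam] CLASSLIST=dteam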
Burke, S (Stephen) wrote:
>Emanouil Atanassov wrote:
>
>>what is wrong with the following in maui.cfg, which is
>>easier to type in, comment out, amend, and even
>>circumvent if you want to (e.g., using qrun):
>>
>>SRCFG[monitoring] PERIOD=DAY DEPTH=30
>>SRCFG[monitoring] STARTTIME=00:01:00
>>SRCFG[monitoring] ENDTIME=23:59:00
>>SRCFG[monitoring] HOSTLIST=wn002
>>SRCFG[monitoring] GROUPLIST=dteam
>>SRCFG[monitoring] TASKCOUNT=1
>>
>
>But I guess there's no resilience if wn002 goes down. Also, if all the
>SFT jobs get run on a single node which isn't used for other jobs, you
>lose a lot of the monitoring.
>
>Stephen