Hello all,
A few weeks back we split off a portion of one of our queues for some
atlas whole-node job tests on our torque/maui cluster. The split went
well, and the "multicore" queue is working well. But for some reason
maui has become lazy at scheduling jobs to the remaining "single-core
job" nodes, only keeping 2/3 of them full at any one time. We're left
with ~100 free job slots at any one time and lots of waiting jobs.
Scheduling still occurs (all jobs seem to run eventually), but this is a
lot of cores going idle for no reason! Jobs can be forced to run on
these slots, but that's no way to run a batch system. After staring at
this until my eyes bled I thought I'd ask my peers for help (after
Google forsook me).
The technique we used to split the queues was simply to edit our torque
"nodes" file so that some nodes had the "mult" feature while the rest
were given the "sing" feature. Then we gave our current, normal queue
the requirement to use nodes with the "sing" feature (using qmgr,
resources_default.neednodes = sing), and created a new queue "mcore"
that required nodes with the "mult" feature.
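For concreteness, the edits looked roughly like this (node names and np counts here are just illustrative examples, not our full nodes file):

```
# torque server_priv/nodes -- tag each node with a feature
wn064.lancs.pygrid np=8 sing
wn001.lancs.pygrid np=8 mult

# qmgr -- point each queue at its feature
set queue q resources_default.neednodes = sing
set queue mcore resources_default.neednodes = mult
```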
I didn't add anything special to the maui.cfg (I experimented with this
later, but nothing I did helped, so I reverted all my changes).
The mcore queue behaves well, and is always full. The old queue isn't.
Looking into it, if I run showconfig on our headnode I see this line:
CLASSCFG[mcore] DEFAULT.FEATURES=[mult]
but no corresponding line for our single-core nodes (and explicitly
setting it in the maui.cfg not only doesn't work, it stops normal jobs
running at all). Also there's no sign of this line anywhere in any
config, so maui is making it up.
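For reference, the explicit per-class mapping I tried (and then removed) was along these lines in maui.cfg -- it was adding a line for our normal queue "q" that stopped normal jobs running:

```
# maui.cfg -- explicit per-class default features (since removed;
# the CLASSCFG[q] line stopped normal jobs running at all)
CLASSCFG[q]     DEFAULT.FEATURES=[sing]
CLASSCFG[mcore] DEFAULT.FEATURES=[mult]
```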
Furthermore, running a "diagnose -n" I see that while all the nodes have
the correct value in the Features column ([sing] for normal nodes,
[mult] for mcore nodes), all nodes have [mcore_X:Y] in the Classes
column (where X & Y vary per node and, as I understand it, are the
free/total "slots" for that class on that node).
A snippet showing a node from each group:
Name                State    Procs  Memory       Disk  Swap         Speed  Opsys  Arch    Par  Load  Res  Classes      Network    Features
wn064.lancs.pygrid  Running  3:8    24098:24098  1:1   22560:28099  1.00   linux  [NONE]  DEF  5.00  006  [mcore_8:8]  [DEFAULT]  [sing]
wn001.lancs.pygrid  Busy     0:1    16054:16054  1:1   18459:20054  1.00   linux  [NONE]  DEF  1.00  001  [mcore_0:1]  [DEFAULT]  [mult]
I experimented with separating the two node groups into partitions in
maui (explicitly defining partition membership for each node, as trying
to be clever just backfires) but that didn't seem to have any effect,
presumably because the Classes are screwed up.
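In case it helps anyone spot what I did wrong, the partition experiment was roughly this in maui.cfg, with one NODECFG line per node (partition names are just what I'm calling them here; all of this has since been reverted):

```
# maui.cfg -- explicit partition membership, one line per node (reverted)
NODECFG[wn064.lancs.pygrid] PARTITION=singpar
NODECFG[wn001.lancs.pygrid] PARTITION=multpar
# let each queue's jobs use the matching partition
CLASSCFG[q]     PLIST=singpar
CLASSCFG[mcore] PLIST=multpar
```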
Any help on this would be very much appreciated, I found several
references to maui not honouring class changes but none of these cries
for help had any visible answers (http://xkcd.com/979/ springs to mind).
I've trawled my configs for any stray whitespace but couldn't find any.
I leave the output of a qmgr -c 'print server' below, which might yield
clues. Thanks in advance.
Cheers,
Matt
create queue q
set queue q queue_type = Execution
set queue q acl_host_enable = True
set queue q acl_hosts = fal-pygrid-44.lancs.ac.uk
set queue q acl_hosts += ce2.lancs.pygrid
set queue q resources_max.cput = 48:00:00
set queue q resources_max.walltime = 72:00:00
set queue q resources_default.neednodes = sing
set queue q resources_default.walltime = 72:00:00
set queue q acl_group_enable = True
set queue q acl_groups = atlas
set queue q acl_groups += alice
set queue q acl_groups += lhcb
set queue q acl_groups += cms
set queue q acl_groups += dteam
set queue q acl_groups += dzero
set queue q acl_groups += babar
set queue q acl_groups += biomed
set queue q acl_groups += zeus
set queue q acl_groups += cdf
set queue q acl_groups += pheno
set queue q acl_groups += ilc
set queue q acl_groups += hone
set queue q acl_groups += t2k
set queue q acl_groups += geant4
set queue q acl_groups += magic
set queue q acl_groups += planck
set queue q acl_groups += fusion
set queue q acl_groups += ops
set queue q acl_groups += gridpp
set queue q acl_groups += totalep
set queue q acl_groups += camont
set queue q acl_groups += nthgrid
set queue q acl_groups += snemo
set queue q acl_groups += minos
set queue q acl_groups += esr
set queue q acl_groups += cedar
set queue q enabled = True
set queue q started = True
#
# Create and define queue mcore
#
create queue mcore
set queue mcore queue_type = Execution
set queue mcore acl_host_enable = True
set queue mcore acl_hosts = fal-pygrid-44.lancs.ac.uk
set queue mcore acl_hosts += ce2.lancs.pygrid
set queue mcore resources_max.cput = 48:00:00
set queue mcore resources_max.walltime = 72:00:00
set queue mcore resources_default.neednodes = mult
set queue mcore resources_default.walltime = 72:00:00
set queue mcore acl_group_enable = True
set queue mcore acl_groups = prdatlas
set queue mcore acl_groups += pltatlas
set queue mcore acl_groups += sgmatlas
set queue mcore enabled = True
set queue mcore started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
set server acl_hosts = fal-pygrid-44.lancs.ac.uk
set server managers = [log in to unmask]
set server operators = [log in to unmask]
set server default_queue = q
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.walltime = 72:00:00
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server default_node = lcgpro
set server node_pack = False
set server mail_domain = never
set server kill_delay = 10
set server next_job_number = 1354305