Hi Kashif,
Are the WNs that it lists as "whole machines" (i.e. machines running multi-core jobs) actually all running at least one multi-core job each? Also, are there any other WNs which are neither draining already nor already running multi-core jobs?
Regards,
Andrew.
________________________________
From: Kashif Mohammad
Sent: Friday, August 15, 2014 10:56 AM
To: Testbed Support for GridPP member institutes
Cc: Lahiff, Andrew (STFC,RAL,PPD)
Subject: condor multicore
Hi Andrew
(Sending to TB support as someone else may have seen the same issue)
I have set up multicore and it seems to be working, as a few ATLAS test multicore jobs finished successfully. I can see that there are six multicore jobs in the queue, but Condor is draining only one WN.
My settings are:
DEFRAG_INTERVAL = 1200
DEFRAG_DRAINING_MACHINES_PER_HOUR = 20.0
DEFRAG_MAX_CONCURRENT_DRAINING = 40
DEFRAG_MAX_WHOLE_MACHINES = 80
DEFRAG_SCHEDULE = graceful
DEFRAG.SETTABLE_ATTRS_ADMINISTRATOR = DEFRAG_MAX_CONCURRENT_DRAINING,DEFRAG_DRAINING_MACHINES_PER_HOUR,DEFRAG_MAX_WHOLE_MACHINES
ENABLE_RUNTIME_CONFIG = TRUE
DEFRAG_RANK = ifThenElse(Cpus >= 8, -10, (TotalCpus - Cpus)/(8.0 - Cpus))
DEFRAG_WHOLE_MACHINE_EXPR = ((Cpus == TotalCpus) || (Cpus >= 8)) && StartJobs =?= True
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline =!= True && StartJobs =?= True
## Logs
MAX_DEFRAG_LOG = 104857600
MAX_NUM_DEFRAG_LOG = 10
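As a rough illustration of how DEFRAG_REQUIREMENTS, DEFRAG_WHOLE_MACHINE_EXPR and DEFRAG_RANK from the settings above would evaluate for a single partitionable slot, here is a minimal Python sketch. It is not HTCondor code: the `Slot` class and the function names are hypothetical, undefined attributes are modelled as `None`, and the ClassAd `=?=` / `=!=` operators are approximated with `is True` / `is not True`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """Hypothetical stand-in for a partitionable slot ad."""
    cpus: int                            # Cpus: cores currently free in the slot
    total_cpus: int                      # TotalCpus: cores on the machine
    partitionable: bool = True           # PartitionableSlot
    start_jobs: Optional[bool] = True    # StartJobs, as used in the settings
    offline: Optional[bool] = None       # Offline, None meaning undefined


def defrag_requirements(s: Slot) -> bool:
    # PartitionableSlot && Offline =!= True && StartJobs =?= True
    return s.partitionable and s.offline is not True and s.start_jobs is True


def whole_machine(s: Slot) -> bool:
    # ((Cpus == TotalCpus) || (Cpus >= 8)) && StartJobs =?= True
    return (s.cpus == s.total_cpus or s.cpus >= 8) and s.start_jobs is True


def defrag_rank(s: Slot) -> float:
    # ifThenElse(Cpus >= 8, -10, (TotalCpus - Cpus)/(8.0 - Cpus))
    if s.cpus >= 8:
        return -10.0
    return (s.total_cpus - s.cpus) / (8.0 - s.cpus)
```

For example, a hypothetical 16-core node with 2 free cores passes the requirements, is not counted as whole, and gets a rank of (16 - 2) / (8.0 - 2), roughly 2.33. A node with 8 or more free cores already counts as whole and ranks -10.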
Looking at the log for one cycle of defragmentation:
08/15/14 10:37:40 There are currently 1 draining and 12 whole machines.
08/15/14 10:37:40 Set of current whole machines is
08/15/14 10:37:40 t2wn1.physics.ox.ac.uk
08/15/14 10:37:40 t2wn100.physics.ox.ac.uk
08/15/14 10:37:40 t2wn101.physics.ox.ac.uk
08/15/14 10:37:40 t2wn102.physics.ox.ac.uk
08/15/14 10:37:40 t2wn103.physics.ox.ac.uk
08/15/14 10:37:40 t2wn104.physics.ox.ac.uk
08/15/14 10:37:40 t2wn111.physics.ox.ac.uk
08/15/14 10:37:40 t2wn2.physics.ox.ac.uk
08/15/14 10:37:40 t2wn3.physics.ox.ac.uk
08/15/14 10:37:40 t2wn4.physics.ox.ac.uk
08/15/14 10:37:40 t2wn7.physics.ox.ac.uk
08/15/14 10:37:40 t2wn8.physics.ox.ac.uk
08/15/14 10:37:40 Set of current draining machine is
08/15/14 10:37:40 t2wn29.physics.ox.ac.uk
08/15/14 10:37:40 Newly Arrived new machines is
08/15/14 10:37:40 (no machines)
08/15/14 10:37:40 Newly departed draining machines is
08/15/14 10:37:40 (no machines)
08/15/14 10:37:40 Lifetime new machines arrived: 556
08/15/14 10:37:40 Lifetime mean arrival rate: 2.39058 machines / hour
08/15/14 10:37:40 Lifetime mean arrival rate sd: 5.28565
08/15/14 10:37:40 Average pool draining badput = 0.00%
08/15/14 10:37:40 Average pool draining unclaimed = 25916.76%
08/15/14 10:37:40 Looking for 6 machines to drain.
08/15/14 10:37:40 Drained 0 machines (wanted to drain 6 machines).
I cannot work out why it drained 0 machines when it wanted to drain 6 machines.
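For reference, the figure of 6 in "Looking for 6 machines to drain" is consistent with the configured hourly rate applied over one polling interval. The sketch below assumes the defrag daemon scales DEFRAG_DRAINING_MACHINES_PER_HOUR by DEFRAG_INTERVAL and rounds down; that is an assumption about its behaviour, not something stated in the log. The function name is hypothetical.

```python
import math

# Values copied from the settings quoted earlier in this message.
DEFRAG_INTERVAL = 1200                     # seconds
DEFRAG_DRAINING_MACHINES_PER_HOUR = 20.0
DEFRAG_MAX_CONCURRENT_DRAINING = 40
DEFRAG_MAX_WHOLE_MACHINES = 80


def drains_per_cycle(per_hour: float, interval_s: int) -> int:
    # Assumption: hourly rate scaled to one interval, then rounded down.
    return math.floor(per_hour * interval_s / 3600.0)


wanted = drains_per_cycle(DEFRAG_DRAINING_MACHINES_PER_HOUR, DEFRAG_INTERVAL)

# Counts reported in the 10:37:40 log cycle.
draining_now, whole_now = 1, 12

print(wanted)                                          # 6
print(draining_now < DEFRAG_MAX_CONCURRENT_DRAINING)   # True
print(whole_now < DEFRAG_MAX_WHOLE_MACHINES)           # True
```

Under that assumption, 1 draining and 12 whole machines are both well below the configured limits of 40 and 80.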
Thanks
Kashif