I'm still not quite understanding why GridICE shows:
TOTAL CPU: 2262
FREE CPU: 1364
RUNNING JOBS: 527
WAITING JOBS: 848
So of the ~900 CPUs monitored by GridICE which are *not* FREE, 527 have
LCG jobs. Is it a reasonable conclusion that the remaining ~375 CPUs are
in use by local batch system jobs, not originating from the LCG? In
other words, 60% of all running jobs on sites monitored by GridICE are
LCG and 40% are non-LCG?
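To make that arithmetic explicit, here is a minimal sketch using only the four figures quoted above. It assumes each running LCG job occupies exactly one CPU, which may not be how GridICE actually counts; the variable names are mine.

```python
# Figures quoted from GridICE above.
total_cpu = 2262
free_cpu = 1364
running_lcg_jobs = 527

# Assumption: one running LCG job occupies exactly one CPU.
busy_cpu = total_cpu - free_cpu            # CPUs that are not FREE
non_lcg_cpu = busy_cpu - running_lcg_jobs  # busy CPUs with no LCG job

lcg_share = running_lcg_jobs / busy_cpu
non_lcg_share = non_lcg_cpu / busy_cpu

print(f"busy CPUs:     {busy_cpu}")
print(f"non-LCG CPUs:  {non_lcg_cpu}")
print(f"LCG share:     {lcg_share:.1%}")
print(f"non-LCG share: {non_lcg_share:.1%}")
```

Under that assumption the exact numbers are 898 busy CPUs, of which 371 carry no LCG job, so the split is closer to 59% / 41% than a round 60% / 40%.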
Also, for the ~850 WAITING jobs, given the ~1350 FREE CPUs, that seems
quite surprising. Is there any way to find out why the waiting jobs
won't match to the free CPUs? To me, the only *good* reason is that the
FREE CPUs which are monitored by GridICE are attached to queues which
are reserved for non-LCG usage. The two *bad* reasons for this, that I
can think of, are:
1. The jobs have strange requirements which do not allow them to run on
available CPUs. These requirements should be identified and the free
CPUs made to accommodate them so the jobs can run; and,
2. The queues are misconfigured, meaning they are waiting for jobs
which will never arrive (perhaps very long jobs or very short jobs).
They should be reconfigured so they accept the outstanding jobs.
I'd be interested to know how reasonable this speculation is.
Cheers,
Ian.
--
Ian Stokes-Rees
Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes