Hi Ben (and others),
Below are some thoughts on the effect of this on people running long
jobs (like us, for LHCb DC04).
Ben Waugh wrote:
> The HEP farm has 10 nodes, also with dual hyperthreading processors.
> However, each node has 8 virtual processors as far as PBS is concerned,
> apart from the PBS server, which has 4. We wanted to be able to run
> low-priority jobs on all nodes while nothing more urgent was around, but
> still let more urgent jobs run immediately when they arise. Thus there is
> a "bulk" queue that can run up to 38 jobs, but these are run at a high
> "nice" level so jobs on the other queues can effectively push the bulk
> queues into the background while they run. Thus the CE advertises 76
> "CPUs".
>
> One other question that arises from this, and was briefly discussed on
> LCG-ROLLOUT without coming to any firm conclusions, is what SpecInt rating
> to advertise for hyperthreading processors, since each job (if the farm is
> fully loaded) gets only half a processor. I've taken the rating of a CPU,
> halved it and then added a bit, but I'm open to better suggestions.
The big issue we have seen here (if you haven't read all about it
already on LCG-ROLLOUT) is the interaction between the queue wall-clock
time limit and the CPU time limit. You really need to be careful that
jobs don't get a small slice of the CPU and get reniced in such a way
that they are killed after 5 or 7 days of wall-clock time, having had
the effective use of, say, only 1 day of full CPU time. We have lost
thousands of jobs this way.
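To make the trap concrete, here is a back-of-the-envelope sketch (the function name and numbers are illustrative, not from any real site config): a reniced job that only gets a fraction of a CPU accumulates CPU time at that same fraction of wall-clock time, so a wall-clock kill limit translates into far less real CPU time than advertised.

```python
# Illustrative arithmetic for the wall-clock vs. CPU-time trap;
# the 5-day limit and 20% share are made-up example numbers.
def effective_cpu_days(wall_clock_limit_days, cpu_share):
    """CPU days a job actually accumulates before the wall-clock
    limit kills it, given the fraction of a CPU it receives."""
    return wall_clock_limit_days * cpu_share

# A reniced job getting 20% of a hyperthreaded CPU, killed at 5 days
# of wall-clock time, has done only 1 day of real CPU work:
print(effective_cpu_days(5, 0.2))  # -> 1.0
```

So a job that would comfortably finish in 2 CPU days still dies at the 5-day wall-clock limit under contention.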
Also, since LCG doesn't provide any way to insert a "max wall-clock
time" for a job, we are using proxy certificates with an expiry (I'm not
sure this is working as we had hoped, but anyway...). So we submit an
LCG job, it starts running on the WN, and it knows:
1. The proxy expiry will (should) kill it after 2 or 3 days.
2. The queue published at least 2 days max CPU time.
3. The speed of the node, based on a 60-second benchmark.
From this it figures out whether it can complete the job on that node
in the time available to it. This makes the *big* assumption that the
conditions in place during the 60-second benchmark are the *slowest*
the node will run at over the duration of the job. If it determines it
cannot complete, it does not continue.
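The check the job makes can be sketched roughly as follows. This is not our actual code; the function and parameter names (`can_complete`, `benchmark_rate`, `work_units_remaining`, etc.) and the example numbers are all assumptions for illustration.

```python
# Rough sketch of the feasibility check the job runs on the WN.
# Assumes (as noted above) the node never runs slower than it did
# during the 60-second benchmark.
import time

def can_complete(benchmark_rate, work_units_remaining,
                 proxy_expiry, queue_cpu_limit_s):
    """Return True if, at the benchmarked speed, the remaining work
    fits inside both the proxy lifetime and the queue's published
    CPU-time limit.

    benchmark_rate: work units per second measured on this node.
    proxy_expiry:   absolute time (epoch seconds) the proxy expires.
    """
    seconds_needed = work_units_remaining / benchmark_rate
    seconds_until_expiry = proxy_expiry - time.time()
    return seconds_needed <= min(seconds_until_expiry, queue_cpu_limit_s)

# e.g. 1000 units at 0.01 units/s needs ~28h of CPU; a 24h proxy
# fails it even though the queue publishes a 2-day CPU limit:
print(can_complete(0.01, 1000, time.time() + 24 * 3600, 2 * 86400))
```

The point is that the proxy lifetime, not the queue limit, is often the binding constraint, which is exactly what goes wrong when the node is later overloaded.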
If the CPU is later overloaded (either via HT or due to extra/niced
processes), then the job may outlive its proxy certificate (remember,
they are usually valid for only 12h or 24h).
Cheers,
Ian.
--
Ian Stokes-Rees [log in to unmask]
Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes