On Thu, 13 Feb 2003, Ian Stokes-Rees wrote:
> Is it not reasonable to think that typical EDG data-intensive jobs will
> not be able to make 100% use of the CPU? If this is the case, then is it
> not desirable to have a couple of jobs running concurrently on each
> processor, in order to make use of disk-blocking time?
This really depends on the amount of physical memory that the jobs need
while running. If the nodes haven't got X hundred MB per job, they will
spend their time swapping pages back and forth between memory and disk
every time the kernel switches between processes.
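If you want a quick idea of how much headroom a node actually has, a
rough sketch like the one below gives a crude answer. It assumes a Linux
/proc/meminfo, and the 200 MB-per-job figure is purely illustrative; you
would substitute the real footprint of your own jobs:

  #!/usr/bin/env python
  # Rough sketch: estimate how many jobs of a given memory footprint a
  # node can hold before it starts swapping, by reading the MemFree and
  # Cached lines from /proc/meminfo.  JOB_MB is an illustrative guess.

  JOB_MB = 200  # assumed resident size of one job, in MB

  def free_mb():
      # Parse "Name: value kB" lines from /proc/meminfo.
      fields = {}
      for line in open('/proc/meminfo'):
          parts = line.split()
          if len(parts) >= 2 and parts[0].endswith(':'):
              try:
                  fields[parts[0][:-1]] = int(parts[1])  # values in kB
              except ValueError:
                  pass
      usable_kb = fields.get('MemFree', 0) + fields.get('Cached', 0)
      return usable_kb // 1024

  if __name__ == '__main__':
      slots = free_mb() // JOB_MB
      print('room for about %d jobs of %d MB before swapping'
            % (slots, JOB_MB))

Counting page cache as reclaimable is optimistic for data-intensive jobs
that benefit from cached file data, so treat the answer as an upper bound.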
> Specifically, would it not be reasonable to have PBS concurrently schedule
> two jobs per CPU for disk-intensive jobs? Obviously the overall runtime of
> each job would be increased, but is it not fair to expect that the _overall_
> execution time would be sped up by 10-30% (or more, if we get to a stage
> where data is being fetched either across a local network, or across the
> internet)?
The only real way to know is to try it with some real jobs. For instance,
watch the output of top while they are running: if the total CPU usage of
two data-limited jobs running together is lower than that of one running
alone, then it's less efficient overall (because the machine keeps waiting
for virtual memory to swap in and out).
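If you'd rather script the comparison than eyeball top, something like
the following minimal sketch works on Linux (it reads the aggregate
counters from the first line of /proc/stat; run it once with one copy of
the job and once with two, and compare the figures):

  #!/usr/bin/env python
  # Sample /proc/stat twice and report the fraction of time the CPUs
  # spent doing real work (user + nice + system) rather than idling.
  import time

  def cpu_counters():
      # First line of /proc/stat: "cpu user nice system idle ..."
      parts = open('/proc/stat').readline().split()
      user, nice, system, idle = [int(x) for x in parts[1:5]]
      return user + nice + system, idle

  if __name__ == '__main__':
      busy1, idle1 = cpu_counters()
      time.sleep(30)                    # sample over half a minute
      busy2, idle2 = cpu_counters()
      busy, idle = busy2 - busy1, idle2 - idle1
      print('CPU busy: %.0f%%' % (100.0 * busy / (busy + idle)))

If two concurrent jobs report a lower busy percentage than one job alone,
the node is thrashing and oversubscription is hurting you.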
Ideally, you want jobs to be CPU-bound, because this means you've solved
any data-rate bottlenecks. The lower the total CPU usage across all jobs,
the worse you're doing.
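To put a rough number on the 10-30% figure quoted above: in the crudest
model, a job that keeps the CPU busy a fraction u of the time leaves
(1 - u) idle for a second job to soak up, so two jobs can push total
utilisation towards min(1, 2u). The sketch below is pure back-of-envelope
arithmetic, ignoring memory pressure and disk contention entirely:

  #!/usr/bin/env python
  # Back-of-envelope model: one job keeps the CPU busy a fraction u of
  # the time; two such jobs on one CPU achieve at most min(1, 2u), so
  # throughput improves by min(1, 2u) / u.  Ignores swapping effects.
  for u in (0.5, 0.7, 0.8, 0.9):
      gain = min(1.0, 2 * u) / u
      print('CPU fraction %.0f%% -> throughput x%.2f (%+.0f%%)'
            % (100 * u, gain, 100 * (gain - 1)))

For jobs that are already 80-90% CPU-bound this predicts gains of roughly
10-25%, consistent with the quoted range; for jobs much more I/O-bound
than that, the gains are larger, but so is the risk of running out of
memory with two resident at once.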
(I think the long term solution is something like www.openmosix.org which
can dynamically move processes between nodes to optimise performance
based on the current CPU and disk usage patterns, but that's another
story.)
Cheers,
Andrew
------------------------------------------------------------------------
[log in to unmask] http://www.hep.man.ac.uk/~mcnab/ +44-161-275-4227
"/C=UK/O=eScience/OU=Manchester/L=HEP/CN=Andrew McNab"
Grid Research, High Energy Physics Group, University of Manchester, UK
------------------------------------------------------------------------