Hi,
We've recently been seeing a biomed user whose jobs run efficiently, but
which, until they exit, are reported as hideously inefficient, making
detection and management of genuinely inefficient jobs harder.
From what we can see, the jobs come in and, once running on a node, fork
into two processes: one does the computation, while the other
repeatedly sleeps for 10 minutes, writes information from a log file
into a MySQL database, and goes back to sleep.
If you do a qstat -f on the job, you will see that whilst the wallclock
increases throughout the run of the job, the cputime only increases upon
the completion of a process.
Typically each job has a couple of sequential runs of the
computationally intensive code, so the cpu time will jump throughout the
lifetime of the job.
e.g.

  Wallclock (h)   CPU time   Efficiency
  0               0:00       -
  1               0:00       0%
  2               0:00       0%
  3               2:10       38%
  4               2:10       29%
  ...             ...        ...
  22              2:10       5%
  23              2:10       5%
  24              23:40      98%
The job then exited cleanly.
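For what it's worth, this looks consistent with standard POSIX child-CPU accounting rather than anything scheduler-specific: a child's CPU time is only folded into the parent's RUSAGE_CHILDREN totals once the child has been reaped with wait(). A minimal sketch (my assumption about the mechanism, not confirmed against the user's code):

```python
import os
import resource
import time

# Fork a child that burns ~1s of CPU, then show that
# RUSAGE_CHILDREN only reflects that CPU time after the
# child has been reaped with waitpid().
pid = os.fork()
if pid == 0:
    # Child: busy-loop for about a second, then exit.
    end = time.time() + 1.0
    while time.time() < end:
        pass
    os._exit(0)

# Parent: sleep long enough that the child has finished
# (it is now a zombie, not yet reaped).
time.sleep(1.5)
before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime

os.waitpid(pid, 0)  # reap the child
after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime

print(before, after)  # before is ~0, after jumps by ~1s
```

If the batch system polls the parent's usage the same way, the computation process's CPU time would only become visible once each sequential run finishes and is reaped, which matches the jumps above.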
Is anyone else seeing this?
--
David Ambrose-Griffith - [log in to unmask]
IPPP, Department of Physics, Durham University,
Science Laboratories, South Road, Durham, DH1 3LE
Direct Dial: +44 (0)191 3343704
Office: +44 (0)191 334 3811