LHC Computer Grid - Rollout
> Peter Love said:
> What *is* the actual algorithm used, and where does it get its info? At
> lunegw.lancs.ac.uk we have 24 job slots; currently 23 are being used by
> the infinite queue and 1 by the long. No stale jobs are present.
As far as I can see the algorithm is just:
(($TotalJobs * $WallTime) - $UsedTime) / 2*$TotalCPU;
For the result to be negative, I guess UsedTime, which should presumably be
the total time used by all running jobs, must be bigger than the total number
of jobs times the wall time limit. The values are picked up by running qstat
on each queue: TotalJobs is the number of lines which start with "Job Id:",
WallTime is resources_max.walltime (apparently with no default if it isn't
set?), and it looks to me like the bug is in UsedTime:
    open QSTAT, "qstat -f $queue\@$pbsHost 2>&1 |"
        or die "Error running qstat. (file)\n";
    while(<QSTAT>) {
        if (/^Job Id:/){
            $TotalJobs=$TotalJobs+1;
        }
        if (/job_state = Q/){
            $QueuedJobs=$QueuedJobs+1;
        }
        if (/job_state = R/){
            $RunningJobs=$RunningJobs+1;
        }
        if (/^\s+resources_used.walltime\s+=\s+(\S+)/){
            $UsedTime=$UsedTime . int(&convertHhMmSs($1)/60);
        }
    }
Unless I'm missing something, the . in the $UsedTime assignment should be a +.
Concatenating all the times as strings is likely to produce a time which is
slightly too big ... strong typing sometimes has its advantages!
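
To make the difference concrete, here is a minimal sketch rather than the real script. It assumes convertHhMmSs returns seconds (the helper hhmmss_to_seconds below is a hypothetical stand-in), that $UsedTime starts at 0, and it uses made-up qstat output with an assumed wall time limit of 2880 minutes and 24 CPUs:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's convertHhMmSs: "HH:MM:SS" -> seconds.
sub hhmmss_to_seconds {
    my ($h, $m, $s) = split /:/, shift;
    return $h * 3600 + $m * 60 + $s;
}

# Same expression as quoted from the script, wrapped in a sub for comparison.
sub estimate {
    my ($jobs, $wall, $used, $cpus) = @_;
    return (($jobs * $wall) - $used) / 2 * $cpus;
}

# Made-up "qstat -f" fragment: three running jobs (illustrative data only).
my @lines = (
    "Job Id: 1.example",
    "    job_state = R",
    "    resources_used.walltime = 02:00:00",
    "Job Id: 2.example",
    "    job_state = R",
    "    resources_used.walltime = 01:35:00",
    "Job Id: 3.example",
    "    job_state = R",
    "    resources_used.walltime = 05:00:00",
);

my ($total_jobs, $concat, $summed) = (0, 0, 0);
for (@lines) {
    $total_jobs++ if /^Job Id:/;
    if (/^\s+resources_used\.walltime\s+=\s+(\S+)/) {
        my $minutes = int(hhmmss_to_seconds($1) / 60);
        $concat = $concat . $minutes;   # as in the script: string concatenation
        $summed += $minutes;            # suggested fix: numeric addition
    }
}

my ($wall_time, $total_cpu) = (2880, 24);   # assumed values, in minutes / CPUs

printf "concatenated UsedTime: %s\n", $concat;
printf "summed UsedTime:       %d\n", $summed;
printf "estimate with '.':     %s\n",
    estimate($total_jobs, $wall_time, $concat, $total_cpu);
printf "estimate with '+':     %s\n",
    estimate($total_jobs, $wall_time, $summed, $total_cpu);
```

With these sample values the concatenated figure is far larger than jobs * wall time, so the estimate goes negative, while the summed figure keeps it positive. Note also that in Perl / and * have equal precedence and associate left to right, so the quoted expression divides by 2 and then multiplies by $TotalCPU; whether that was intended I can't tell from the script.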
Stephen