Hi Stephen,
I see you have correctly diagnosed the problem with the current
information provider: it's written in Perl.
Thanks,
JT
On Mon, 2005-01-10 at 17:57, Burke, S (Stephen) wrote:
> LHC Computer Grid - Rollout
> > [mailto:[log in to unmask]] On Behalf Of Peter Love said:
> > What *is* the actual algorithm used, and where does it get its info? At
> > lunegw.lancs.ac.uk we have 24 job slots; currently 23 are being used by
> > the infinite queue and 1 by the long queue. No stale jobs are present.
>
> As far as I can see the algorithm is just:
>
> (($TotalJobs * $WallTime) - $UsedTime) / 2*$TotalCPU;
>
> To be negative I guess the UsedTime, which should presumably be the total
> time used by all running jobs, is bigger than the total number of jobs * the
> wall time limit. Those are picked up using qstat on each queue. TotalJobs is
> the total number of lines which start "Job Id:", WallTime is
> resources_max.walltime (apparently with no default if it isn't set?), and it
> looks to me like the bug is in UsedTime:
>
> open QSTAT, "qstat -f $queue\@$pbsHost 2>&1 |" or die "Error running qstat. (file)\n";
>
> while(<QSTAT>) {
>     if (/^Job Id:/){
>         $TotalJobs=$TotalJobs+1;
>     }
>     if (/job_state = Q/){
>         $QueuedJobs=$QueuedJobs+1;
>     }
>     if (/job_state = R/){
>         $RunningJobs=$RunningJobs+1;
>     }
>     if (/^\s+resources_used.walltime\s+=\s+(\S+)/){
>         $UsedTime=$UsedTime . int(&convertHhMmSs($1)/60);
>     }
> }
>
> Unless I'm missing something, that . in the $UsedTime line should be a +.
> Concatenating all the times as strings is likely to produce a time which is
> slightly too big ... strong typing sometimes has its advantages!
>
> Stephen
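
A minimal sketch of Stephen's point, in Perl to match the quoted provider
code. The job walltimes, the 48-hour limit and the CPU count are made-up
values, and the helper names (used_time_concat, used_time_sum, estimate) are
hypothetical; only the formula and the "." versus "+" behaviour come from the
message above. It assumes $UsedTime starts at 0 and that times are in minutes.

```perl
use strict;
use warnings;

# Hypothetical walltimes, in minutes, for three running jobs (not real data).
my @minutes = (90, 125, 30);

# Accumulate the way the quoted loop does: "." is string concatenation.
sub used_time_concat {
    my $used = 0;
    $used = $used . $_ for @_;
    return $used;
}

# Accumulate with "+", which is presumably what the loop intended.
sub used_time_sum {
    my $used = 0;
    $used = $used + $_ for @_;
    return $used;
}

# The quoted formula, reproduced as written. "/" and "*" have equal
# precedence and associate left, so this divides by 2, then multiplies
# by $total_cpu.
sub estimate {
    my ($total_jobs, $wall_time, $used_time, $total_cpu) = @_;
    return (($total_jobs * $wall_time) - $used_time) / 2 * $total_cpu;
}

my $concat = used_time_concat(@minutes);   # the string "09012530"
my $sum    = used_time_sum(@minutes);      # the number 245

# Assumed limits: 48 h walltime (2880 minutes) and 24 CPUs.
my ($total_jobs, $wall_time, $total_cpu) = (scalar @minutes, 2880, 24);

printf "UsedTime with '.': %s\n", $concat;
printf "UsedTime with '+': %s\n", $sum;
printf "estimate with '.': %s\n", estimate($total_jobs, $wall_time, $concat, $total_cpu);
printf "estimate with '+': %s\n", estimate($total_jobs, $wall_time, $sum, $total_cpu);
```

With the concatenated value, UsedTime dwarfs $TotalJobs * $WallTime, so the
estimate goes negative, which would be consistent with the symptom Peter
reported; with the summed value it stays positive.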