Before Stephen's modification to /opt/lcg/libexec/lcg-info-dynamic-pbs:
[root@lunegw root]# ldapsearch -x -H ldap://lunegw.lancs.ac.uk:2135 -b mds-vo-name=LANCS,o=grid|grep Estimat
GlueCEStateEstimatedResponseTime: -42945862397883
GlueCEStateEstimatedResponseTime: -5367
GlueCEStateEstimatedResponseTime: 0
GlueCEStateEstimatedResponseTime: 750
After patch (changing . to +):
[root@lunegw root]# ldapsearch -x -H ldap://lunegw.lancs.ac.uk:2135 -b mds-vo-name=LANCS,o=grid|grep Estimat
GlueCEStateEstimatedResponseTime: 26923
GlueCEStateEstimatedResponseTime: 3582
GlueCEStateEstimatedResponseTime: 0
GlueCEStateEstimatedResponseTime: 0
[root@lunegw root]# rpm -qf /opt/lcg/libexec/lcg-info-dynamic-pbs
lcg-info-dynamic-pbs-1.0.2-1
Peter
Burke, S (Stephen) wrote:
> LHC Computer Grid - Rollout, on behalf of Peter Love, said:
> > What *is* the actual algorithm used, and where does it get its info? At
> > lunegw.lancs.ac.uk we have 24 job slots; currently 23 are being used by
> > the infinite queue and 1 by the long queue. No stale jobs are present.
>
> As far as I can see the algorithm is just:
>
> (($TotalJobs * $WallTime) - $UsedTime) / 2*$TotalCPU;
>
> For the result to be negative, I guess UsedTime, which should presumably be
> the total time used by all running jobs, must be bigger than the total number
> of jobs * the wall time limit. Those values are picked up using qstat on each
> queue. TotalJobs is the total number of lines which start "Job Id:", WallTime
> is resources_max.walltime (apparently with no default if it isn't set?), and
> it looks to me like the bug is in UsedTime:
>
> open QSTAT, "qstat -f $queue\@$pbsHost 2>&1 |" or die "Error running qstat. (file)\n";
>
> while(<QSTAT>) {
>     if (/^Job Id:/){
>         $TotalJobs=$TotalJobs+1;
>     }
>     if (/job_state = Q/){
>         $QueuedJobs=$QueuedJobs+1;
>     }
>     if (/job_state = R/){
>         $RunningJobs=$RunningJobs+1;
>     }
>     if (/^\s+resources_used.walltime\s+=\s+(\S+)/){
>         $UsedTime=$UsedTime . int(&convertHhMmSs($1)/60);
>     }
> }
>
> Unless I'm missing something, the "." in the last line should be a "+";
> concatenating all the times as strings is likely to produce a time which is
> slightly too big ... strong typing sometimes has its advantages!
>
> Stephen
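
For anyone wanting to see the effect in isolation, here is a minimal sketch of the accumulation step with both operators. The body of convertHhMmSs is not shown in the thread, so convert_hh_mm_ss below is a hypothetical stand-in that assumes it returns seconds; the sample walltimes are made up as well.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's convertHhMmSs helper, whose body
# is not shown in the thread; assumed to turn "HH:MM:SS" into seconds.
sub convert_hh_mm_ss {
    my ($hms) = @_;
    my ($h, $m, $s) = split /:/, $hms;
    return $h * 3600 + $m * 60 + $s;
}

# Accumulate used walltime in minutes, using the operator under test.
sub used_minutes {
    my ($op, @walltimes) = @_;
    my $used = 0;
    for my $w (@walltimes) {
        my $minutes = int(convert_hh_mm_ss($w) / 60);
        if ($op eq '.') {
            $used = $used . $minutes;   # string concatenation (the bug)
        } else {
            $used = $used + $minutes;   # numeric addition (the fix)
        }
    }
    return $used;
}

# Made-up walltimes for three running jobs: 90, 720 and 45 minutes.
my @sample = ('01:30:00', '12:00:15', '00:45:59');

print "concatenated: ", used_minutes('.', @sample), "\n";   # 09072045
print "summed:       ", used_minutes('+', @sample), "\n";   # 855
```

With only three jobs the concatenated value is already about four orders of magnitude larger than the sum, and it gains more digits with every running job.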
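
As a rough check on why an inflated UsedTime drives the estimate negative, the sketch below evaluates the expression exactly as quoted in the thread. All inputs are hypothetical: 3 jobs, a walltime limit of 2880, 24 CPUs (matching the 24 job slots mentioned for lunegw), and two UsedTime values, 855 (job times of 90, 720 and 45 added up) and 9072045 (the same three numbers concatenated onto an initial 0). The units and the real script's surrounding code are assumptions, not something the thread confirms.

```perl
use strict;
use warnings;

# The expression as quoted in the thread. Perl gives / and * the same
# precedence and evaluates them left to right, so as written this is
# (... / 2) * $total_cpu. Whether that is what the script intends is
# not clear from the thread.
sub estimated_response {
    my ($total_jobs, $wall_time, $used_time, $total_cpu) = @_;
    return (($total_jobs * $wall_time) - $used_time) / 2*$total_cpu;
}

# Hypothetical inputs, chosen only for illustration.
my ($jobs, $limit, $cpus) = (3, 2880, 24);

printf "summed used time:       %d\n",
    estimated_response($jobs, $limit, 855, $cpus);       # 93420
printf "concatenated used time: %d\n",
    estimated_response($jobs, $limit, 9072045, $cpus);   # -108760860
```

The sign flips as soon as UsedTime exceeds TotalJobs * WallTime, which the concatenated value does easily.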