On Fri, Dec 16, 2005 at 11:04:10AM +0100 or thereabouts, Jeff Templon wrote:
> Yo,
>
> So we just upgraded to Torque 2.0.0 and now we don't get any jobs.
> Investigation shows that this comes from publishing a rather hefty ERT
> value even though we are almost empty.
>
> The relevant piece of code in lcg-info-dynamic-pbs:
>
> my $TCPU = ( $MaxRunningJobs < $TotalCPU )? $MaxRunningJobs : $TotalCPU;
> $MaxTime=(($TotalJobs * $WallTime) - $UsedTime) / $TCPU;
> if ( $MaxTime < 0){
>     $MaxTime=99999999;
> }
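To see why an almost empty site can still advertise a non-zero value, here is a minimal Python re-expression of the quoted Perl. It is a sketch, not the actual script: the function name is hypothetical, the variable names are translated, and I am assuming WallTime and UsedTime are both in seconds, with UsedTime being the time already consumed by the running jobs, summed. The numbers in the example are made up.

```python
def lcg_max_time(total_jobs, wall_time, used_time, max_running_jobs, total_cpu):
    # Usable CPUs: the smaller of the queue's running-job limit and the CPU count.
    tcpu = max_running_jobs if max_running_jobs < total_cpu else total_cpu
    # Remaining wall-time allowance of *all* jobs (running + queued),
    # shared across the usable CPUs.
    max_time = ((total_jobs * wall_time) - used_time) / tcpu
    # A negative result is replaced by the sentinel 99999999.
    if max_time < 0:
        max_time = 99999999
    return max_time


# Hypothetical nearly empty site: 100 CPUs, 2 running jobs, 0 queued,
# 72 h (259200 s) wall-time limit, 3600 s already used in total.
print(lcg_max_time(2, 259200, 3600, 100, 100))  # 5148.0
```

Because TotalJobs counts the two running jobs, MaxTime comes out non-zero even though nothing is waiting in the queue.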
>
> My congratulations to whoever wrote this: I can read it despite the
> fact that it is Perl. Anyway, the bottom line is that the ERT being
> published is half the value of MaxTime.
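Taking that halving as given, a quick Python sketch shows what it means once MaxTime has been clamped to the sentinel. `published_ert` and `SENTINEL` are hypothetical names for illustration, not identifiers from the script, and I am assuming the values are in seconds.

```python
SENTINEL = 99999999  # what the quoted Perl assigns when MaxTime < 0


def published_ert(max_time):
    # The published ERT is reported to be half of MaxTime.
    return max_time / 2


# With the sentinel, the advertised ERT is roughly 579 days.
ert = published_ert(SENTINEL)
print(ert, round(ert / 86400))  # 49999999.5 579
```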
>
> I wonder why this ever gave the right answer, because the calculation
> is based on the *total* number of jobs. I had always thought (and the
> behavior of the numbers seemed to support this) that QueuedJobs was
> being used, not TotalJobs. Is there some bug in earlier versions of
> Torque that makes TotalJobs give you the right answer?? I thought we
> had always printed an ERT of zero unless there were actually jobs
> waiting in the queue.
The logic has always been the same, including in 2.6.0, where, as you
say, ERT != 0 when queued jobs = 0.
>
> I am going to change it to QueuedJobs by hand here until I hear
> otherwise. Hmm, on second thought that is even worse: with jobs
> running but few or none queued, (QueuedJobs * WallTime) - UsedTime
> goes negative, so MaxTime is clamped to 99999999 and all ERTs will
> be huge.
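That second thought checks out. Here is a small Python sketch of the same formula with the job count passed in, so that the TotalJobs and QueuedJobs variants can be compared side by side. `max_time` is a hypothetical name, the numbers are made up, and I am assuming the times are in seconds and tcpu has already been worked out.

```python
def max_time(jobs, wall_time, used_time, tcpu):
    # Same shape as the quoted Perl, with the job count as a parameter.
    mt = ((jobs * wall_time) - used_time) / tcpu
    if mt < 0:
        mt = 99999999  # sentinel from the quoted code
    return mt


# Hypothetical: 2 jobs running, none queued, 72 h limit,
# 3600 s used so far, 100 usable CPUs.
running, queued = 2, 0
print(max_time(running + queued, 259200, 3600, 100))  # TotalJobs:  5148.0
print(max_time(queued, 259200, 3600, 100))            # QueuedJobs: 99999999
```

With no queued jobs, the QueuedJobs term is zero, subtracting UsedTime makes the result negative, and the sentinel takes over.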
>
>
> J "how was this ever supposed to work???" T
--
Steve Traylen
http://www.gridpp.ac.uk/