On Thu, Aug 11, 2005 at 11:41:26AM +0100 or thereabouts, William Hay wrote:
> Hi,
> we've had a user contact us because his jobs were aborted on our
> cluster. The problem appears to be that although the cluster was set
> up with a maximum wall clock time of 72 hours, the limit on cpu time
> was 48 hours. The accounting file shows the relevant jobs terminating
> with an Error_Status of 271, Resource_List.cput of 48:00:00 and
> resources_used.cput a little higher.
>
> Since the amount of time available for each queue is advertised via:
>
> GlueCEPolicyMaxCPUTime: 2880
>
> This is presumably requestable, but is there any reason why it should
> be so much lower than the wall clock time? (GlueCEPolicyMaxCPUTime is
> expressed in minutes, so 2880 corresponds to the 48 hour cput limit.)
> These values appear to be hardcoded in YAIM's config_torque_server, so
> presumably someone put some thought into the choice.
>
> As we are planning to upgrade to 2.6.0 next week, is there any reason
> not to override the defaults when we upgrade and make the cpu and wall
> clock time limits the same?
Certainly you can set whatever time limits you want. You probably
want walltime somewhat larger than cputime, so that jobs can do I/O
and still get their expected cputime. If I had my way I would remove
cputime from everything in the whole world.
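
If you do decide to make them equal, the per-queue limits can be set
with qmgr. A minimal sketch (the queue name "long" and the 72 hour
value are examples only, substitute your own):

  # raise the cput limit on a queue to match its walltime limit
  qmgr -c "set queue long resources_max.cput = 72:00:00"
  qmgr -c "set queue long resources_max.walltime = 72:00:00"

Since the values appear to be hardcoded in config_torque_server,
presumably YAIM would reapply its defaults on a re-run, so you'd want
to adjust them there too.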
Steve
>
>
> William Hay, UCL-CCC System Administrator
--
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/