Hello Ralph,
> I've a strange problem here after the installation of a new kernel:
> incoming jobs are submitted several times to the batch system.
>
> A job is accepted by the gatekeeper and submitted to our torque.
> Immediately after the submission the job is again submitted and so on
> until there are 11 torque jobs.
>
> Another strange thing is that even if these 11 jobs are still waiting to
> be executed in torque (as the queues are full) WMS tells me that they
> were aborted. So it seems to me they somehow fail at once and thus are
> resubmitted by WMS.
The evidence suggests that each job fails immediately,
i.e. before the user payload gets started.
By default the WMS then does up to 10 shallow resubmissions, which
accounts for the 11 torque jobs: 1 original submission plus 10 retries.
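To confirm that the extra jobs come from resubmission, one could submit a
test job with retries switched off. A minimal sketch of such a JDL, assuming
your WMS honours the RetryCount and ShallowRetryCount attributes and that a
value of 0 disables the respective resubmission; the executable and file
names are only placeholders:

```
[
  // placeholder payload, any trivial command will do
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  // assumption: 0 means no deep resubmission
  RetryCount = 0;
  // assumption: 0 means no shallow resubmission
  ShallowRetryCount = 0;
]
```

With these settings a failing job should leave a single torque job behind,
which makes the failure easier to inspect.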
There are many possible causes outlined here:
http://goc.grid.sinica.edu.tw/gocwiki/Cannot_read_JobWrapper_output...
The dots are part of the URL.