Hi,

I'm getting intermittent job aborts with this error:

Got a job held event, reason: Globus error 94: the jobmanager does not accept any new requests (shutting down)

The GOC Wiki suggests that the most likely cause of this is a problem in the batch system: either the CE cannot submit the job or it fails to track it properly. Since it is only intermittent, I am guessing it is not a general configuration problem. Looking at the batch system accounting logs, I can see the jobs being submitted fine, but then something on the CE is deleting them before they get a chance to run:

05/30/2008 11:37:51;Q;1578437.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:37:54;S;1578437.heplnx201.pp.rl.ac.uk;user=dteam003 group=dteam jobname=STDIN queue=dteam ctime=1212143871 qtime=1212143871 etime=1212143871 start=1212143874 exec_host=heplnc109.pp.rl.ac.uk/0 Resource_List.cput=24:00:00 Resource_List.mem=985mb Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=24:00:00
05/30/2008 11:38:00;Q;1578438.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:38:01;Q;1578439.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:38:51;Q;1578440.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:38:51;D;1578436.heplnx201.pp.rl.ac.uk;[log in to unmask] .rl.ac.uk
05/30/2008 11:38:52;Q;1578441.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:39:00;D;1578438.heplnx201.pp.rl.ac.uk;[log in to unmask] .rl.ac.uk
05/30/2008 11:39:01;Q;1578442.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:39:01;Q;1578443.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:39:01;D;1578439.heplnx201.pp.rl.ac.uk;[log in to unmask] .rl.ac.uk
05/30/2008 11:39:02;Q;1578444.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:39:03;E;1578437.heplnx201.pp.rl.ac.uk;user=dteam003 group=dteam jobname=STDIN queue=dteam ctime=1212143871 qtime=1212143871 etime=1212143871 start=1212143874 exec_host=heplnc109.pp.rl.ac.uk/0 Resource_List.cput=24:00:00 Resource_List.mem=985mb Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=24:00:00 session=15554 end=1212143943 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=17324kb resources_used.vmem=695080kb resources_used.walltime=00:01:13

So my question is: what is deleting the jobs, and how can I find out what is causing it to do that?

I can run showq, qstat, qstat -f etc. by hand multiple times without any failures or long delays in returning output, and the load on the CEs and the Torque server is low.

Any help appreciated.

Thanks,
Chris.
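
P.S. A minimal Python sketch of how the deleted jobs could be picked out of an accounting log like the one above. It assumes each record has the form "date time;type;jobid;message", as in the excerpt, and treats a D record with no earlier S record as "deleted before it ran". The function names and the shortened SAMPLE are only for illustration (the S line is truncated for brevity); this is not a Torque tool.

```python
from datetime import datetime


def parse_record(line):
    """Split one accounting line into (timestamp, type, job id, message)."""
    stamp, rtype, job_id, message = line.rstrip("\n").split(";", 3)
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return when, rtype, job_id, message


def deleted_before_start(lines):
    """Return (job id, seconds from Q to D or None, D message) for every
    job that has a D record but no earlier S record."""
    queued = {}      # job id -> time of its Q record
    started = set()  # job ids that have an S record
    result = []
    for line in lines:
        if line.count(";") < 3:
            continue  # skip blank or malformed lines
        when, rtype, job_id, message = parse_record(line)
        if rtype == "Q":
            queued[job_id] = when
        elif rtype == "S":
            started.add(job_id)
        elif rtype == "D" and job_id not in started:
            wait = None
            if job_id in queued:
                wait = (when - queued[job_id]).total_seconds()
            result.append((job_id, wait, message))
    return result


# Shortened copy of the excerpt above; the S record is truncated here.
SAMPLE = """\
05/30/2008 11:37:51;Q;1578437.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:37:54;S;1578437.heplnx201.pp.rl.ac.uk;user=dteam003 group=dteam
05/30/2008 11:38:00;Q;1578438.heplnx201.pp.rl.ac.uk;queue=dteam
05/30/2008 11:38:51;D;1578436.heplnx201.pp.rl.ac.uk;[log in to unmask] .rl.ac.uk
05/30/2008 11:39:00;D;1578438.heplnx201.pp.rl.ac.uk;[log in to unmask] .rl.ac.uk
"""

if __name__ == "__main__":
    for job_id, wait, message in deleted_before_start(SAMPLE.splitlines()):
        print(job_id, wait, message)
```

Run against the real accounting file, the Q-to-D interval and the D message for each deleted job should at least show whether the deletions follow a fixed delay and who requested them.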