On Thu, 19 Jan 2006, Steve Thorn wrote:

> Checkjob gives similar output for all blocked jobs I tried:
>
> # checkjob 35589
> checking job 35589
>
> State: Idle
> Creds: user:lhcb003 group:lhcb class:lhcb qos:DEFAULT
> WallTime: 00:00:00 of 3:00:00:00
> SubmitTime: Mon Jan 16 13:28:45
> (Time Queued Total: 2:22:18:48 Eligible: 00:00:00)
>
> StartDate: -00:03:03 Thu Jan 19 11:44:30
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Holds: Defer
> Messages: exceeds available partition procs
> PE: 1.00 StartPriority: 628
> cannot select job 35589 for partition DEFAULT (job hold active)

Clearly something is amiss here. I have sometimes been able to shed some light on similar situations with "qstat -f 35589". I have seen cases where jobs have been allocated to nodes that are down, offline or awaiting repair.

In extremis, the two job files will have to be deleted from /var/spool/pbs/server_priv/jobs and the main PBS server restarted.

-- 
David Martin
Kelvin Building, University of Glasgow, Glasgow, G12 8QQ, United Kingdom
tel: (0)141 330 4197  fax: (0)141 330 5881
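If many jobs are stuck like 35589, a small script can pick out the ones showing the same symptoms before you go digging with qstat -f. Below is a minimal sketch, assuming `checkjob` prints one "Key: value" field per line, as in Steve's output. `parse_checkjob`, `needs_attention` and `SAMPLE` are hypothetical names and not part of Maui or PBS. The script only reads text. It does not touch the spool directory.

```python
import re

# Trimmed copy of the checkjob output quoted above (hypothetical test data).
SAMPLE = """\
checking job 35589

State: Idle
Creds: user:lhcb003 group:lhcb class:lhcb qos:DEFAULT
Holds: Defer
Messages: exceeds available partition procs
"""


def parse_checkjob(output: str) -> dict:
    """Extract State, Holds and Messages from checkjob text.

    Assumes each field starts its own line as 'Key: value'; a field
    that is missing is simply left out of the result.
    """
    fields = {}
    for key in ("State", "Holds", "Messages"):
        # [ \t]* rather than \s* so an empty value cannot swallow the next line.
        m = re.search(rf"^[ \t]*{key}:[ \t]*(.+?)[ \t]*$", output, re.MULTILINE)
        if m:
            fields[key] = m.group(1)
    return fields


def needs_attention(fields: dict) -> bool:
    """True for an idle job carrying a Defer hold, like job 35589."""
    return fields.get("State") == "Idle" and "Defer" in fields.get("Holds", "")


if __name__ == "__main__":
    info = parse_checkjob(SAMPLE)
    print(info)
    print("needs attention:", needs_attention(info))
```

To use it, pipe each blocked job's `checkjob` output through `parse_checkjob`. The flagged jobs are the ones worth checking with qstat -f for allocations to down or offline nodes.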