On Thu, Jan 19, 2006 at 11:51:49AM -0000 or thereabouts, Steve Thorn wrote:
> Have a problem with Maui blocking jobs even though there are free CPUs.
> Anyone have any ideas?
Check that Maui's view of the world is the same as PBS's, i.e. compare
# diagnose -n
and
# pbsnodes -a
In particular, you must restart Maui after deleting a node from PBS or adding one to it.
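
For the PBS side of that comparison, here is a rough sketch that totals the processors PBS considers usable. It assumes the usual `pbsnodes -a` layout (node name on an unindented line, indented `key = value` attributes below it). The function names and the sample hostnames are made up for illustration and are not part of Maui or PBS.

```python
# Sketch only: summarise `pbsnodes -a` style text so the node and processor
# totals can be checked by eye against what Maui reports.
# Assumption: node names start in column 0, attributes are indented "key = value".

def summarize_pbsnodes(text):
    """Return {node_name: {"state": str or None, "np": int}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None          # blank line ends a node record
            continue
        if not line[0].isspace():
            current = line.strip()  # unindented line starts a new node
            nodes[current] = {"state": None, "np": 0}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            if key == "state":
                nodes[current]["state"] = value
            elif key == "np":
                nodes[current]["np"] = int(value)
    return nodes


def usable_procs(nodes):
    """Sum np over nodes that are not down, offline or state-unknown."""
    unusable = {"down", "offline", "state-unknown"}
    total = 0
    for info in nodes.values():
        states = set((info["state"] or "").split(","))
        if not states & unusable:
            total += info["np"]
    return total


# Hypothetical sample, not taken from the cluster in this thread.
SAMPLE = """\
node01.example.org
     state = free
     np = 2
     ntype = cluster

node02.example.org
     state = down,offline
     np = 2
     ntype = cluster
"""

if __name__ == "__main__":
    nodes = summarize_pbsnodes(SAMPLE)
    print(len(nodes), "nodes known to PBS,", usable_procs(nodes), "usable procs")
```

Feed it the real `pbsnodes -a` text in place of SAMPLE and compare the totals with the node and processor counts that Maui prints. If they differ, Maui is probably working from a stale node list.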
>
> cheers
> Steve
>
> # showq
> ACTIVE JOBS--------------------
> JOBNAME USERNAME STATE PROC REMAINING STARTTIME
>
> 35108 atlas004 Running 1 1:14:58:26 Wed Jan 18 02:43:57
> 35109 atlas004 Running 1 2:16:44:06 Thu Jan 19 04:29:37
> 34885 atlas004 Running 1 2:23:54:26 Thu Jan 19 11:39:57
>
> 3 Active Jobs 3 of 7 Processors Active (42.86%)
> 2 of 5 Nodes Active (40.00%)
>
> IDLE JOBS----------------------
> JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
>
>
> 0 Idle Jobs
>
> BLOCKED JOBS----------------
> JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
>
> 35589 lhcb003 Idle 1 3:00:00:00 Mon Jan 16 13:28:45
> 35590 lhcb003 Idle 1 3:00:00:00 Mon Jan 16 13:29:45
> 35591 lhcb003 Idle 1 3:00:00:00 Mon Jan 16 13:33:46
> 35592 lhcb003 Idle 1 3:00:00:00 Mon Jan 16 13:38:48
> <snip>
>
> Checkjob gives similar output for all blocked jobs I tried:
>
> # checkjob 35589
> checking job 35589
>
> State: Idle
> Creds: user:lhcb003 group:lhcb class:lhcb qos:DEFAULT
> WallTime: 00:00:00 of 3:00:00:00
> SubmitTime: Mon Jan 16 13:28:45
> (Time Queued Total: 2:22:18:48 Eligible: 00:00:00)
>
> StartDate: -00:03:03 Thu Jan 19 11:44:30
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Holds: Defer
> Messages: exceeds available partition procs
> PE: 1.00 StartPriority: 628
> cannot select job 35589 for partition DEFAULT (job hold active)
--
Steve Traylen
http://www.gridpp.ac.uk/