On Mon, 29 Sep 2008, David Robson wrote:
> We've just recently had a long period where all our PBS jobs queue,
> and then, one minute later, dequeue.
> Does anyone know of reasons why only jobs coming from the gatekeeper immediately
> get dequeued?? Can anyone suggest any debugging techniques to get to the bottom
> of this?

Could be a ropy ("Black Hole") worker node accepting and trashing fresh jobs.
Our PBS tends to fill jobs in from the highest-numbered node. If things
are working properly now and you have some free nodes, try taking them
off-line and see if things keep working after the queues fill up a bit.
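
One way to spot a suspect node is to count, per node, the jobs that ended almost as soon as they started. The following is only a rough sketch, assuming Torque-style accounting records of the form "date;E;jobid;key=value ..." with start, end and exec_host fields (on many installs these sit under server_priv/accounting, but check yours). The function name and the 60-second threshold are made up for illustration.

```python
from collections import Counter


def short_jobs_per_node(lines, max_seconds=60):
    """Count jobs that ended within max_seconds of starting, keyed by node.

    Assumes Torque-style accounting lines: "date;type;jobid;key=value ...".
    Only "E" (job end) records are considered.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split(";", 3)
        if len(parts) < 4 or parts[1] != "E":
            continue  # not a job-end record
        # Turn "key=value key=value ..." into a dict, skipping stray tokens.
        fields = dict(
            item.split("=", 1) for item in parts[3].split() if "=" in item
        )
        try:
            runtime = int(fields["end"]) - int(fields["start"])
            # exec_host looks like "node42/0+node42/1"; keep the first node name.
            host = fields["exec_host"].split("+")[0].split("/")[0]
        except (KeyError, ValueError):
            continue  # record lacks the fields this sketch relies on
        if runtime <= max_seconds:
            counts[host] += 1
    return counts
```

Feed it the lines of a day's accounting file and look at counts.most_common(5). A node that accounts for nearly all the short-lived jobs is a good candidate for taking off-line (e.g. with pbsnodes -o) while you investigate.
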
Also check the usual suspects - NTP, SSH/scp within the cluster, free disk
space.

Dr. Henry Nebrensky
"The opossum is a very sophisticated animal.
It doesn't even get up until 5 or 6 p.m."