LHC Computer Grid - Rollout
Sotomayor, Maniel said:
> I'm trying to delete an old job from OpenPBS. It has been running
> for a long time and it seems to be stalled. Thousands of jobs are
> enqueued and stopped because it seems that OpenPBS is trying to
> drain the server.
Your system doesn't seem very happy. I tried to have a look, but:
globus-job-run ce.prd.hp.com /bin/pwd
GRAM Job submission failed because the job manager failed to open stderr (error code 74)
> bl-wn17
> Not Running: Draining system to allow starving job to run
I think that means that a job is asking for some resource that can't be
satisfied, for example an MPI job asking for more processors than are
currently free. Once that job has waited long enough, PBS stops letting
new jobs start until it can run (which might be never!).
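
To make that concrete, here is a toy model of the draining behaviour as I understand it. It is only a sketch, not the real PBS scheduler code: the Job class, the schedule function and the numbers are all made up for illustration, and the max_starve name is borrowed from what I believe is the FIFO scheduler's sched_config setting, so treat that as an assumption too.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int   # processors requested (hypothetical field)
    waited: int  # seconds spent in the queue so far (hypothetical field)


def schedule(queue, free_procs, max_starve):
    """Return the names of the jobs this toy scheduler would start now."""
    # A job counts as "starving" once it has waited longer than max_starve.
    starving = [j for j in queue if j.waited > max_starve]
    if starving:
        # Draining: only the longest-waiting starving job may start.
        # Every other job is held so that processors free up for it.
        head = max(starving, key=lambda j: j.waited)
        return [head.name] if head.procs <= free_procs else []

    # Normal case: start whatever fits, in queue order.
    started = []
    for job in queue:
        if job.procs <= free_procs:
            started.append(job.name)
            free_procs -= job.procs
    return started


queue = [Job("mpi", procs=64, waited=100_000), Job("small", procs=1, waited=10)]
print(schedule(queue, free_procs=8, max_starve=86_400))  # mpi is starving
print(schedule(queue, free_procs=8, max_starve=10**9))   # nothing is starving
```

In the first call the 64-processor job is starving and doesn't fit, so nothing starts, not even the one-processor job. If the cluster never has 64 processors free at once, that state never clears until the big job is removed from the queue.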
Stephen