On Sat, 10 Nov 2007, Adeel-ur-Rehman wrote:
> We have re-installed the pbs and torque rpms on our batch server and
> configured the node, this time leaving the queues at their default
> configuration. But the job execution behaviour seems to be the same.
>
> An important note is that the same job (same .jdl and .sh file)
> sometimes gets stuck in the Running state and sometimes executes
> successfully.
>
> Also, sometimes a job gets stuck right after entering the Running
> state, while sometimes it gets stuck after spending some time in the
> Running state.
>
> A screenshot of two jobs stuck at the start of the Running state is
> attached. Even if we check again hours later, these jobs are still in
> the same state. Other jobs may enter and execute successfully, or they
> may get into the same situation.
Could you have a network hardware problem, e.g. in a department router?
Or too strict firewall settings? Note that Torque uses TCP _and_ UDP.
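
As a quick first check, something like the sketch below could be run on the
batch server against each worker node, and on a worker node against the
server. It only tests whether a TCP connection can be opened. The port
numbers are my assumption of the usual Torque defaults (15001 for pbs_server,
15002 and 15003 for pbs_mom), so please compare them with /etc/services or
your build settings. UDP cannot be probed reliably this way, so the firewall
rules for UDP still need to be inspected by hand.

```python
import socket

# Assumed default Torque ports; verify against /etc/services or your build.
TORQUE_TCP_PORTS = {
    "pbs_server": 15001,
    "pbs_mom": 15002,
    "pbs_resmom": 15003,
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_host(host, ports=TORQUE_TCP_PORTS, timeout=3.0):
    """Return a dict mapping each service name to True (reachable) or False."""
    return {name: tcp_port_open(host, port, timeout)
            for name, port in ports.items()}
```

Calling check_host() with the name of a worker node returns one True/False
entry per service. A False where the daemon is known to be running would
point towards a firewall or a routing problem rather than towards Torque
itself. Since your jobs only hang some of the time, it may be worth
repeating the check several times.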