Hi,

When I run qsub repeatedly by hand (or when the CEs run it), I occasionally get:

qsub: Invalid credential

and the batch server log shows:

02/08/2012 06:36:38;0080;PBS_Server;Req;req_reject;Reject reply code=15012(PBS_Server System error: Interrupted system call MSG=error reading unmunge data), aux=0, type=AlternateUserAuthentication, from [log in to unmask]

Similarly, worker nodes intermittently hit the same problem:

02/08/2012 06:35:32;0080;PBS_Server;Req;req_reject;Reject reply code=15012(PBS_Server System error: Interrupted system call MSG=error reading unmunge data), aux=0, type=AlternateUserAuthentication, from [log in to unmask]
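To get a feel for how often this happens, the rejects in the server log could be tallied per hour. The following is only a minimal Python sketch: the regular expression is an assumption derived from the two log lines quoted above, and the function name count_rejects is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the two req_reject lines quoted in this mail only;
# other log line layouts are simply skipped.
REJECT_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2});"
    r".*?;req_reject;Reject reply code=(?P<code>\d+)\("
)


def count_rejects(lines):
    """Return a Counter keyed by (date, hour, reply code)."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.match(line)
        if match:
            hour = match["time"][:2]  # "06:36:38" -> "06"
            counts[(match["date"], hour, match["code"])] += 1
    return counts
```

It accepts any iterable of lines, so an open server log file can be passed in directly. A burst of code 15012 within a single hour would point more towards load on munged than towards a configuration error.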

 

Is this a known or expected problem with torque 2.5.7-7? This is a UMD torque server, currently with 112 gLite 3.2 worker nodes, all running the same version of torque and munge 0.5.8-8.el5.

I'm just using the default munge configuration. Should I try increasing the number of munge threads on the torque server, or is that not likely to be the cause of the problem?
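To judge whether a change such as a higher munge thread count actually helps, the failure rate could be measured before and after. The following is a rough Python sketch (3.7 or later), not a tested tool. It assumes qsub is on the PATH and that the "Invalid credential" message appears on stdout or stderr; the helper names and the job script path are hypothetical.

```python
import subprocess


def submit_once(script_path):
    """Run qsub once; return True if it failed with 'Invalid credential'."""
    result = subprocess.run(
        ["qsub", script_path],
        capture_output=True,
        text=True,
    )
    # Check both streams, since which one carries the message is an assumption.
    output = result.stdout + result.stderr
    return result.returncode != 0 and "Invalid credential" in output


def failure_rate(attempt, n):
    """Call attempt() n times and return the fraction of calls that failed."""
    failures = sum(1 for _ in range(n) if attempt())
    return failures / n
```

For example, failure_rate(lambda: submit_once("sleep.sh"), 200) would submit a trivial job script 200 times and return the fraction rejected. Every successful submission really is queued, so the script should be something harmless.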

 

Regards,

Andrew.