

When I run qsub repeatedly by hand (or when the CEs run it), it occasionally fails with:


qsub: Invalid credential


and the log on the batch server shows this:


02/08/2012 06:36:38;0080;PBS_Server;Req;req_reject;Reject reply code=15012(PBS_Server System error: Interrupted system call MSG=error reading unmunge data), aux=0, type=AlternateUserAuthentication, from [log in to unmask]


Worker nodes intermittently hit the same problem:


02/08/2012 06:35:32;0080;PBS_Server;Req;req_reject;Reject reply code=15012(PBS_Server System error: Interrupted system call MSG=error reading unmunge data), aux=0, type=AlternateUserAuthentication, from [log in to unmask]


Is this a known or expected problem with torque 2.5.7-7? This is a UMD torque server, currently serving 112 gLite 3.2 worker nodes, all running the same version of torque and munge 0.5.8-8.el5.


I'm just using the default munge configuration. Should I try increasing the number of munge threads on the torque server, or is that not likely to be the cause of the problem?
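
One way I could check whether load is a factor is to count how many of these rejects land in each minute of the server log. Below is a rough sketch of that, not something I have run in production. It assumes every record starts with a "MM/DD/YYYY HH:MM:SS;" timestamp, as in the lines quoted above. The function name rejects_per_minute and the default search string are my own choices, not part of torque or munge.

```python
import re
from collections import Counter

# Matches the timestamp that starts a server log record, e.g.
# "02/08/2012 06:36:38;0080;PBS_Server;...", and captures it
# truncated to the minute ("02/08/2012 06:36").
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}):\d{2};")


def rejects_per_minute(lines, needle="error reading unmunge data"):
    """Count log records containing `needle`, grouped by minute.

    `lines` is any iterable of strings, such as an open log file.
    Records without a leading timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file handle for one day's server log and printing the sorted items of the result would show whether the failures come in bursts (which might point at munge running out of threads under load) or are spread evenly through the day.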


