Sam Skipsey wrote:
After our site transitioned to a new SGE qmaster (in fact, two of them,
for improved failover), we seem to have moved to a new set of failures
on the SL4 CE with lcgsge:
Now the SAM tests are dying of proxy-related errors - "Got a job held
event, reason: Proxy file missing or corrupted" being the most recent one.
The internet and wiki are, as usual with my problems, utterly
unhelpful here.
Does anyone have any idea why these proxy errors happen? The grid
certificates are up to date, since I updated them with yum...
I know that there are definitely qsubs being attempted, because before we
properly transitioned to the new qmaster, there were errors generated in
the /qmaster/ logs due to incorrect requests being made. These errors
are no longer generated in those logs...
Sam