Yesterday we were frequently failing the SAM CE tests, although jobs
from non-OPS VOs seemed to be OK.
After following the discussions in the news groups, I concluded that the
problem was due to the lack of sgm and prd accounts. I therefore added
lines of the form
50601:opssgm001:1932:ops:ops:sgm:
51301:opsprd001:1932:ops:ops:prd:
to users.conf for each VO, and then ran configure_node on the CE and all
WNs. Now we are failing ALL our SAM CE tests.
I reversed the change by deleting the new accounts from users.conf and
running configure_node on the CE and WNs again, but we are still failing
ALL the tests. I don't see anything wrong in the globus-gatekeeper logs.
I can su to ops001 and confirm that ssh between the WNs is OK, and I can
submit jobs internally with qsub.
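In case it helps anyone checking their own users.conf, here is a minimal Python sketch (not part of the middleware) that reports which special account types each VO has. The field names are my assumption, read off the two lines quoted above as UID:LOGIN:GID:GROUP:VO:FLAG:, and the function names are purely illustrative.

```python
from collections import defaultdict

# Sample content: the two lines quoted in this post.
SAMPLE = """\
50601:opssgm001:1932:ops:ops:sgm:
51301:opsprd001:1932:ops:ops:prd:
"""

def parse_users_conf(text):
    """Parse users.conf-style text into a list of dicts.

    Assumed layout (not confirmed here): UID:LOGIN:GID:GROUP:VO:FLAG:
    All fields are kept as strings so nothing is reinterpreted.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 6:
            continue  # not enough fields to be an account line
        uid, login, gid, group, vo, flag = fields[:6]
        entries.append({"uid": uid, "login": login, "gid": gid,
                        "group": group, "vo": vo, "flag": flag})
    return entries

def flags_by_vo(entries):
    """Map each VO to the set of non-empty flags (e.g. sgm, prd) it has."""
    result = defaultdict(set)
    for entry in entries:
        if entry["flag"]:
            result[entry["vo"]].add(entry["flag"])
    return dict(result)

def duplicate_uids(entries):
    """Return the sorted list of UIDs that appear more than once."""
    seen, dupes = set(), set()
    for entry in entries:
        if entry["uid"] in seen:
            dupes.add(entry["uid"])
        seen.add(entry["uid"])
    return sorted(dupes)

if __name__ == "__main__":
    parsed = parse_users_conf(SAMPLE)
    for vo, flags in sorted(flags_by_vo(parsed).items()):
        print(vo, sorted(flags))
    print("duplicate UIDs:", duplicate_uids(parsed))
```

To use it on a real file, read the file contents and pass them to parse_users_conf in place of SAMPLE. It only checks the file itself, not the accounts that configure_node actually created.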
Does anyone have any ideas on how to debug this?
Thanks in advance,
Dave