On Mon, 4 Jun 2007, Nick Vidiadakis wrote:
> Good evening to all,
>
> For the last 3 days we've been getting js:FAILED errors at
> GR-03-HEPNTUA, although the site is still working and is full of
> jobs. At first we thought this could be due to the general instability
> of SAM, but after a little investigation we found that only the DTEAM
> and OPS queues have problems, which is why we fail the SFTs. We even
> submitted jobs as SEE users and everything ran OK. The last upgrade
> we made was on 30/05/2007 (gLite Update 23), and until 02/06/2007
> everything worked like clockwork. Since that day we have had the problems
> described, and even a manual upgrade today has not fixed anything.
>
> Is anyone else seeing similar behaviour? I am not quite sure this is a
> local problem of ours.
It is. From the JS logs on the SAM pages for your CE one can see the reason
for the failures:
7 an authentication operation failed
The cause is an incomplete configuration of your CE.
In /opt/edg/etc/lcmaps/gridmapfile there is this line:
"/VO=ops/GROUP=/ops/ROLE=lcgadmin" .opssgm
But /etc/passwd and /etc/grid-security/gridmapdir do not list any "opssgm"
pool accounts.
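
A check like this can be scripted. The following is only a rough sketch in
Python, not part of any gLite tool: the function names pool_prefixes and
missing_pools are made up for illustration, and it assumes that a pool mapping
is a quoted DN/FQAN followed by a target starting with "." and that pool
accounts are named as the prefix plus digits (e.g. opssgm01). The second
gridmapfile line and the passwd entries in the sample are hypothetical.

```python
import re


def pool_prefixes(gridmap_text):
    """Collect pool-account prefixes (targets starting with '.') from gridmapfile text."""
    prefixes = set()
    for line in gridmap_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed line shape: "<DN or FQAN>" <account-or-.pool>
        match = re.match(r'^"[^"]*"\s+(\S+)$', line)
        if match and match.group(1).startswith("."):
            prefixes.add(match.group(1)[1:])
    return prefixes


def missing_pools(gridmap_text, passwd_text):
    """Return the pool prefixes that have no '<prefix><digits>' account in passwd text."""
    users = [l.split(":", 1)[0] for l in passwd_text.splitlines() if l.strip()]
    missing = []
    for prefix in pool_prefixes(gridmap_text):
        pattern = re.escape(prefix) + r"\d+"
        if not any(re.fullmatch(pattern, user) for user in users):
            missing.append(prefix)
    return sorted(missing)


# Sample data: the first mapping is the line quoted above, the rest is hypothetical.
sample_gridmap = (
    '"/VO=ops/GROUP=/ops/ROLE=lcgadmin" .opssgm\n'
    '"/VO=ops/GROUP=/ops" .ops\n'
)
sample_passwd = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "ops001:x:50001:1400::/home/ops001:/bin/bash\n"
    "ops002:x:50002:1400::/home/ops002:/bin/bash\n"
)

print(missing_pools(sample_gridmap, sample_passwd))
```

On the CE itself one would pass in the contents of
/opt/edg/etc/lcmaps/gridmapfile and /etc/passwd instead of the sample strings.
The file names in /etc/grid-security/gridmapdir could be compared in the same
way.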
So, first put a sufficient number of such accounts in your users.conf.
See this page for estimates per VO:
http://glite.web.cern.ch/glite/packages/R3.0/deployment/glite-known-issues.asp
Then rerun YAIM on your CE to configure the new accounts.