Dear All,

Our CE, running lcg-CE-3.1.37-0, had its site-info.def changed yesterday and
yaim was rerun. Since then it fails all OPS SAM tests because it cannot map
the DN to the opssgm account. The Alice, LHCb and CMS SAM tests are all OK, though.

In /etc/grid-security, "grep -l opssgm *" returns nothing, whereas on the
sister CE (running 3.1.38, if it matters) opssgm is found in several files.
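
To make the comparison between the two CEs repeatable, the grep can be wrapped in a small function. This is only a sketch: files_mentioning is a hypothetical helper name, and the default directory is the one mentioned in this post.

```shell
#!/bin/bash
# List the files directly under a directory that mention an account name.
# files_mentioning is a hypothetical helper, not part of yaim or gLite.
files_mentioning() {
    local account="$1"
    local dir="${2:-/etc/grid-security}"
    # -l prints only file names; errors for subdirectories are discarded
    grep -l -- "$account" "$dir"/* 2>/dev/null | sort
}

# Run on each CE and compare the two listings by eye or with diff.
files_mentioning opssgm
```

Saving the output on each CE and diffing the two lists should show which generated files lost their opssgm entries.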

The change to site-info.def was from

QUEUE_GROUPS="
/dteam/ROLE=lcgadmin
/dteam/ROLE=production
dteam
/ops/ROLE=lcgadmin
/ops/ROLE=production
ops
/alice/ROLE=lcgadmin
/alice/ROLE=production
/alice/ROLE=pilot
alice
etcetc-for-other-VOs
"
QUEUES="long medium short express"
LONG_GROUP_ENABLE="$QUEUE_GROUPS"
MEDIUM_GROUP_ENABLE="$QUEUE_GROUPS"
SHORT_GROUP_ENABLE="$QUEUE_GROUPS"

to

TESTQUEUE_GROUPS="
/dteam/ROLE=lcgadmin
/dteam/ROLE=production
dteam
/ops/ROLE=lcgadmin
/ops/ROLE=production
ops
"
REALQUEUE_GROUPS="
/alice/ROLE=lcgadmin
/alice/ROLE=production
/alice/ROLE=pilot
alice
etcetc-for-other-VOs-except-ops+dteam
"
QUEUES="long medium short express"
QUEUE_GROUPS="${TESTQUEUE_GROUPS} ${REALQUEUE_GROUPS}"
SHORT_GROUP_ENABLE=$QUEUE_GROUPS
EXPRESS_GROUP_ENABLE=$QUEUE_GROUPS
# don't give ops access to long/medium = SAM jobs time out in queue
LONG_GROUP_ENABLE=$REALQUEUE_GROUPS
MEDIUM_GROUP_ENABLE=$REALQUEUE_GROUPS

Are the double quotes really needed around $QUEUE_GROUPS in <queue>_GROUP_ENABLE?
Is the space in the newly defined QUEUE_GROUPS likely to be the bug?
Is there some other bug there?
After rerunning yaim, qmgr now shows ops allowed only on short and express.
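
On the quoting and spacing questions, plain bash behaviour can be checked in isolation. The sketch below assumes site-info.def is simply sourced as a bash file; it says nothing about how individual yaim functions later consume the values. The group lists are shortened, hypothetical stand-ins for the real ones.

```shell
#!/bin/bash
# Shortened, hypothetical group lists for illustration only.
TESTQUEUE_GROUPS="
/ops/ROLE=lcgadmin
ops
"
REALQUEUE_GROUPS="
/alice/ROLE=pilot
alice
"
QUEUE_GROUPS="${TESTQUEUE_GROUPS} ${REALQUEUE_GROUPS}"

# A plain assignment is not word-split, so quotes make no difference here.
UNQUOTED=$QUEUE_GROUPS
QUOTED="$QUEUE_GROUPS"
if [ "$UNQUOTED" = "$QUOTED" ]; then
    SAME=yes
else
    SAME=no
fi
echo "quoted and unquoted assignments identical: $SAME"

# When the value is later expanded unquoted, the newlines and the extra
# space all act as separators, so no empty entries appear.
GROUP_COUNT=0
for g in $QUEUE_GROUPS; do
    GROUP_COUNT=$((GROUP_COUNT + 1))
    echo "group: $g"
done
echo "groups seen: $GROUP_COUNT"
```

If this holds on the CE as well, neither the missing quotes nor the space would explain the problem at the shell level, which points towards how the regenerated files are built from the per-queue lists.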

The symptoms are in /var/log/globus-gatekeeper.log:

lcas client name: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
LCAS   0:       lcas_plugin_voms-plugin_confirm_authorization_from_x509():
Did not find a matching VO entry in the authorization file

That was the error until I found this post:
http://scotgrid.blogspot.com/2008/08/nanocmos-lcas-fail.html

As suggested there, I commented out the lcas_voms.mod line in
/opt/glite/etc/lcas/lcas.db.

The error in /var/log/globus-gatekeeper.log has now changed to:

LCMAPS 0: 2010-02-14.04:02:48.0000006664.0000000000 :
lcmaps_plugin_voms_localaccount-plugin_run(): Could not find a VOMS
localaccount in /etc/grid-security/grid-mapfile (failure)
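
Since the LCMAPS message names /etc/grid-security/grid-mapfile, one quick check is to list the entries for the ops VO in that file on both CEs. This is a rough sketch: mapfile_lines_for is a hypothetical helper, and it assumes each line has the form of a double-quoted subject or FQAN followed by an account name.

```shell
#!/bin/bash
# Print grid-mapfile lines whose quoted subject starts with "/<vo>".
# mapfile_lines_for is a hypothetical helper; the line format
# ("subject-or-FQAN" account) is an assumption.
mapfile_lines_for() {
    local vo="$1"
    local file="${2:-/etc/grid-security/grid-mapfile}"
    # || true keeps the exit status at 0 when nothing matches
    grep -i -- "\"/$vo" "$file" 2>/dev/null || true
}

mapfile_lines_for ops
```

An empty result on the broken CE and several lines on the sister CE would confirm that the ops mappings were dropped when yaim regenerated the file.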

Help! How does yaim generate the files in /etc/grid-security, and how could
it 'lose' knowledge of opssgm? (I'm afraid I don't know much about this.)

This CE shares a gridmapdir with its sister CE (different arch, SL5 WNs) via
NFS so that pool accounts get mapped the same on both clusters, but
opssgm is not in there. Not now, anyway. The sister CE, which passes all OPS
SAM tests, still has the older form of site-info.def.

(PS: both CEs are actually in scheduled maintenance at the moment for HPC
machine room power maintenance.)

I would be very grateful for any debugging help.