Dear All,
Our CE, running lcg-CE-3.1.37-0, had its site-info.def changed yesterday and
yaim was rerun. Since then it fails all OPS SAM tests because it is unable to
map the DN to the opssgm account. Alice, LHCb & CMS SAM tests are all OK though.
In /etc/grid-security, "grep -l opssgm *" returns nothing, whereas on the
sister CE (running 3.1.38, if it matters) it is found in several files.
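For anyone wanting to repeat the comparison on both CEs, the check can be wrapped in a small helper. This is only a sketch: the function name list_account_files is made up for illustration, and it does nothing more than the grep described above.

```shell
#!/bin/bash
# list_account_files ACCOUNT DIR (hypothetical helper name):
# print the files directly under DIR whose contents mention ACCOUNT.
list_account_files() {
    # -l prints only the names of matching files; errors for
    # subdirectories and unreadable files are discarded.
    grep -l -- "$1" "$2"/* 2>/dev/null
}

# Run on each CE and compare the two listings by eye.
list_account_files opssgm /etc/grid-security || echo "no file mentions opssgm"
```

Running it on the sister CE as well gives the list of files that should contain opssgm once the configuration is right.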
The change to site-info.def was from
QUEUE_GROUPS="
/dteam/ROLE=lcgadmin
/dteam/ROLE=production
dteam
/ops/ROLE=lcgadmin
/ops/ROLE=production
ops
/alice/ROLE=lcgadmin
/alice/ROLE=production
/alice/ROLE=pilot
alice
etcetc-for-other-VOs
"
QUEUES="long medium short express"
LONG_GROUP_ENABLE="$QUEUE_GROUPS"
MEDIUM_GROUP_ENABLE="$QUEUE_GROUPS"
SHORT_GROUP_ENABLE="$QUEUE_GROUPS"
to
TESTQUEUE_GROUPS="
/dteam/ROLE=lcgadmin
/dteam/ROLE=production
dteam
/ops/ROLE=lcgadmin
/ops/ROLE=production
ops
"
REALQUEUE_GROUPS="
/alice/ROLE=lcgadmin
/alice/ROLE=production
/alice/ROLE=pilot
alice
etcetc-for-other-VOs-except-ops+dteam
"
QUEUES="long medium short express"
QUEUE_GROUPS="${TESTQUEUE_GROUPS} ${REALQUEUE_GROUPS}"
SHORT_GROUP_ENABLE=$QUEUE_GROUPS
EXPRESS_GROUP_ENABLE=$QUEUE_GROUPS
# don't give ops access to long/medium = SAM jobs time out in queue
LONG_GROUP_ENABLE=$REALQUEUE_GROUPS
MEDIUM_GROUP_ENABLE=$REALQUEUE_GROUPS
Are the double quotes really needed around $QUEUE_GROUPS in <queue>_GROUP_ENABLE?
Is the space in the newly defined QUEUE_GROUPS likely to be the bug?
Is there some other bug there?
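The quoting question can be checked outside yaim with a few lines of bash. The sketch below uses a trimmed-down, hypothetical copy of the new definitions (only two entries per list) and assumes that site-info.def is sourced as an ordinary bash script. It relies on the fact that the shell does not word-split the right-hand side of a plain variable assignment. Whether yaim later iterates over the value with word splitting is an assumption here, not something taken from the yaim code.

```shell
#!/bin/bash
# Trimmed-down, hypothetical copy of the new site-info.def fragment.
TESTQUEUE_GROUPS="
/ops/ROLE=lcgadmin
ops
"
REALQUEUE_GROUPS="
/alice/ROLE=pilot
alice
"
QUEUE_GROUPS="${TESTQUEUE_GROUPS} ${REALQUEUE_GROUPS}"

# Plain assignments are not word-split, so the unquoted and quoted
# forms store exactly the same string (newlines and space included).
UNQUOTED=$QUEUE_GROUPS
QUOTED="$QUEUE_GROUPS"

if [ "$UNQUOTED" = "$QUOTED" ]; then
    echo "identical"
else
    echo "different"
fi

# Word-split the value as an unquoted 'for' loop would: the extra
# space and the blank lines between the two lists vanish here.
count=0
for g in $QUEUE_GROUPS; do
    count=$((count + 1))
done
echo "groups: $count"
```

This prints "identical" and "groups: 4", which suggests that neither the missing quotes nor the extra space changes the value as bash sees it, at least as long as the consumer splits on whitespace.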
After rerunning yaim, qmgr now reports that ops is only allowed on short & express.
The symptoms are in /var/log/globus-gatekeeper.log:
lcas client name: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak
LCAS 0: lcas_plugin_voms-plugin_confirm_authorization_from_x509():
Did not find a matching VO entry in the authorization file
I was stuck until I found this post:
http://scotgrid.blogspot.com/2008/08/nanocmos-lcas-fail.html
As suggested there, I commented out lcas_voms.mod in /opt/glite/etc/lcas/lcas.db.
Now the error in /var/log/globus-gatekeeper.log changes to:
LCMAPS 0: 2010-02-14.04:02:48.0000006664.0000000000 :
lcmaps_plugin_voms_localaccount-plugin_run(): Could not find a VOMS
localaccount in /etc/grid-security/grid-mapfile (failure)
Help! How does yaim generate the files in /etc/grid-security, and how could it
'lose' knowledge of opssgm? (I'm afraid I don't know much about this.)
This CE shares a gridmapdir with its sister CE (different arch, SL5 WN) via
NFS, so that pool accounts get mapped the same on both clusters, but
opssgm is not in there, at least not now. The sister CE, which passes all OPS
SAM tests, still has the older form of site-info.def.
(PS: both CEs are actually in scheduled maintenance at the moment for HPC
machine room power maintenance.)
I would be very grateful for any debugging help.