Vrijaldenhoven, Serge wrote:
> Hi Oscar (et al.),
>
> On our CE:
> /opt/etc/lcmaps/gridmapfile: non-existent
> /opt/glite/etc/lcmaps/gridmapfile: non-existent
>
> /etc/grid-security/grid-mapfile (after deleting the cron entry that rewrote it):
That entry is created when you first configure the node as an LCG-CE
and then as a gLite CE: unfortunately YAIM is bad at removing stuff
that should no longer be there... :-(
> --------------------------------
> "/phicos/Role=lcgadmin/Capability=NULL" .phicossgm
> "/phicos/Role=lcgadmin" .phicossgm
> "/phicos/Role=production/Capability=NULL" .phicosprd
> "/phicos/Role=production" .phicosprd
> "/phicos/Role=NULL/Capability=NULL" .phico
> "/phicos" .phico
> --------------------------------
>
> /opt/edg/etc/lcmaps/gridmapfile:
That file is irrelevant when you properly configure the gLite CE.
The gLite CE currently uses only /etc/grid-security/grid-mapfile and
/etc/grid-security/groupmapfile, nothing else.
> --------------------------------
> "/VO=phicos/GROUP=/phicos/ROLE=lcgadmin/Capability=NULL" .phicossgm
> "/VO=phicos/GROUP=/phicos/ROLE=lcgadmin" .phicossgm
> "/VO=phicos/GROUP=/phicos/ROLE=production/Capability=NULL" .phicosprd
> "/VO=phicos/GROUP=/phicos/ROLE=production" .phicosprd
> "/VO=phicos/GROUP=/phicos/Role=NULL/Capability=NULL" .phico
> "/VO=phicos/GROUP=/phicos" .phico
> ...
> --------------------------------
>
> So /etc/grid-security/grid-mapfile seems to be the correct one; I adjusted lcmaps.db to:
> vomslocalaccount = "lcmaps_voms_localaccount.mod"
> " -gridmapfile /etc/grid-security/grid-mapfile"
> " -use_voms_gid"
>
> vomspoolaccount = "lcmaps_voms_poolaccount.mod"
> " -gridmapfile /etc/grid-security/grid-mapfile"
> " -gridmapdir /etc/grid-security/gridmapdir"
> " -override_inconsistency"
>
> PS: I have no idea what or who created the lcmaps.db file originally (autogenerated by YAIM at some point?)
Yes, and there should be no reason to adjust those files manually.
You need to ensure your groups.conf is correct, that is all.
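If you want to sanity-check a generated grid-mapfile by hand, something
like the following might help. It is only a minimal sketch in Python, not
part of YAIM or LCMAPS: the function name parse_gridmap_line is made up
for illustration, and it assumes the layout shown in your mail (a quoted
FQAN or DN, whitespace, then an account name, where a leading "." marks
a pool account prefix).

```python
import shlex

def parse_gridmap_line(line):
    """Split one grid-mapfile line into (pattern, account, is_pool).

    Returns None for blank lines and '#' comments.
    Hypothetical helper, assuming the '"<FQAN or DN>" <account>' layout.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # shlex handles the double quotes, so DNs containing spaces stay intact.
    parts = shlex.split(line)
    if len(parts) != 2:
        raise ValueError("unexpected grid-mapfile line: %r" % line)
    pattern, account = parts
    # Assumption: a leading dot denotes a pool account prefix.
    is_pool = account.startswith(".")
    if is_pool:
        account = account[1:]
    return pattern, account, is_pool
```

For example, feeding it the line
"/phicos/Role=lcgadmin" .phicossgm
gives the pattern /phicos/Role=lcgadmin, the prefix phicossgm and the
pool flag set.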
> Now all the LCAS and LCMAPS steps succeed (thanks for the tips!), but something is still going wrong:
>
> on CE /var/log/glite/gatekeeper.log:
> --------------------------------
> Notice: 6: Got connection <IP of WMSLB> at Tue Jun 12 10:24:57 2007
> Notice: 5: Trying to use original user proxy ...
> LCAS stuff...
> LCMAPS stuff...
> Notice: 5: Requested service: jobmanager [PING ONLY]
> Notice: 5: Authorized as local user: phico008
> Notice: 5: Authorized as local uid: 16808
> Notice: 5: and local gid: 16800
> Notice: 5: "/O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven" mapped to phico008 (16808/16800)
> Failure: ping successful
> Failure: ping successful
>
> Notice: 6: Got connection <IP of WMSLB> at Tue Jun 12 10:24:57 2007
> Notice: 5: Trying to use delegated user proxy
> LCAS stuff...
> LCMAPS stuff...
> Notice: 5: Requested service: jobmanager-fork
> Notice: 5: Authorized as local user: phico008
> Notice: 5: Authorized as local uid: 16808
> Notice: 5: and local gid: 16800
> Notice: 5: "/O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven" mapped to phico008 (16808/16800)
> Notice: 0: executing /opt/globus/libexec/globus-job-manager
> Notice: 0: GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 for /O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven on <IP of WMSLB>
> Notice: 0: GRID_SECURITY_CONTEXT_FD=12
> Notice: 0: Child 31517 started
> JMA 2007/06/12 10:25:00 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 for /O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven on <IP of WMSLB>
> JMA 2007/06/12 10:25:00 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 mapped to phico008 (16808, 16800)
> JMA 2007/06/12 10:25:00 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 has GRAM_SCRIPT_JOB_ID 31577 manager type fork
> JMA 2007/06/12 10:25:02 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 JM exiting
>
> Notice: 6: Got connection <IP of WMSLB> at Tue Jun 12 10:25:02 2007
> ... etc
> --------------------------------
>
> My guess is that something is wrong with the jobmanagers setup.
> globus-job-run and globus-job-submit via jobmanager-fork on the CE both work.
>
> More input/ideas on where to look/what can be wrong are very welcome.
I will have a look. Note that the gLite CE only uses the fork job manager
to launch Condor-C for the DN-FQAN combination, plus the grid_monitor,
which is not doing anything on a gLite CE.
Here is a visualization:
http://litmaath.home.cern.ch/litmaath/UI-WMS-CE-WN/
The ".fig" file is the source (XFIG).
It went through a couple of iterations of comments from the
developers, so it should be fairly accurate by now.
The arrows denote the flows of control: who connects to whom,
who starts what, who writes/reads where.
When a box has rounded corners, it means there is just one instance
of it handling all jobs; a pointy box has an instance per DN-FQAN
combination; a rectangular box has an instance per job.
Eventually there should be an accompanying text document
that describes all the steps, connections etc.
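
For going through gatekeeper.log while debugging, the account mappings
can be pulled out with a few lines of Python. This is just a sketch under
assumptions: the regular expression is modelled only on the
'"<DN>" mapped to <user> (<uid>/<gid>)' lines quoted above, and the name
find_mappings is invented for illustration.

```python
import re

# Assumption: matches gatekeeper lines of the form
#   Notice: 5: "<DN>" mapped to <user> (<uid>/<gid>)
MAPPED_RE = re.compile(
    r'"(?P<dn>[^"]+)" mapped to (?P<user>\S+) \((?P<uid>\d+)/(?P<gid>\d+)\)'
)

def find_mappings(lines):
    """Return a list of (dn, user, uid, gid) tuples found in the log lines."""
    found = []
    for line in lines:
        match = MAPPED_RE.search(line)
        if match:
            found.append(
                (
                    match.group("dn"),
                    match.group("user"),
                    int(match.group("uid")),
                    int(match.group("gid")),
                )
            )
    return found
```

It could be called as find_mappings(open("/var/log/glite/gatekeeper.log")),
which makes it easy to see whether every connection ends up on the
account you expect.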