Hi all,
Thanks for all your suggestions and help. However, we have decided to migrate this node back to lcg-CE_torque for now. In the future we might try to get the glite-CE working, on a fresh node.
Grtz,
Serge
> -----Original Message-----
> From: Maarten Litmaath [mailto:[log in to unmask]]
> Sent: Tuesday 12 June 2007 17:59
> To: Vrijaldenhoven, Serge
> Cc: LHC Computer Grid - Rollout
> Subject: Re: [LCG-ROLLOUT] LCAS LCMAPS working now but jobs still won't run
>
> Vrijaldenhoven, Serge wrote:
>
> > on CE /var/log/glite/gatekeeper.log:
> > --------------------------------
> > Notice: 6: Got connection <IP of WMSLB> at Tue Jun 12 10:24:57 2007
> > Notice: 5: Trying to use original user proxy ...
> > LCAS stuff...
> > LCMAPS stuff...
> > Notice: 5: Requested service: jobmanager [PING ONLY]
> > Notice: 5: Authorized as local user: phico008
> > Notice: 5: Authorized as local uid: 16808
> > Notice: 5: and local gid: 16800
> > Notice: 5: "/O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven" mapped to phico008 (16808/16800)
> > Failure: ping successful
> > Failure: ping successful
> >
> > Notice: 6: Got connection <IP of WMSLB> at Tue Jun 12 10:24:57 2007
> > Notice: 5: Trying to use delegated user proxy
> > LCAS stuff...
> > LCMAPS stuff...
> > Notice: 5: Requested service: jobmanager-fork
> > Notice: 5: Authorized as local user: phico008
> > Notice: 5: Authorized as local uid: 16808
> > Notice: 5: and local gid: 16800
> > Notice: 5: "/O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven" mapped to phico008 (16808/16800)
> > Notice: 0: executing /opt/globus/libexec/globus-job-manager
> > Notice: 0: GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 for /O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven on <IP of WMSLB>
> > Notice: 0: GRID_SECURITY_CONTEXT_FD=12
> > Notice: 0: Child 31517 started
> > JMA 2007/06/12 10:25:00 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 for /O=dutchgrid/O=users/O=philips-natlab/CN=Serge Vrijaldenhoven on <IP of WMSLB>
> > JMA 2007/06/12 10:25:00 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 mapped to phico008 (16808, 16800)
> > JMA 2007/06/12 10:25:00 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 has GRAM_SCRIPT_JOB_ID 31577 manager type fork
> > JMA 2007/06/12 10:25:02 GATEKEEPER_JM_ID 2007-06-12.10:24:58.0000021565.0000000022 JM exiting
>
> So, the fork job manager is exiting immediately, while it should
> stay around for 1 hour for the grid_monitor and forever for the
> Condor-C instance for the DN-FQAN combination.
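To see how quickly each job manager exits, the JMA lines in gatekeeper.log can be compared per GATEKEEPER_JM_ID. The following is only a rough sketch: it assumes that each JMA entry sits on a single line in the real log file (not wrapped as in this mail) and that the last entry for a job manager ends in "JM exiting", as in the excerpt above. The function name job_manager_lifetimes is made up for this illustration.

```python
import re
from datetime import datetime

# Assumed layout of one JMA entry, taken from the excerpt above:
#   JMA 2007/06/12 10:25:02 GATEKEEPER_JM_ID <id> <message>
JMA_RE = re.compile(
    r"^JMA (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) GATEKEEPER_JM_ID (\S+) (.*)$"
)


def job_manager_lifetimes(lines):
    """Return {jm_id: seconds between the first JMA entry and 'JM exiting'}."""
    first_seen = {}
    lifetimes = {}
    for line in lines:
        match = JMA_RE.match(line.strip())
        if match is None:
            continue  # not a JMA entry (e.g. a gatekeeper "Notice:" line)
        stamp = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        jm_id = match.group(2)
        first_seen.setdefault(jm_id, stamp)
        if match.group(3).strip() == "JM exiting":
            lifetimes[jm_id] = (stamp - first_seen[jm_id]).total_seconds()
    return lifetimes


if __name__ == "__main__":
    import sys

    # Usage: python jm_lifetimes.py /var/log/glite/gatekeeper.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for jm_id, seconds in job_manager_lifetimes(log).items():
                print(f"{jm_id}: exited after {seconds:.0f} s")
```

For the excerpt above this would report a lifetime of 2 seconds, where roughly an hour would be expected for the grid_monitor.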
>
> Note that the "true" gLite CE will have a static Condor-C instance
> per VO, started at boot time. The current state of affairs is just
> a first approximation to be able to exercise the Condor-C path.
> The true gLite CE will not use the fork job manager.
> It is expected to be ready for release in a few months.
>
> Meanwhile we have to debug the Globus issues as usual...
> Have a look at the gram_job_mgr_*.log files in the pool account
> home directory: any suspicious complaints?
>
> Do /opt/globus/tmp and /opt/globus/tmp/gram_job_state have contents
> that look normal:
>
> [...]
> -rw-r--r-- 1 18901 2688 33 Apr 28 05:57 grid_manager_monitor_agent_log.18901
> -rw-r--r-- 1 18901 2688  0 Apr 28 05:57 grid_manager_monitor_agent_log.18901.time
> -rw-r--r-- 1 18941 1395 33 May 30 16:26 grid_manager_monitor_agent_log.18941
> -rw-r--r-- 1 18941 1395  0 May 30 16:26 grid_manager_monitor_agent_log.18941.time
> -rw-r--r-- 1 18943 1307 33 May 30 17:39 grid_manager_monitor_agent_log.18943
> -rw-r--r-- 1 18943 1307  0 May 30 17:39 grid_manager_monitor_agent_log.18943.time
> [...]
>
> And:
>
> [...]
> -rw-r--r-- 1 18673 2688 33 Dec 14 01:42 grid_manager_monitor_agent_log.18673
> -rw-r--r-- 1 18673 2688 17 Dec 14 01:42 grid_manager_monitor_agent_log.18673.lock
> -rw-r--r-- 1 18702 2688 33 Mar 19 20:36 grid_manager_monitor_agent_log.18702
> -rw-r--r-- 1 18702 2688 17 Mar 19 20:37 grid_manager_monitor_agent_log.18702.lock
> -rw-r--r-- 1 18738 2688 33 Feb 15 05:27 grid_manager_monitor_agent_log.18738
> -rw-r--r-- 1 18738 2688 17 Feb 15 05:28 grid_manager_monitor_agent_log.18738.lock
> [...]
>
> Note that the number in each file name _must_ correspond to the
> UID of its owner.
>
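The UID rule above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the files follow the grid_manager_monitor_agent_log.<UID> naming shown in the listings, optionally with one extra suffix such as .time or .lock. The function name find_uid_mismatches is made up for this illustration.

```python
import os
import re
import sys

# Matches e.g. grid_manager_monitor_agent_log.18901 and
# grid_manager_monitor_agent_log.18901.lock (one optional suffix).
NAME_RE = re.compile(r"^grid_manager_monitor_agent_log\.(\d+)(?:\.\w+)?$")


def find_uid_mismatches(directory):
    """Return (name, uid_in_name, owner_uid) for files owned by another UID."""
    mismatches = []
    for name in sorted(os.listdir(directory)):
        match = NAME_RE.match(name)
        if match is None:
            continue  # some other file, not covered by the rule
        uid_in_name = int(match.group(1))
        # lstat: report the owner of the entry itself, do not follow symlinks
        owner_uid = os.lstat(os.path.join(directory, name)).st_uid
        if owner_uid != uid_in_name:
            mismatches.append((name, uid_in_name, owner_uid))
    return mismatches


if __name__ == "__main__":
    # Usage: python check_uids.py /opt/globus/tmp /opt/globus/tmp/gram_job_state
    for directory in sys.argv[1:]:
        for name, expected, actual in find_uid_mismatches(directory):
            print(f"{directory}/{name}: name says {expected}, owner is {actual}")
```

No output means every matching file is owned by the UID in its name. Any line printed points at a file worth inspecting.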