Hi Maarten
Thanks for looking into it. I have fixed the internal network address
problem and updated lcg-CE to 3.1.35-0, but I am still facing the same
problem. The globus-gma.log file is full of warnings like these:
[...] https://ngsce-test.oerc.ox.ac.uk:64004/6465/1251116035/
Mon Oct 5 14:24:12 2009:13965:WARN: Poll failed for job https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/
Mon Oct 5 14:24:12 2009:19030:WARN: Poll process terminated with error for job https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/
There is another CE, ngsce.oerc.ox.ac.uk, which feeds the same cluster
and is working fine. The only difference is that it is using lcg-CE 3.1.21-0.
Regards
Kashif
-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On
Behalf Of Maarten Litmaath
Sent: 02 October 2009 16:34
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Job stay in running state
Hi Kashif,
> Thanks, everything else looks normal, except that no
> /opt/globus/libexec/grid_monitor_lite.sh process starts on the CE when I
> submit a job, and /opt/globus/var/log/globus-gma.log is full of these
> warnings:
>
> Fri Oct 2 16:05:04 2009:18593:WARN: Poll failed for job https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/
> Fri Oct 2 16:05:04 2009:3873:WARN: Poll process terminated with error for job https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/
Those errors look real, but should not be due to the absence of
grid_monitor_lite processes... See below.
> I could not find out why /opt/globus/libexec/grid_monitor_lite is
> not starting.
Is your CE at the latest version? It fixes a bug in that area:
restart the globus-gatekeeper so that it picks up the environment
variable GLOBUS_GMA=true
It also has important fixes for other bugs.
I tried to have a look, but:
-----------------------------------------------------------------------------
$ globus-job-run ngsce-test.oerc.ox.ac.uk /bin/rpm -q lcg-CE
GRAM Job submission failed because the job manager failed to open stderr (error code 74)
-----------------------------------------------------------------------------
$ uberftp ngsce-test.oerc.ox.ac.uk 'dir /'
220 ngsce-test.oerc.ox.ac.uk GridFTP Server 2.3 (gcc32dbg, 1144436882-63) ready.
230 User egop010 logged in.
Could not list /: Timeout waiting for server response.
Closing connection to service.
-----------------------------------------------------------------------------
$ globus-url-copy -dbg -vb file:/etc/group gsiftp://ngsce-test.oerc.ox.ac.uk/tmp/test.$$
[...]
debug: response from gsiftp://ngsce-test.oerc.ox.ac.uk/tmp/test.521:
227 Entering Passive Mode (10,141,245,14,250,6)
[...]
-----------------------------------------------------------------------------
On your CE the IP address for ngsce-test is returned as 10.141.245.14,
which is an internal network address! You need to fix that first.
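For reference, the 227 reply above carries the data-channel address as six decimal bytes (h1,h2,h3,h4,p1,p2): the first four are the IP address and the port is p1*256 + p2. A minimal Python sketch of that decoding follows. The helper name parse_pasv is hypothetical and not part of any Globus tool; it only illustrates how to read the reply and check whether the advertised address is private.

```python
import ipaddress
import re


def parse_pasv(reply):
    """Decode an FTP 227 reply into (ip, port).

    Hypothetical helper for illustration only.
    """
    m = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if m is None:
        raise ValueError("no (h1,h2,h3,h4,p1,p2) tuple in reply")
    h1, h2, h3, h4, p1, p2 = (int(x) for x in m.groups())
    ip = ipaddress.ip_address(f"{h1}.{h2}.{h3}.{h4}")
    port = p1 * 256 + p2  # high byte first, then low byte
    return ip, port


# Reply copied from the globus-url-copy debug output above.
ip, port = parse_pasv("227 Entering Passive Mode (10,141,245,14,250,6)")
print(ip, port, "private" if ip.is_private else "public")
# prints: 10.141.245.14 64006 private
```

An address in 10.0.0.0/8 is not reachable from outside the site, which is consistent with the directory listing timing out right after login.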