Hi Maarten,

Thanks. Everything else looks normal, except that no
/opt/globus/libexec/grid_monitor_lite.sh process starts on the CE when I
submit a job, and /opt/globus/var/log/globus-gma.log is full of these
warnings:
 
Fri Oct  2 16:05:04 2009:18593:WARN: Poll failed for job https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/
Fri Oct  2 16:05:04 2009:3873:WARN: Poll process terminated with error for job https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/

I could not find out why /opt/globus/libexec/grid_monitor_lite is not
starting.
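
To see how many of these warnings each job accumulates, a minimal sketch
like the one below could be used. It is not part of the Globus tooling. The
line layout (timestamp, pid, level, message) is an assumption inferred only
from the two log lines above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the excerpt above:
#   <weekday> <month> <day> <HH:MM:SS> <year>:<pid>:<LEVEL>: <message>
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})"
    r":(?P<pid>\d+):(?P<level>[A-Z]+): (?P<msg>.*)$"
)
JOB_RE = re.compile(r"https://\S+")


def count_warnings_per_job(lines):
    """Return a Counter mapping job contact URL -> number of WARN lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m or m.group("level") != "WARN":
            continue  # skip lines that do not fit the assumed layout
        job = JOB_RE.search(m.group("msg"))
        if job:
            counts[job.group(0)] += 1
    return counts


# The two warnings quoted above, with the wrapped URLs rejoined.
sample = [
    "Fri Oct  2 16:05:04 2009:18593:WARN: Poll failed for job "
    "https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/",
    "Fri Oct  2 16:05:04 2009:3873:WARN: Poll process terminated with error "
    "for job https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/",
]
for job, n in count_warnings_per_job(sample).items():
    print(n, job)
```

On the real file, the same function can be fed an open file handle for
/opt/globus/var/log/globus-gma.log instead of the sample list.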

Regards
Kashif







-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On
Behalf Of Maarten Litmaath
Sent: 01 October 2009 18:38
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Job stay in running state

Hi Kashif,

> I have installed an lcg-CE fronting a tarball WN installation, using
> PBS as JOB_MANAGER. It also uses a shared home directory for the pool
> accounts between the CE and the WNs. I am submitting jobs using the -r
> option. The job is submitted to the CE successfully, and I can see from
> the batch system that it completes.
> But when I check glite-wms-job-status, it keeps showing 'running' until
> the proxy expires.
> This bit keeps repeating in /home/user/gram_job_mgr_*
> 
> 10/1 18:00:47 JMI: poll: seeking: https://ngsce-test.oerc.ox.ac.uk:64001/1937/1254416006/
> 10/1 18:00:47 JMI: poll_fast: ******** Failed to find https://ngsce-test.oerc.ox.ac.uk/1937/1254416006/
> 10/1 18:00:47 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl scripts)
> 10/1 18:00:47 JMI: cmd = poll
> 10/1 18:00:47 JMI: returning with success

Those errors look "normal".

> I have already checked that I can do globus-url-copy from a WN pool
> account to the CE as well as to the WMS.

With the standard jobmanager-pbs your WNs do not need access to the CE!

> CRLs are up to date.
> There is no ssh key issue, as the home directory is shared.
> I am using GLOBUS_TCP_PORT_RANGE="64000,65256" for some reason.
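
As a quick sanity check on that setting, the sketch below tests whether the
port in a job contact URL lies inside the configured range. The helper names
are hypothetical, and treating the range as inclusive at both ends is an
assumption. The two URLs are the job contacts quoted in this thread.

```python
from urllib.parse import urlparse


def parse_port_range(value):
    """Parse a 'low,high' string such as the GLOBUS_TCP_PORT_RANGE value."""
    low, high = (int(part) for part in value.split(","))
    return low, high


def port_in_range(url, port_range):
    """True if the URL carries an explicit port inside the inclusive range."""
    port = urlparse(url).port  # None when the URL has no explicit port
    low, high = port_range
    return port is not None and low <= port <= high


port_range = parse_port_range("64000,65256")
for url in (
    "https://ngsce-test.oerc.ox.ac.uk:64001/1937/1254416006/",
    "https://ngsce-test.oerc.ox.ac.uk:64007/15177/1254481706/",
):
    print(url, port_in_range(url, port_range))
```

Both ports seen here (64001 and 64007) fall inside 64000 to 65256, so the
contact URLs are at least consistent with the configured range.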

Check here:

http://goc.grid.sinica.edu.tw/gocwiki/Jobs_sent_to_some_CE_stay_in_Running_state_forever

That page redirects here:

http://goc.grid.sinica.edu.tw/gocwiki/Jobs_sent_to_some_CE_stay_in_Scheduled_state_forever