Hi Tomas,
> > Can you increase the debug level in
> > /opt/globus/etc/globus-job-manager-marshal.conf to 2 and send a SIGHUP
> > to the globus-job-manager-marshal master process?
Can you restart that process, as explained by Andrey Kiryanov?
After that, a lot more should be logged, including Perl warnings,
most of which can be ignored.
> > Then look into /opt/globus/var/log/globus-job-manager-marshal.log
> > for additional messages/warnings/errors from the job manager.
>
> I have done so and checked the log and straced the globus-job-manager-marshal.
> It helped to orientate myself in logs. I think the problem shows up in
> gram_job_mgr_<ID>.log:
>
> Thu Jun 19 09:29:17 2008 JM_SCRIPT: New Perl JobManager created.
> Thu Jun 19 09:29:17 2008 JM_SCRIPT: Using jm supplied job dir:
> /home/dteam001/.globus/job/ce2.egee.cesnet.cz/1669.1213860305
> Thu Jun 19 09:29:17 2008 JM_SCRIPT: polling job 1684
> 6/19 09:29:17 JMI: while return_buf = GRAM_SCRIPT_JOB_STATE = 2
> 6/19 09:29:17 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL1
> 6/19 09:29:27 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_POLL2
> 6/19 09:29:27 JMI: testing job manager scripts for type fork exist and
> permissions are ok.
> 6/19 09:29:27 JMI: completed script validation: job manager type is fork.
> 6/19 09:29:27 JMI: in globus_gram_job_manager_poll()
> 6/19 09:29:27 JMI: local stdout filename =
> /home/dteam001/.globus/job/ce2.egee.cesnet.cz/1669.1213860305/stdout.
> 6/19 09:29:27 JMI: local stderr filename = /dev/null.
> 6/19 09:29:27 JMI: poll: seeking:
> https://ce2.egee.cesnet.cz:20002/1669/1213860305/
> 6/19 09:29:27 JMI: poll_fast: ******** Failed to find
> https://ce2.egee.cesnet.cz/1669/1213860305/
> 6/19 09:29:27 JMI: poll_fast: returning -1 = GLOBUS_FAILURE (try Perl scripts)
> 6/19 09:29:27 JMI: cmd = poll
> 6/19 09:29:27 JMI: returning with success
>
> This snippet keeps repeating in the log every 10 seconds.
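I think this is "normal": poll_fast fails, the job manager falls back to
the Perl scripts, and the poll then returns with success.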
> The grid_manager_monitor_agent_log really does not contain mentioned string,
> its content:
>
> 1213863891 1213863891
> https://ce2.egee.cesnet.cz:20005/30388/1213703208/ 1
> https://ce2.egee.cesnet.cz:20007/20753/1213692787/ 1
> https://ce2.egee.cesnet.cz:20008/25851/1213694320/ 1
> GRIDMONEOF
Those are GRAM contact strings for unfinished jobs and their states.
State 1 means PENDING:
http://pages.cs.wisc.edu/~adesmet/status.html
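In case it helps when checking whether a given contact string is in that file, here is a rough Python sketch that reads the monitor log into a dictionary. The layout (two timestamps, then "<contact> <state>" pairs, then GRIDMONEOF) is only inferred from the excerpt you pasted, and the function name parse_monitor_log is made up for illustration. The state names other than PENDING are my recollection of the GRAM protocol job states, so please check them against the page above.

```python
# Sketch only: the file layout is assumed from the quoted excerpt,
# not taken from any Globus/Condor documentation.

# State numbers as I recall them from the GRAM protocol; only
# 1 = PENDING is confirmed in this thread.
GRAM_STATES = {
    1: "PENDING",
    2: "ACTIVE",
    4: "FAILED",
    8: "DONE",
    16: "SUSPENDED",
    32: "UNSUBMITTED",
    64: "STAGE_IN",
    128: "STAGE_OUT",
}


def parse_monitor_log(text):
    """Return (timestamps, {contact: state_name}) from the log text."""
    tokens = text.split()
    # Ignore the end marker and anything after it.
    if "GRIDMONEOF" in tokens:
        tokens = tokens[:tokens.index("GRIDMONEOF")]
    timestamps, rest = tokens[:2], tokens[2:]
    jobs = {}
    # The remaining tokens are assumed to come in contact/state pairs.
    for contact, state in zip(rest[0::2], rest[1::2]):
        jobs[contact] = GRAM_STATES.get(int(state), "UNKNOWN(%s)" % state)
    return timestamps, jobs


SAMPLE = """1213863891 1213863891
https://ce2.egee.cesnet.cz:20005/30388/1213703208/ 1
https://ce2.egee.cesnet.cz:20007/20753/1213692787/ 1
https://ce2.egee.cesnet.cz:20008/25851/1213694320/ 1
GRIDMONEOF
"""

if __name__ == "__main__":
    stamps, jobs = parse_monitor_log(SAMPLE)
    wanted = "https://ce2.egee.cesnet.cz:20002/1669/1213860305/"
    print("timestamps:", stamps)
    print("contact listed:", wanted in jobs)
    for contact, state in jobs.items():
        print(contact, state)
```

Splitting on whitespace rather than on lines means the wrapped first line in your mail does not matter.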
> > If that does not provide more clues, you can do the same with
> > /opt/globus/etc/globus-gass-cache-marshal.conf and the
> > globus-gass-cache-marshal master process, then look into
> > /opt/globus/var/log/globus-gass-cache-marshal.log.
Did you have a look there?