Maarten,
Many thanks for those explanations and your documentation on this topic.
However, I still have a question about the cleanup step, because we
currently have a problem with gram_job_state files oddly disappearing on
our CE (IN2P3-CC site).
We noticed that several submitted jobs are no longer known to our
jobmanager. According to the jobmanager log file, these jobs stopped
being handled by the jobmanager at about 03:00 this morning, and the same
thing happened yesterday with other jobs. So I suppose that this is when
their gram_job_state files disappeared.
However, these jobs are still known to our batch system, as either
running or queued jobs. The worst part is that the running jobs hang
indefinitely on a connection (to the RB?).
So my question is:
Have you seen this strange phenomenon before? Is it possible that an RB
could launch the cleanup step too soon on the CE?
Thanks in advance for any possible explanation,
Cheers,
Pierre
Maarten Litmaath, CERN wrote:
>On Fri, 11 Feb 2005, owen maroney wrote:
>
>>This is really useful: is there a page or three on this in the
>>troubleshooting wiki?
>
>Hi Owen,
>I have added entries to the Job Submission category:
>
> http://goc.grid.sinica.edu.tw/gocwiki/SiteProblemsFollowUpFaq
>
>In particular:
>
>http://goc.grid.sinica.edu.tw/gocwiki/Dialog_between_RB_and_CE
>
>http://goc.grid.sinica.edu.tw/gocwiki/Globus_error_79%3a_connecting_to_the_job_manager_failed%2e
>
>Cheers,
> Maarten
>
--
______________________
Pierre GIRARD
Grid Computing Team Member
IN2P3/CNRS Computing Centre - Lyon (FRANCE)
http://cc.in2p3.fr
Tel. +33 4.78.93.08.80 | Fax. +33 4.72.69.41.70