On Sun, 6 Feb 2005, Anar Manafov wrote:
> Maarten, if I may, I have to ask some additional questions.
>
> As far as I can see from "submit-helper.pl", globus-url-copy must exist at
> "GLOBUS_LOCATION"/bin/globus-url-copy; otherwise the script exits! It is
> perfectly OK for us to have globus-url-copy on all
Without globus-url-copy, how would a user job copy files from/to SEs?
The job wrapper uses functionality that you need on your WN anyway.
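
A quick way to find the WNs that would fail this requirement is to run a small check on each node. The following is only a sketch, not part of the middleware: the function name `find_globus_url_copy` is made up for illustration, and the only thing taken from this thread is the expected location, $GLOBUS_LOCATION/bin/globus-url-copy.

```python
import os


def find_globus_url_copy(env=None):
    """Return the path to globus-url-copy under $GLOBUS_LOCATION/bin, or None.

    Hypothetical helper: it only mirrors the path described in this thread
    and does not reproduce what submit-helper.pl actually does.
    """
    env = os.environ if env is None else env
    location = env.get("GLOBUS_LOCATION")
    if not location:
        # GLOBUS_LOCATION is unset or empty, so there is nothing to look for.
        return None
    candidate = os.path.join(location, "bin", "globus-url-copy")
    # The file must exist and be executable by the current user.
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None
```

Calling `find_globus_url_copy()` with no argument reads the node's own environment; a `None` result marks a node where the binary is missing or not executable at that path.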
> of the WNs. But I always thought that with a shared home directory I
> shouldn't be forced to use globus-url-copy for output transfer from WN to CE.
The "lcg" job managers were explicitly written *not* to assume a shared
file system, so they do not even try using that.
> In my case the JM always uses submit-helper to execute a job, but what about
> home sharing? Is our configuration wrong, or is submit-helper always run
> before a job in any case?
>
> Actually, your last mail has confirmed my suspicion about the cause of our
> (GSI's) problem: not all of the WNs in our farm have globus-url-copy at
> that path. This causes "submit-helper.pl" to abort, and we get the error
> "Cannot read JobWrapper output...".
>
> We are certainly going to fix this problem and install globus-url-copy on
> all of the machines, but my point is that I am not really willing (let's say
> I don't see the point) to allow WNs to send output DIRECTLY to the RB; for
> that I have a shared home and a CE, I think.
With the current middleware you *must* allow the WN to connect directly to
the outside world: experiment software requires it, and the WP1 job wrapper
also requires it, as it uses globus-url-copy to copy the input/output sandbox
directly from/to the RB, as well as the "Maradona" file.
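
To see whether a WN can reach the outside world at all, a small TCP probe can help. This is a rough sketch under my own assumptions: the names `probe` and `probe_range` are invented for illustration, and the host and ports to test are yours to supply (the quoted text further down mentions outgoing connections to ports 20000-25000). Note that "refused" only means nothing was listening on that port at that moment; it still shows the packets got through, whereas "timeout" is what a silently filtering firewall typically looks like.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one outgoing TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote host answered, but no service is listening there.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: possibly dropped by a firewall.
        return "timeout"
    except OSError:
        # Anything else, e.g. name resolution failure or unreachable network.
        return "error"


def probe_range(host, ports, timeout=3.0):
    """Probe several ports on one host; returns {port: result}."""
    return {port: probe(host, port, timeout) for port in ports}
```

For example, `probe_range(rb_host, range(20000, 20010))` run from a WN, with `rb_host` set to your RB's host name, gives a first impression without touching any Globus tools. It does not test GSI authentication, CRLs or CA files.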
> Maybe I misunderstood something. Please, clarify the situation for me...
>
> 1 - Can I use home sharing with LCG 2.3 so that the WN copies the output to
> the CE and then the CE sends it to the RB?
It would require code changes, and you need globus-url-copy to work anyway,
so it would not make a lot of difference.
> 2 - Is "submit-helper.pl" used in every case, with or without home sharing?
Yes.
Cheers,
Maarten
> > -----Original Message-----
> > From: Anar Manafov [mailto:[log in to unmask]]
> > Sent: Sunday, February 06, 2005 1:12 PM
> > To: [log in to unmask]
> > Cc: [log in to unmask]; [log in to unmask]
> > Subject: RE: [LCG-ROLLOUT] Cannot read JobWrapper output +
> >
> > Good day, Maarten!
> >
> > Thank you very much for your response. First of all I want to say that
> > your suggestions have always been VERY useful for me! And I always
> > appreciate getting your comments and suggestions!!!
> >
> >
> > Indeed, I found this page:
> > http://goc.grid.sinica.edu.tw/gocwiki/Cannot_read_JobWrapper_output%2e%2e%2e
> >
> > But I didn't find anything suitable for my case :(
> >
> > > Of course it is possible that the job did finish, but then it must mean
> > > that:
> > >
> > > 1. the WN could not do a globus-url-copy to the RB, *and*
> > >
> > > 2. Globus could not send back the job wrapper stdout, e.g. because it
> > > was not copied back from the WN to the CE, or because globus-url-copy
> > > does not work from the CE to the RB.
> > >
> > > This combined set of problems still can have a single cause: there can
> > > be a firewall limiting outgoing connections (to ports 20000-25000),
> > > some CRLs can be out of date both on CE and WN, some CA files could be
> > > absent altogether, the time (zone) on CE and WN can be wrong, ...
> > [Anar Manafov]
> > 1 - I also suspect it is an output problem (this I can say after debugging
> > the full chain of the job submission process), and globus-url-copy could
> > be the reason. I will recheck once more. Something seems seriously wrong
> > with gridftp, but I can't work out what exactly.
> > 2 - This is probably our case. We have home sharing, so hopefully there
> > should be no problem copying stdout back to the CE. :) BUT! globus-url-copy
> > could be a problem!
> >
> > As far as I know our FW is not filtering outgoing connections. But I will
> > certainly recheck that for the LSF nodes.
> > Also I will recheck CA files and globus-url-copy functionality on WN and
> > CE.
> > Time synchronization is checked and it is OK.
> >
> > I also saw this page:
> > http://goc.grid.sinica.edu.tw/gocwiki/submit-helper_script_%2e%2e%2e_gave_error%3a_cache_export_dir_%2e%2e%2e
> > It shouldn't be our case, because we have home sharing... right?
> >
> >
> > Thank you very much Maarten!
> >
> > Cheers,
> > Anar
>
>