Good day, Maarten!
Thank you very much for your response. First of all, I want to say that your
suggestions have always been VERY useful to me, and I always appreciate
getting your comments and suggestions!
Indeed, I came across this page:
http://goc.grid.sinica.edu.tw/gocwiki/Cannot_read_JobWrapper_output%2e%2e%2e
But I didn't find anything there that fits my case :(
> Of course it is possible that the job did finish, but then it must mean
> that:
>
> 1. the WN could not do a globus-url-copy to the RB, *and*
>
> 2. Globus could not send back the job wrapper stdout, e.g. because it
> was not copied back from the WN to the CE, or because globus-url-copy
> does not work from the CE to the RB.
>
> This combined set of problems still can have a single cause: there can
> be a firewall limiting outgoing connections (to ports 20000-25000),
> some CRLs can be out of date both on CE and WN, some CA files could be
> absent altogether, the time (zone) on CE and WN can be wrong, ...
[Anar Manafov]
1 - I also suspect it is an output problem (I can say this after debugging
the full chain of the job submission process), and globus-url-copy could be
the reason. I will recheck once more. It seems something is seriously wrong
with gridftp, but I can't work out what exactly.
2 - This is probably our case. We have shared home directories, so hopefully
there should be no problem copying stdout back to the CE. :) BUT!
globus-url-copy could be a problem!
As far as I know, our firewall is not filtering outgoing connections. But I
will certainly recheck that for the LSF nodes.
I will also recheck the CA files and whether globus-url-copy works on the WN
and the CE.
Time synchronization has been checked and it is OK.
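
For the firewall recheck, a minimal sketch of how outgoing TCP connections from
a WN or the CE could be probed is shown below. It is only an illustration under
assumptions: the function name probe_ports is made up for this example, and the
host to probe (the RB) is not given anywhere in this thread, so it has to be
filled in by hand. Note that a failed connect does not distinguish a filtering
firewall from a port on which nothing is listening at that moment.

```python
import socket


def probe_ports(host, ports, timeout=2.0):
    """Try a TCP connect to each port on host; return {port: True/False}.

    True means the connection was established. False means it was refused,
    timed out, or the host name could not be resolved.
    """
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and applies the timeout
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # covers refused connections, timeouts, and resolution failures
            results[port] = False
    return results


# Example use (the host name is a placeholder, not taken from this thread):
#   for port, ok in probe_ports("rb.example.org", range(20000, 20005)).items():
#       print(port, "reachable" if ok else "not reachable")
```

Probing only a few ports from the 20000-25000 range keeps the run short; ports
that fail here would still need a check with globus-url-copy itself, since
this sketch does not exercise GSI authentication or the CA/CRL setup.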
I also saw this page:
http://goc.grid.sinica.edu.tw/gocwiki/submit-helper_script_%2e%2e%2e_gave_error%3a_cache_export_dir_%2e%2e%2e
It shouldn't be our case, because of the shared home directories we have... right?
Thank you very much Maarten!
Cheers,
Anar