Hi all,
We have the following problem at CGG-LCG2: when jobs containing
lcg-xx commands arrive on our Worker Nodes, some of the lcg-xx commands
complete successfully, but very often they hang for a very long time,
until the job proxy expires or the job is deleted by the LRMS. But when we
log on to the worker node where an lcg command is hanging and rerun it
manually, it works correctly and the job continues running (until the
next lcg command).
We tried unsetting the GLOBUS_TCP_PORT_RANGE variable on the worker nodes, but
this did not help.
The Worker Nodes are on a private network and use the SE host
(se1.egee.fr.cgg.com) as their router.
We configured masquerading with this line in the iptables configuration
file:
-A POSTROUTING -s 10.0.0.0/8 -o eth1 -j SNAT --to-source 84.14.104.242
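To make the intent of that rule concrete, here is a minimal Python sketch of
the address rewriting we expect from it. It only models the source-address
match (-s 10.0.0.0/8) and the rewrite to 84.14.104.242; it does not model the
-o eth1 interface match or connection tracking. The function name and the
sample addresses are made up for illustration.

```python
import ipaddress

# Source network and public address taken from the SNAT rule above.
SNAT_SOURCE_NET = ipaddress.ip_network("10.0.0.0/8")
SNAT_TO_SOURCE = ipaddress.ip_address("84.14.104.242")


def source_after_snat(src: str) -> str:
    """Return the source address a remote host should see for a packet
    from `src`, assuming the packet leaves through the SNAT rule."""
    addr = ipaddress.ip_address(src)
    if addr in SNAT_SOURCE_NET:
        # Private worker-node address: rewritten to the public address.
        return str(SNAT_TO_SOURCE)
    # Any other source address is left untouched by this rule.
    return str(addr)


if __name__ == "__main__":
    # Hypothetical worker-node address, for illustration only.
    print(source_after_snat("10.1.2.3"))
```

In other words, remote servers should see every worker node as coming from
84.14.104.242.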
We have been having this problem since we installed the LCG2.6.0 middleware.
Is there any explanation for this? Has anyone experienced a similar
situation?
Thanks in advance for your help.
Cheers.
Gerald and Ahmed