On 12.09.2012 21:57, Maarten Litmaath wrote:
> Hi Lukasz,
>
>> The same problem happened with local dpm - and there is no
>> firewall in between
>
> What about the NAT box that your WN seem to be behind?
>
The NAT box is a plain Linux machine with iptables.
Hangs happen both on internal WN -> DPM pool transfers and on WN -> outside world transfers.
The stuck connection, as listed by lsof on the WN:
lcg-cp 30069 biomed007 7u IPv4 152800688 TCP 10.16.4.32:53426->194.36.10.34:gsiftp (ESTABLISHED)
The NAT looks fine; the matching conntrack entry on the NAT box is:
tcp 6 430575 ESTABLISHED src=10.16.4.32 dst=194.36.10.34 sport=53426 dport=2811 packets=32 bytes=14989 src=194.36.10.34 dst=149.156.9.109 sport=2811 dport=53426 packets=28 bytes=9672 [ASSURED] use=1
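To double-check that reading, the lsof socket on the WN can be matched against the conntrack entry: the original tuple should equal the WN's socket, and the reply tuple should come from the server back to the NAT's public endpoint. A minimal Python sketch, assuming both outputs are pasted in verbatim; the helper names and the gsiftp -> 2811 port table are illustrative only, not part of any tool:

```python
import re

# Copied verbatim from the lsof and conntrack output above.
LSOF_LINE = ("lcg-cp 30069 biomed007 7u IPv4 152800688 TCP "
             "10.16.4.32:53426->194.36.10.34:gsiftp (ESTABLISHED)")
CONNTRACK_LINE = ("tcp 6 430575 ESTABLISHED src=10.16.4.32 dst=194.36.10.34 "
                  "sport=53426 dport=2811 packets=32 bytes=14989 "
                  "src=194.36.10.34 dst=149.156.9.109 sport=2811 dport=53426 "
                  "packets=28 bytes=9672 [ASSURED] use=1")

# lsof shows well-known ports by service name; GridFTP control is TCP 2811.
SERVICE_PORTS = {"gsiftp": 2811}


def to_port(p):
    """Convert a numeric or named port to an int."""
    return int(p) if p.isdigit() else SERVICE_PORTS[p]


def parse_lsof(line):
    """Return (src_ip, sport, dst_ip, dport) of an lsof IPv4 TCP socket line."""
    m = re.search(r"([\d.]+):(\w+)->([\d.]+):(\w+)", line)
    if m is None:
        raise ValueError("no TCP endpoint pair in lsof line")
    return m.group(1), to_port(m.group(2)), m.group(3), to_port(m.group(4))


def parse_conntrack(line):
    """Split a conntrack entry into its original and reply 4-tuples."""
    pairs = re.findall(r"\b(src|dst|sport|dport)=(\S+)", line)

    def as_tuple(d):
        return d["src"], int(d["sport"]), d["dst"], int(d["dport"])

    # conntrack prints the original direction first, then the reply direction.
    return as_tuple(dict(pairs[:4])), as_tuple(dict(pairs[4:8]))


def nat_public_endpoint(lsof_line, conntrack_line):
    """Return (public_ip, public_port) if the entry matches the WN socket, else None."""
    wn = parse_lsof(lsof_line)
    orig, reply = parse_conntrack(conntrack_line)
    if orig != wn:
        return None  # the conntrack entry belongs to a different flow
    _, _, server_ip, server_port = wn
    # Replies must originate from the same server endpoint the WN connected to.
    if (reply[0], reply[1]) != (server_ip, server_port):
        return None
    # The reply destination is where the NAT box receives the server's packets.
    return reply[2], reply[3]


print(nat_public_endpoint(LSOF_LINE, CONNTRACK_LINE))  # ('149.156.9.109', 53426)
```

Port remapping by the NAT is allowed by this check, since only the reply destination is reported rather than compared with the WN's source port; here the port happens to be preserved.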
I will try to debug this further, but it looks like some kind of GridFTP
protocol deadlock.
I have created a GGUS ticket for this issue:
https://ggus.eu/ws/ticket_info.php?ticket=86057
> On our WMS nodes most of the hanging gridftp-server processes are
> associated with clients behind NAT boxes. But maybe that is simply
> reflecting many sites having their WN set up like that...
>
>> timeouts are set, which means that lcg-cp should respect them
>
> Yes, there would be a bug in the old GFAL code of the gLite 3.2 WN.
> That is one good reason to move to the EMI WN as soon as feasible,
> which for WLCG means it should work for ATLAS and/or CMS.
> We will try to get that sorted still this month, with the help of
> various sites configuring test queues with the EMI-2 SL5 WN.
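Until then, one possible stopgap is a hard wall-clock limit enforced by the job wrapper, independent of whatever timeouts GFAL does or does not honour. A minimal Python sketch; `run_with_deadline` is a hypothetical helper and the deadline would have to be chosen per site:

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return its exit code, or None if it exceeded deadline_s.

    On timeout, subprocess.run() kills the direct child and waits for it
    before raising TimeoutExpired; grandchildren are not killed.
    """
    try:
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a hanging transfer: a child that sleeps longer than allowed.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_deadline(hung, 0.5))  # None
```

In a wrapper this would be called as `run_with_deadline(["lcg-cp", SRC, DST], 3600)`, where `SRC`, `DST` and the one-hour limit are placeholders; lcg-cp's own timeout settings stay as they are.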
We can dedicate some nodes and a queue to testing UMD2 with WLCG.
Are you aware of any site successfully running ATLAS, ALICE and LHCb on UMD2?
--
LKF