Sounds like you should look at the differences between that pool node
and the others that are working well.
Is there a config issue on that particular pool node?
Firewall, resolv.conf, etc.?
Brian
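
One way to make the comparison suggested above concrete is to diff the same config file (resolv.conf, firewall rules and so on) taken from a working pool node and from the failing one. The following is only a minimal sketch: the function names, the node labels and the sample file contents are made up for illustration, and it assumes you have already copied the files somewhere you can read them.

```python
import difflib


def normalise(text):
    """Drop blank lines and '#' comments, and strip surrounding whitespace."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def config_diff(good_text, bad_text, good_name="good-node", bad_name="bad-node"):
    """Return unified-diff lines between two configuration file contents."""
    return list(
        difflib.unified_diff(
            normalise(good_text),
            normalise(bad_text),
            fromfile=good_name,
            tofile=bad_name,
            lineterm="",
            n=0,  # no context lines, only the lines that differ
        )
    )


if __name__ == "__main__":
    # Hypothetical resolv.conf contents, not taken from any real node.
    good = "# resolv.conf\nsearch example.org\nnameserver 192.0.2.1\n"
    bad = "search example.org\nnameserver 192.0.2.99\n"
    for line in config_diff(good, bad):
        print(line)
```

An empty result means the two files match once comments and blank lines are ignored, so the problem probably lies elsewhere.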

-----Original Message-----
From: GRIDPP2: Deployment and support of SRM and local storage
management [mailto:[log in to unmask]] On Behalf Of Matt
Doidge
Sent: 26 May 2010 12:55
To: [log in to unmask]
Subject: GridFTP timeouts for atlas SAM tests.

Heya guys,

We're failing atlas SRM SAM tests far too regularly, and on closer
inspection it seems to be the same few pool nodes causing the failures
during the lcg-cr portion of the test, specifically by hitting the 600
second timeout. In fact, one of our pools doesn't seem to have passed a
test in a while. Poking the pool doesn't yield much of interest: the
hardware seems fine and the various DPM pool daemons appear to be
running alright. But trawling the gridftp logs yields pretty much the
same result as we see on the SAM test pages: after about 10 minutes
the server requests an abort. The test files are only 41kB in size; you
wouldn't have a problem downloading them in a timely manner over a
dial-up connection, so this shouldn't be a data rate problem. It seems
to me that the transfers never really get started after the initial
connection negotiation. Has anyone seen this kind of behaviour before
with their DPMs?

Vital statistics of the pool node in question: running SL5.4, DPM
version 1.7.2-5, SELinux enabled.

Thanks in advance,
Matt