After the spring reprocessing, ATLAS maxed out QMUL's link to Janet for
a couple of weeks. This seemed to cause job unreliability - presumably
jobs were unable to phone home in the face of large amounts of packet loss.
With the recent reprocessing and/or ATLAS's move to pd2p, they are again
transferring lots of data to QMUL, and again maxing out our link.
I've ended up implementing traffic policing on our SE. What we do is
drop traffic to gridFTP's data transfer port range when we fill 75% of
the link. This causes TCP to back off, leaving some space on the
link.
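
To make the idea concrete, here is a rough token-bucket sketch of that kind of policer. It is only an illustration, not what actually runs on our SE: the 1 Gbit/s link speed, the 20000-25000 port range, the burst size and the names Policer and handle are all assumptions made up for the example. It also takes one reading of the rule, namely that traffic in the data port range is held to 75% of the link.

```python
# Token-bucket policer sketch. All constants below are assumptions for
# illustration, not measured or configured values from a real site.
LINK_BYTES_PER_SEC = 1_000_000_000 / 8   # assumed 1 Gbit/s link
POLICE_FRACTION = 0.75                   # police at 75% of the link
DATA_PORTS = range(20000, 25001)         # assumed gridFTP data port range


class Policer:
    """Allow packets while tokens remain, drop them otherwise."""

    def __init__(self, rate, burst):
        self.rate = rate      # refill rate in bytes per second
        self.burst = burst    # bucket depth in bytes
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous policed packet

    def allow(self, now, size):
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False          # over the rate: the packet is dropped


def handle(policer, now, dst_port, size):
    """Return True if the packet is passed, False if it is dropped."""
    if dst_port not in DATA_PORTS:
        return True           # traffic outside the data range is never policed
    return policer.allow(now, size)


# Example policer for the assumed link; the burst size is arbitrary.
site_policer = Policer(LINK_BYTES_PER_SEC * POLICE_FRACTION, 1_000_000)
```

Dropped packets look like congestion to the sender, so its TCP congestion window shrinks and the transfer rate settles near the policed rate.
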
We are currently only dropping incoming packets in the globus port range.
This presumably includes ACK packets for outgoing traffic, which must
be a bad thing. I think we'll need to drop outgoing traffic too[1].
One of the suggestions in the Linux Advanced Routing & Traffic Control
howto (http://lartc.org/) is to drop based on the TOS field in the IP
header. This doesn't seem to be set in gridftp packets (though it is in
scp packets).
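
For reference, marking a socket's traffic from the application side is a single socket option. The sketch below uses Python's standard socket module on Linux; the function name make_bulk_socket is made up for the example, and 0x08 is the "maximize throughput" TOS value from RFC 1349. This is not how gridftp itself is written, just an illustration of what setting the field involves.

```python
import socket

# "Maximize throughput" TOS value from RFC 1349, i.e. bulk data.
IPTOS_THROUGHPUT = 0x08


def make_bulk_socket():
    """Create a TCP socket whose outgoing packets carry the bulk TOS value.

    Hypothetical helper for illustration only.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # IP_TOS sets the TOS byte on every packet sent from this socket.
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_THROUGHPUT)
    return s
```

A router or host-level filter could then match on that value instead of on hosts and ports.
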
TOS would seem an obvious thing to set - it would enable us and/or Janet
to drop the bulk data in preference to the interactive data. It would
also mean I can tell our network team to drop bulk packets rather than
match on a complicated host/port range. Does anyone know why globus
don't set it?
Chris
[1] Indeed globus provide a script to do this at
http://www.globus.org/toolkit/data/gridftp/bwlimit.html