Our core network switch on the Tier-1 (the Force10 C300) has the ability to do QoS (although maybe not while running in the layer 2 mode in which it is currently used). Given that our data flow to the Tier-2s is via a dedicated bypass, we are in principle able to identify the gridftp traffic for a subnet and apply a rate cap.
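To illustrate the idea only (on a Linux box with tc, since I can't show verified C300 config - the interface, subnet and rate below are all placeholders), it would amount to something like:

  # attach a classful root qdisc so a policing filter can hang off it
  tc qdisc add dev eth0 root handle 1: prio
  # cap traffic towards the Tier-2 subnet (192.0.2.0/24 is a placeholder)
  # at 800Mbit/s; excess packets are dropped so the TCP streams back off
  tc filter add dev eth0 parent 1: protocol ip u32 \
      match ip dst 192.0.2.0/24 \
      police rate 800mbit burst 2m drop flowid 1:1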
Unfortunately, as the C300 is currently used as a plain switch, we don't have the expertise to twiddle it - it's critical to our operation and comes with a 1200-page manual, so we won't be fiddling and hoping for the best. However, I see two possible ways forward.
1) If QMW really urgently need the Tier-1 to turn down the rate, then we would
have to look at getting in some consultancy to help us get started. We are
easily talking about £1K, probably more. It's not so much the money; the issue
is that if at the end all it leaves us with is a couple of config changes,
then that's not really the best use of resources. It is, however, a possible
avenue to follow if the dteam considers it vital.
2) If, as seems likely, this is a longer-term requirement (and I can see
various reasons why we probably do need it), then we need to look at paying
for some training on the Force10, or at giving someone time to go and
test/play with our test unit. Given how critical this device is to us, and
how stretched the Fabric team are, training may well be money well spent.
3) I've also asked that we raise the priority of obtaining flow-level traffic
data off the C300, so that we have better diagnostics; a possible
collector-side starting point is sketched below.
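Assuming we can get sFlow export enabled on the C300 (the switch-side config
is still to be investigated), something as simple as InMon's sflowtool would
probably do for a first look at the flows (6343 is the default sFlow port):

  # decode incoming sFlow datagrams and print one line per flow sample
  sflowtool -p 6343 -l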
I understand the reasoning that the network into QMW needs to be bigger, and doubtless tagging gridftp traffic would be very helpful, but maybe we should be looking more at managing bandwidth at the Tier-1. I'd always assumed this could be done at the FTS, and I now appreciate that this isn't so - which leaves the Tier-1 also exposed to unmanaged dataflows from the heavy-duty Tier-2 sites.
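For what it's worth, if we did want to manage this at the source rather than
in the switch, the LARTC-style approach on the disk servers would look
something like the sketch below. It is only a sketch: it assumes a
GLOBUS_TCP_PORT_RANGE of 20000-25000 and the rates are made up; it also sets
a bulk-data TOS along the lines Chris suggests:

  # mark outgoing gridftp data connections (assumed port range 20000-25000)
  iptables -t mangle -A OUTPUT -p tcp --sport 20000:25000 -j MARK --set-mark 1
  # tag the same traffic as bulk so downstream kit could drop it first
  iptables -t mangle -A OUTPUT -p tcp --sport 20000:25000 \
      -j TOS --set-tos Maximize-Throughput

  # HTB: gridftp gets a hard 750Mbit/s ceiling; everything else can
  # borrow up to the full 1Gbit/s link
  tc qdisc add dev eth0 root handle 1: htb default 20
  tc class add dev eth0 parent 1: classid 1:1 htb rate 1000mbit
  tc class add dev eth0 parent 1:1 classid 1:10 htb rate 750mbit ceil 750mbit
  tc class add dev eth0 parent 1:1 classid 1:20 htb rate 250mbit ceil 1000mbit
  # send fw-marked (gridftp) packets into the capped class
  tc filter add dev eth0 parent 1: protocol ip handle 1 fw flowid 1:10

Because the firewall mark is visible to the egress queues, the port range
only has to be maintained in one place. Whether this would belong on every
disk server or on a single box in front of them is an open question.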
Maybe we should discuss requirements at a dteam meeting?
Regards
Andrew
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
> Sent: 16 November 2010 12:49
> To: [log in to unmask]
> Subject: Re: GridFTP ToS (and Traffic shaping/policing)
>
>
> On Tue, Nov 16, 2010 at 00:10, John Gordon
> <[log in to unmask]> wrote:
> > Can we not lobby ATLAS again to accept that the traffic to a site
> > should not just be based on the amount of disk they have, but on
> > some function of the network bandwidth and disk? One obviously can't
> > just change from disk to bandwidth, or a site with a fat pipe and
> > not much disk would suffer a different fate.
>
> Hi John
>
> It would be really stupid to put disk and CPU at a site which was
> connected down a soggy piece of string. QMUL will be ATLAS's second
> largest T2 site so it needs better networking. I believe this is being
> progressed.
>
> At the moment Chris's solution seems technically the best one. FTS
> just has no hooks to set a bandwidth cap on a channel, and if we
> pared the number of slots to the bone then we'd be killed by
> small-file overheads. Having gridftp set the TOS flag is sensible.
>
> Please note that it is not because of PD2P that QMUL is getting a lot
> of data. We are in the midst of a reprocessing campaign and most of
> the data is moved by post-reprocessing subscriptions (darker brown
> vs. light brown in the plot).
>
> Cheers
>
> Graeme
>
> >
> >> -----Original Message-----
> >> From: Testbed Support for GridPP member institutes [mailto:TB-
> >> [log in to unmask]] On Behalf Of Christopher J.Walker
> >> Sent: 15 November 2010 19:37
> >> To: [log in to unmask]
> >> Subject: GridFTP ToS (and Traffic shaping/policing)
> >>
> >> After the spring reprocessing, ATLAS maxed out QMUL's link to
> >> Janet for a couple of weeks. This seemed to cause job
> >> unreliability - presumably they were unable to phone home in the
> >> face of large amounts of packet loss.
> >>
> >> With the recent reprocessing and/or ATLAS's move to PD2P, they
> >> are again transferring lots of data to QMUL, and again maxing out
> >> our link.
> >>
> >> I've ended up implementing traffic policing on our SE. What we do
> >> is drop traffic to gridFTP's data transfer port range when we
> >> fill 75% of the link. This causes TCP/IP to back off, leaving
> >> some space on the link.
> >>
> >> We are currently only dropping incoming packets in the Globus
> >> port range. This presumably includes ACK packets for outgoing
> >> traffic, which must be a bad thing. We'll also need to drop
> >> outgoing traffic too, I think [1].
> >>
> >> One of the suggestions in the Linux Advanced Routing & Traffic
> >> Control howto (http://lartc.org/) is to drop based on the TOS IP
> >> header. This doesn't seem to be set in gridftp packets (though it
> >> is in scp packets).
> >>
> >> TOS would seem an obvious thing to set - it would enable us
> >> and/or Janet to drop the bulk data in preference to the
> >> interactive data. It would also mean I could tell our network
> >> team to drop bulk packets rather than a complicated host/port
> >> range. Does anyone know why Globus don't set it?
> >>
> >> Chris
> >>
> >>
> >> [1] Indeed globus provide a script to do this at
> >> http://www.globus.org/toolkit/data/gridftp/bwlimit.html
> >
>