[log in to unmask] wrote:
> Our core network switch (the Force10 C300) on the Tier-1 has the ability to do QoS (although maybe not while running in the layer 2 mode in which it is currently used). Given that our data flow to the Tier-2s is via a dedicated bypass, we are in principle able to identify the gridftp traffic for a subnet and apply a rate cap.
>
> However - unfortunately, as the C300 is currently used as a switch we don't have the expertise to twiddle it - it's critical to our operation and comes with a 1200 page manual, so we won't be fiddling and hoping for the best. However, I see three possible ways forward.
>
> 1) If QMW really urgently need the Tier-1 to turn down the rate then we would have to look at getting in some consultancy to
> help us get started. We are easily talking about £1K, probably more. It's not so much the money, but of course if at the end
> all it leaves us with is a couple of config changes then that's not really the best use of resources. It is however a possible
> avenue to follow if considered vital by the dteam.
I believe I've solved the immediate problem, so there's no immediate
need to do this.
>
> 2) If (as) this seems like a longer-term requirement (and I can see various reasons why it probably is needed by us) then we need to
> look at paying for some training on the Force10, or giving someone time to go and test/play with our test unit. Given how critical
> this device is to us, and how stretched the Fabric team are, training may well be money well spent.
I'd have thought so.
I do have a slight hesitation that these limits will become a
difficult-to-find bottleneck when a site does eventually upgrade its
bandwidth.
>
> 3) I've also asked that we increase the priority to obtain flow level traffic data off the C300 so we have better diagnostics.
>
> I understand the reasoning that the network into QMW needs to be bigger
> and doubtless tagging gridftp traffic would be very helpful, but maybe
> we should be looking more at managing bandwidth at the Tier-1. I'd
> always assumed this could be done at the FTS, and now appreciate that
> this isn't so
Note that it would be perfectly possible to use bandwidth shaping on
each gridftp machine at RAL in order to limit the bandwidth per
channel. I'm not sure that's a good idea though - the real concern is
the total bandwidth into a site.
Also, bandwidth shaping at RAL won't be a complete solution if
inter-Tier-2 traffic is significant.
> - this leaves the Tier-1 also exposed to unmanaged
> dataflows from the heavy duty Tier-2 sites.
>
> Maybe we should discuss requirements at a dteam meeting?
>
Chris
> Regards
> Andrew
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes
>> [mailto:[log in to unmask]]On Behalf Of Graeme Stewart
>> Sent: 16 November 2010 12:49
>> To: [log in to unmask]
>> Subject: Re: GridFTP ToS (and Traffic shaping/policing)
>>
>>
>> On Tue, Nov 16, 2010 at 00:10, John Gordon
>> <[log in to unmask]> wrote:
>>> Can we not lobby ATLAS again to accept that the traffic to a site
>>> should be based not just on the amount of disk they have, but on
>>> some function of network bandwidth and disk? One obviously can't
>>> just change from disk to bandwidth, or a site with a fat pipe and
>>> not much disk would suffer a different fate.
>>
>> Hi John
>>
>> It would be really stupid to put disk and CPU at a site which was
>> connected down a soggy piece of string. QMUL will be ATLAS's second
>> largest T2 site so it needs better networking. I believe this is being
>> progressed.
>>
>> At the moment Chris's solution seems technically the best one. FTS
>> just has no hooks to set a bandwidth cap on a channel and if we pared
>> the number of slots to the bone then we'd be killed by small file
>> overheads. Having gridftp set the TOS flag is sensible.
>>
>> Please note that it is not because of PD2P that QMUL is getting a lot
>> of data. We are in the midst of a reprocessing campaign and most of
>> the data is moved by post-reprocessing subscriptions (darker brown vs.
>> light brown in the plot).
>>
>> Cheers
>>
>> Graeme
>>
>>>> -----Original Message-----
>>>> From: Testbed Support for GridPP member institutes [mailto:TB-
>>>> [log in to unmask]] On Behalf Of Christopher J.Walker
>>>> Sent: 15 November 2010 19:37
>>>> To: [log in to unmask]
>>>> Subject: GridFTP ToS (and Traffic shaping/policing)
>>>>
>>>> After the spring reprocessing, ATLAS maxed out QMUL's link to Janet
>>>> for a couple of weeks. This seemed to cause job unreliability -
>>>> presumably jobs were unable to phone home in the face of large
>>>> amounts of packet loss.
>>>> With the recent reprocessing and/or ATLAS's move to PD2P, they are
>>>> again transferring lots of data to QMUL, and again maxing out our
>>>> link.
>>>>
>>>> I've ended up implementing traffic policing on our SE. What we do is
>>>> drop traffic to gridFTP's data transfer port range when we fill 75%
>>>> of the link. This causes TCP to back off, leaving some space on the
>>>> link.
>>>>
>>>> We are currently only dropping incoming packets in the globus port
>>>> range. This presumably includes ACK packets for outgoing traffic,
>>>> which must be a bad thing. We'll also need to drop outgoing traffic
>>>> too, I think [1].
>>>> One of the suggestions in the Linux Advanced Routing & Traffic
>>>> Control howto (http://lartc.org/) is to drop based on the TOS IP
>>>> headers. This doesn't seem to be set in gridftp packets (though it
>>>> is in scp packets).
>>>>
>>>> TOS would seem an obvious thing to set - it would enable us and/or
>>>> Janet to drop the bulk data in preference to the interactive data.
>>>> It would also mean I can tell our network team to drop bulk packets
>>>> rather than a complicated host/port range. Does anyone know why
>>>> globus don't set it?
>>>> Chris
>>>>
>>>>
>>>> [1] Indeed globus provide a script to do this at
>>>> http://www.globus.org/toolkit/data/gridftp/bwlimit.html
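PS: on the TOS question above - even without globus setting it, the SE
could mark its own outgoing gridftp data packets with iptables, so that
upstream kit can match on DSCP rather than a host/port list. An
untested sketch, again assuming the default GLOBUS_TCP_PORT_RANGE of
20000-25000:

```shell
# Sketch: mark outgoing gridftp data packets as bulk traffic
# (DSCP class CS1, commonly used for low-priority traffic) so that
# routers can drop them in preference to interactive traffic.
# The port range is the usual GLOBUS_TCP_PORT_RANGE default - an
# assumption, check the actual SE configuration.
iptables -t mangle -A OUTPUT -p tcp --sport 20000:25000 \
    -j DSCP --set-dscp-class cs1
```

Upstream, the network team could then police on a single DSCP match
instead of a complicated host/port range.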