Hi John,

QMUL has two 2x10Gb/s links (plus dedicated links in the Slough JISC data centre). Our cluster is officially allocated the full capacity of the 2x10Gb/s backup JANET link; all other QMUL traffic goes over the other link.

The cluster sits in a "zone" outside QMUL's Cisco ASA firewalls, which means we have no issues there (performance, opening ports, and so on). We have had this arrangement for a long time, since before DMZs were even a thing.

Several years back we paid for 10Gb/s optics and line cards to link our server room to the QMUL core network at 20Gb/s.

When we had the "network" money ~5 years ago, we used some of it to upgrade the University link to 10Gb/s (including buying server-based routers). These have now been replaced with Cisco Nexus 7000-series devices, as the University wanted an enterprise solution. Even the line cards for these devices cost an exorbitant amount of money.

Until recently, we had very good network performance from our WAN link.

We have started to think about how we are going to get a 100Gb/s connection in ~2020 (if that is what ATLAS expects). I am not sure how this will be funded.

A few details concerning the networking equipment between our cluster and the JANET edge router are below:

  *   The 2x10Gb/s backup link is currently down due to refurbishment after a fire in the room where the link terminated; as such, we are throttled to 10Gb/s and are sharing QMUL's 2x10Gb/s main JANET link with the rest of the University traffic.
  *   There is a Forcepoint IDS/IPS inline between each of the JANET routers and the University's own border routers; these Forcepoint devices are configured in bridging mode.
  *   Traffic destined for, or transmitted by, our cluster network is *not* inspected by the Forcepoint device, but it still has to pass through it. We have identified small amounts of packet loss and a fixed increase in latency, which has hurt our transfer speeds both to local UK sites and to sites further away such as BNL (see the rough estimate after this list).
  *   There is also an HP ProCurve switch inline between the University border router and our own network infrastructure; its line-card backplane is limited to 16Gb/s of throughput. Our University network staff are aware of this but have yet to do anything about it. We would prefer this switch out of the traffic path entirely, as it serves no useful purpose (and the same make and model of switch was identified as the cause of the fire which took out the University's backup link!).
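
As a rough illustration of why even a small, fixed amount of loss and added latency hurts long-distance transfers far more than local ones, here is a back-of-the-envelope, Mathis-style estimate of the per-flow TCP throughput ceiling. The MSS, loss rate and RTTs below are illustrative assumptions, not measurements of our path:

import math

MSS_BYTES = 1460   # typical Ethernet MSS (assumption)
LOSS = 1e-4        # 0.01% packet loss (assumption)

def mathis_ceiling_gbps(rtt_ms, loss):
    """Approximate per-flow TCP throughput ceiling: MSS / (RTT * sqrt(loss))."""
    rtt_s = rtt_ms / 1000.0
    bytes_per_s = MSS_BYTES / (rtt_s * math.sqrt(loss))
    return bytes_per_s * 8 / 1e9

for label, rtt_ms in [("UK site, ~10 ms RTT", 10.0), ("BNL, ~90 ms RTT", 90.0)]:
    print(f"{label}: ~{mathis_ceiling_gbps(rtt_ms, LOSS):.3f} Gb/s per flow")

The point being that the same small loss rate costs roughly an order of magnitude more per-flow throughput on the high-RTT path to BNL than on a short UK path.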

The Forcepoint devices suffer significant CPU load even when the inspected traffic is below 10Gb/s, and they run at approximately 50% CPU when merely forwarding (and not inspecting) the 16Gb/s that we are capable of pushing.

Given that our backup link and the border router connected to it will be replaced within the next month or so, I am hopeful that I can meet with the local network bod who put the Forcepoint device inline and reconfigure things so that our cluster traffic no longer passes through the Forcepoint device, while the rest of the University traffic continues to traverse it.

Regards,
Terry
--
Terry Froy
Cluster Systems Manager, Particle Physics
Queen Mary University of London
Tel: +44 (0)207 882 6560
E-mail: [log in to unmask]


________________________________
From: Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of John Bland <[log in to unmask]>
Sent: 26 January 2018 08:20:35
To: [log in to unmask]
Subject: WAN Woes

Hi,

We're getting some pushback from our central networking team about our
WAN connectivity.

Our current connection uses the standard shared campus WAN, passing
through the university firewall, then out to JISC through a redundant
pair of 10G links.

Although our 'grid' IP range is set not to be filtered by the
firewall, all packets still pass through it and still get hit with
some filtering (the most recent bit of fun was SSL connections with
X509 certificates being dropped because they were wrongly marked as
'insecure', essentially killing all Grid traffic - see the quick
check below).
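
For anyone who wants to reproduce this, a minimal probe along these
lines shows whether a TLS handshake presenting an X509 client
certificate makes it through the path. The hostname, port and
certificate paths are placeholders, not our real endpoints:

import socket
import ssl

HOST, PORT = "se.example.ac.uk", 443                  # placeholder endpoint
CERT, KEY = "/tmp/usercert.pem", "/tmp/userkey.pem"   # placeholder grid credentials

ctx = ssl.create_default_context()
ctx.load_cert_chain(certfile=CERT, keyfile=KEY)

try:
    with socket.create_connection((HOST, PORT), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            print("Handshake OK:", tls.version(), tls.cipher())
except (ssl.SSLError, OSError) as exc:
    print("Handshake failed (possibly dropped in-path):", exc)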

Our traffic also causes campus-wide issues, mostly due to overloading
the firewall rather than the links themselves, so we are throttled to
~5G. While we have IPv6 addresses, our IPv6 traffic is heavily
throttled (~0.3G) by university routers in the path that have very
poor IPv6 performance.
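
A quick way to put numbers on the IPv4/IPv6 difference is an iperf3
run over each address family. This assumes iperf3 at both ends; the
server name below is a placeholder:

import json
import subprocess

SERVER = "iperf.example.ac.uk"   # placeholder test server

for family, flag in [("IPv4", "-4"), ("IPv6", "-6")]:
    result = subprocess.run(
        ["iperf3", "-c", SERVER, flag, "-t", "10", "-J"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        print(f"{family}: iperf3 failed: {result.stderr.strip()}")
        continue
    report = json.loads(result.stdout)
    gbps = report["end"]["sum_received"]["bits_per_second"] / 1e9
    print(f"{family}: ~{gbps:.2f} Gb/s")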

The plan was to reuse some university routers to upgrade the physical
connection and provide us with a direct 10G link to the JISC WAN,
with no University firewall and (supposedly) much better IPv6
throughput.

Despite this initial progress, the University is now pushing us
(again) to pay for our own direct 10G link to JISC, and to pay for
and install a hardware firewall on this connection (yeah). Apparently
another department has done this (why, or how, we don't know).

What would be interesting to know before loading up my shotgun and
replying to them is whether other Grid sites do this, or have been asked
to do this. Does any other Grid site pay for a dedicated WAN uplink to
JISC just for GridPP or their department? Do you put a hardware firewall
on this path as well?

Cheers,

John

--
John Bland                       [log in to unmask]
Research Fellow                  office: 220
High Energy Physics Division     tel (int): 42911
Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
University of Liverpool          http://www.liv.ac.uk/physics/hep/
"I canna change the laws of physics, Captain!"