Hi John,
QMUL has two 2x10Gb/s links (+ dedicated links in the Slough JISC data centre). Our cluster at QMUL is officially provided with the full capacity of our 2x10Gb/s backup JANET link. All other QMUL traffic goes through the other link.
The cluster is in a "zone" outside of QMUL's Cisco ASA firewalls, which means we don't have any issues there (performance, opening ports, etc). This is something we have had since long before DMZs were even a thing.
Several years back we paid for 10Gb/s optics and line cards to link our server room to the QMUL core network at 20Gb/s.
When we had the "network" money ~5 years ago, we used some of it to upgrade the University link to 10Gb/s (including buying server-based routers). These have now been replaced with Cisco Nexus 7000-series devices, as the University wanted an enterprise solution. Even buying line cards for these devices cost exorbitant amounts of money.
Until recently, we had very good network performance from our WAN link.
We have started to think about how we are going to get a 100Gb/s connection in ~2020 (if this is what ATLAS expects). I am not sure how this will be funded.
A few details concerning the networking equipment between our cluster and the JANET edge router are below:
* The 2x10Gb/s backup link is currently down due to refurbishment after a fire in the room where the link terminated; as such, we are throttled to 10Gb/s and sharing QMUL's 2x10Gb/s main JANET link with the rest of the University traffic.
* There is a Forcepoint IDS/IPS inline between each of the JANET routers and the University's own border router devices - these devices are configured in bridging mode.
* Traffic destined for or transmitted by our cluster network is *not* inspected by the Forcepoint device, but it still has to pass through it; we have identified small amounts of packet loss and a fixed increase in network latency, which has impacted our transfer speeds both to local sites in the UK and to sites further away such as BNL.
* There is also an HP ProCurve switch inline between the University border router and our own network infrastructure; this device has a maximum throughput of 16Gb/s across its line-card backplane. Our University network staff are aware of this but have yet to do anything about it - we would prefer this switch out of the traffic path as it serves no useful purpose (plus, the same make and model of switch was identified as the cause of the fire which took out the University's backup link!)
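For context on why even small amounts of loss hurt long-RTT transfers so badly: per-stream TCP throughput is roughly bounded by the well-known Mathis formula, rate <= MSS / (RTT * sqrt(p)). A quick sketch (the RTTs and loss rates below are purely illustrative, not our measured figures):

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate upper bound on steady-state single-stream TCP
    throughput (Mathis et al.): rate <= (MSS / RTT) * (1 / sqrt(p))."""
    return (mss_bytes * 8 / rtt_s) / math.sqrt(loss_rate)

# Illustrative figures only - not our measured values.
mss = 1460  # bytes; standard Ethernet MSS
for rtt_ms, site in [(10, "nearby UK site"), (90, "transatlantic, e.g. BNL")]:
    for p in (1e-5, 1e-4):
        bps = mathis_throughput_bps(mss, rtt_ms / 1000, p)
        print(f"{site}: RTT {rtt_ms} ms, loss {p:.0e} -> "
              f"{bps / 1e6:.0f} Mb/s per stream")
```

The key point is the sqrt(p) and 1/RTT scaling: the same fixed loss rate costs a long-RTT path (BNL) an order of magnitude more per-stream throughput than a short UK path, which matches what we see.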
The Forcepoint devices suffer significant CPU load even when the inspected traffic is <10Gb/s, and run at approx. 50% CPU when merely forwarding (and not inspecting) the 16Gb/s which we are capable of.
Given that our backup link and the border router connected to it will be replaced within the next month or so, I am hopeful that I can meet with the local Network bod who put the Forcepoint device inline and reconfigure things so that our cluster traffic no longer passes through the Forcepoint device, while the rest of the University traffic continues to traverse it.
Regards,
Terry
--
Terry Froy
Cluster Systems Manager, Particle Physics
Queen Mary University of London
Tel: +44 (0)207 882 6560
E-mail: [log in to unmask]