Hi All
I have been asked to give the ATLAS talk during the network session at the WLCG workshop in a few weeks’ time. I only have 10 minutes, and I have been given very clear instructions that I need to provide a list of ATLAS’ network requirements for different types of sites in the medium to long term.
I would request two things:
- If sites have any particular restrictions on their ability to upgrade their network in future, please let me know. It is possible to change the wording (or add exceptions) if it will help sites. What we want to avoid is setting requirements that (for example) all the US sites are happy with but that several UK sites would struggle with. I think the key things to know are: what kind of network links you buy for your storage and worker nodes; what your link to JANET is (and how much of it is for particle physics use); and where the points are at which prices jump or other technical limitations are hit.
- If you have any plots or other evidence of current actual usage, this would be extremely useful. I can add them to my talk, or at the very least use them to validate any estimates I am making.
Any general comments on the talk are also welcome, although I might not use them, as this talk is meant to be from the ATLAS perspective.
The instructions I was given:
Give a maximum 15 minute, and preferably 10 minute, presentation on your experiment’s internal and external networking requirements for T2 sites in the following general categories:
o Full: lots of CPU, and disk more than adequate for caching needs; possibly able to support WAN access as well as LAN
o CPU-rich: some disk, but probably only enough to act as a local cache
o Disk-rich: if we imagine disk is still seen as hard to manage, then maybe some T2s will specialise in this, with less CPU than average or far more disk. This could also be a configuration for an individual site in a distributed T2
o Disk-poor: a lower disk/CPU ratio than the CPU-rich site
o Diskless: is this feasible for a standalone T2? For a site that is part of a distributed T2?
Prepare one slide listing the shortcomings you see in today’s network monitoring and setting out the views you would like to have easy access to. This slide will be provided as input to someone who will present a consolidated list of requirements and take part in a constructive debate with someone from the network monitoring community.
Alastair
Dan already sent me some feedback for QMUL:
Hi Alastair
This is what QMUL sees and is planning for (unless otherwise advised).
Internally we have 10Gb/s to every worker node and 10Gb/s to the storage (80Gb/s on the new kit, with 1PB of storage behind it). We don't actually see internal data transfers to worker nodes exceeding 1Gb/s, but this is a time-averaged (5 min) rate.
The switches are now 5 years old and out of warranty, but nowadays should not cost too much to replace. Unless we get a big dollop of money, don't expect any major upgrades.
Externally we have a 20Gb/s connection but an internal limit of 15Gb/s (which could be removed). We are able to push this link to the limit (and have done) with transfers into AND out of the site. We are hoping to upgrade this to 80 or 100Gb/s by 2020/21 (LHC Run 3). We will need a plan to make sure we have the infrastructure in place (switches/links and SEs) to utilise that bandwidth.
dan
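As a quick sanity check on what those link speeds mean in practice, here is a back-of-envelope sketch (plain Python; the tb_per_day helper is my own illustration, not part of any ATLAS or monitoring tooling) converting a sustained rate into daily transfer volume:

# Back-of-envelope: convert a sustained link rate in Gb/s into TB moved per day.
def tb_per_day(rate_gbps, efficiency=1.0):
    # bits per day at the given rate, scaled by an assumed link efficiency
    bits_per_day = rate_gbps * 1e9 * 86400 * efficiency
    # bits -> bytes -> terabytes
    return bits_per_day / 8 / 1e12

# QMUL's figures: current internal cap, current external link, Run 3 target
for rate in (15, 20, 100):
    print("%3d Gb/s ~ %5.0f TB/day if saturated" % (rate, tb_per_day(rate)))

So a saturated 20Gb/s link moves roughly 216TB/day, and the hoped-for 100Gb/s link would be around 1PB/day, which gives a feel for the switch and SE infrastructure needed to actually use it.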