Hi All
With my changing role, I need to hand over (or just stop working on) various tasks. Below is my personal opinion of how we should evolve the "ATLAS sites" in the UK.
Alastair
- CPU Only
The smallest sites (in terms of either manpower or allocated CPUs) should either use VAC or, if they already have a batch farm, set up a CE (the current recommendation would be ARC). The WNs would directly access another site's storage for both reads and writes. Depending on network connectivity, such a site could be limited to work that is light on I/O (e.g. MC production work).
Status: Already working.
Examples: UCL, Imperial
- ARC Cache Sites
Sites that have a batch farm could utilise the features of ARC to run a much wider range of jobs (hopefully all). An ARC CE would be required, along with access to a shared file system in which to cache the data in advance. This shared file system could be something like Lustre, although ARC can also use multiple NFS mounts.
From talking to David Cameron and Andrej Filipcic, they believe you need one NFS server for every 500 - 1000 job slots you have. If you have several old disk servers they could each be mounted separately and ARC will automatically distribute the data across these. The storage is just a cache, so if you lose the disk, then only jobs currently running would be impacted.
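As a rough illustration of the sizing rule above, the sketch below turns "one NFS server per 500 - 1000 job slots" into a range of server counts. The function names and the 2400-slot example are made up for illustration; only the 500 and 1000 figures come from the estimate quoted above.

```python
import math


def nfs_servers_needed(job_slots, slots_per_server):
    """Number of NFS cache servers needed to cover job_slots."""
    if job_slots < 0 or slots_per_server <= 0:
        raise ValueError("job_slots must be >= 0 and slots_per_server > 0")
    return math.ceil(job_slots / slots_per_server)


def nfs_server_range(job_slots):
    """Return (optimistic, conservative) server counts.

    Optimistic assumes one server per 1000 slots, conservative one per 500,
    i.e. the two ends of the rule of thumb quoted in the text.
    """
    optimistic = nfs_servers_needed(job_slots, 1000)
    conservative = nfs_servers_needed(job_slots, 500)
    return optimistic, conservative


if __name__ == "__main__":
    # Hypothetical site with 2400 job slots.
    low, high = nfs_server_range(2400)
    print(f"2400 job slots: between {low} and {high} NFS servers")
```

For a site with 2400 job slots this gives between 3 and 5 servers, so a handful of old disk servers would be in the right range.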
Status: Some effort is required to get this working properly at Durham, and sites will need to be fully integrated with the ARC Control Tower (aCT) to get the benefits of pre-fetching data.
Examples: Durham, Sussex? In the longer term maybe others?
- XCache Sites
I define an XCache site as a site that is using XCache to transparently improve the performance of data-intensive jobs. XCache could also be used as a Volatile Storage element, which is described in the next section.
The advantage of XCache over ARC Cache is that it can do block level caching. This can potentially offer performance improvements for things like analysis jobs. It also doesn’t require any effort from ADC to configure (i.e. not part of aCT).
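To make the block-level point concrete, here is a toy comparison of how many bytes a cache would have to pull in for a job that only reads parts of a file. This is not how XCache is implemented or configured; the function names, the block size and the read pattern are all invented for illustration.

```python
def bytes_fetched_whole_file(file_size, reads):
    """A whole-file cache pulls the entire file if the job reads anything."""
    return file_size if any(length > 0 for _, length in reads) else 0


def bytes_fetched_block_level(file_size, reads, block_size):
    """A block-level cache pulls only the blocks that the reads touch."""
    touched = set()
    for offset, length in reads:
        end = min(offset + length, file_size)
        if length <= 0 or offset >= end:
            continue  # empty read, or read entirely past the end of the file
        first_block = offset // block_size
        last_block = (end - 1) // block_size
        touched.update(range(first_block, last_block + 1))
    # The final block of the file may be shorter than block_size.
    return sum(min(block_size, file_size - b * block_size) for b in touched)


if __name__ == "__main__":
    # Hypothetical analysis job: three sparse reads of (offset, length)
    # from a 1000-byte file, cached in 100-byte blocks.
    reads = [(0, 50), (250, 100), (990, 50)]
    print("whole file :", bytes_fetched_whole_file(1000, reads))
    print("block level:", bytes_fetched_block_level(1000, reads, 100))
```

In this made-up case the block-level cache fetches 400 bytes against 1000 for the whole file. The saving depends entirely on how sparse the reads are, which is why the benefit is mostly expected for things like analysis jobs.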
A small XCache instance could be deployed per WN, as at RAL, or a single larger instance could be set up. The XCache would still need to point at another site's storage.
Status: We have got XCache working at RAL and there is lots of development work ongoing. I am not exactly sure what the status is at places like ECDF. It’s definitely worth being engaged with this activity (and it would be good if we could try to host the XRootD workshop in the UK), but I don’t feel there is any problem that this solves significantly better than other solutions.
- Volatile Storage elements
This is available in Rucio now (for testing at least). It’s like a normal RSE except that it gives the site control over when files are removed. The site then needs to un-register the removed files in Rucio. I don’t think this will be useful for sites that are shrinking their storage. I feel this could be useful for things like commercial cloud storage, where it would allow a site to keep control of its costs more easily.
Status: Development / testing by the ATLAS DDM team.
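The clean-up step a site would have to do for a volatile storage element can be sketched as a comparison between what is registered and what is actually still on disk. No real Rucio calls are shown here; the function name and the file names are hypothetical, and how the two lists are obtained and how the un-registration is submitted is left open.

```python
def replicas_to_unregister(registered, on_disk):
    """Return files still registered in the catalogue but no longer on disk.

    registered: iterable of file names the catalogue believes the site holds
    on_disk:    iterable of file names actually present in the site's storage
    """
    return sorted(set(registered) - set(on_disk))


if __name__ == "__main__":
    # Hypothetical example: the site has evicted two files to save space.
    registered = ["data.0001.root", "data.0002.root", "data.0003.root"]
    on_disk = ["data.0002.root"]
    for name in replicas_to_unregister(registered, on_disk):
        print("needs un-registering:", name)
```

The site would then pass this list to whatever mechanism the DDM team settles on for un-registering replicas.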