Hi All

I took some notes from the meeting.

Attendees: Alastair Dewhurst, Tom Byrne, Tim Noble, Mathew Sims, Matt Doidge, Gerard Hand, Roy Williams, Gareth Francis

Lasair is pronounced “Lar-sar”

The current Lasair setup ingests events from the Zwicky Transient Facility (ZTF), which has a 60 inch telescope and is a prototype for LSST.  Data is sent to ECDF via Kafka, where it is processed on their cloud and put into 4 streams.  The incoming events are 80kB each, and each event stored to CephFS is around 20kB.  The model is that the data is written once but rarely, if ever, read.

Currently they have 300 million files from 4 years of data taking.  They expect 30 times the rate when LSST starts, and this will last for 10 years.  They wish to build a system that can comfortably handle 300Hz and ideally peak at up to 3000Hz to keep up when there is a spike in alerts.
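As a rough cross-check of these figures, the arithmetic can be sketched as below.  It assumes that "30 times the rate" applies to the average file rate and that the stored size stays at about 20kB per event for LSST; neither assumption was stated in the meeting.

```python
# Back-of-envelope sizing from the figures in the notes.
# Assumptions (not from the meeting): the 30x factor applies to the mean
# file rate, and LSST events are stored at the same ~20 kB as today.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

ZTF_FILES = 300e6        # files written so far
ZTF_YEARS = 4            # years of data taking
RATE_FACTOR = 30         # expected LSST rate relative to ZTF
LSST_YEARS = 10          # expected duration
STORED_BYTES = 20e3      # ~20 kB per stored event
TARGET_HZ = 300          # rate the new system should handle comfortably


def mean_rate_hz(files: float, years: float) -> float:
    """Average number of files written per second over the period."""
    return files / (years * SECONDS_PER_YEAR)


ztf_rate = mean_rate_hz(ZTF_FILES, ZTF_YEARS)
lsst_rate = ztf_rate * RATE_FACTOR
lsst_files = lsst_rate * LSST_YEARS * SECONDS_PER_YEAR
lsst_terabytes = lsst_files * STORED_BYTES / 1e12
headroom = TARGET_HZ / lsst_rate

print(f"ZTF mean rate:   {ztf_rate:.1f} Hz")
print(f"LSST mean rate:  {lsst_rate:.0f} Hz")
print(f"LSST file count: {lsst_files / 1e9:.1f} billion")
print(f"LSST volume:     {lsst_terabytes:.0f} TB")
print(f"300Hz headroom:  {headroom:.1f}x the mean rate")
```

On these assumptions the 300Hz target sits roughly four times above the expected mean rate, leaving the 3000Hz figure to cover bursts.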

The current system uses CephFS, and they have been storing the data in 4096 hashed directories.  This has run into performance problems once a directory holds more than 70k files.  While adding a second level of directories would solve this problem, we are not sure a file system is the best choice of storage for LSST.
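To illustrate the directory layout being discussed, here is a minimal sketch of a hashed path scheme.  The actual hash function and naming used by Lasair were not described in the meeting, so the SHA-256 prefix, the "/data" root and the function names are all hypothetical; only the 4096 fan-out (three hex characters) comes from the notes.

```python
import hashlib
from pathlib import PurePosixPath

FANOUT = 4096  # 16**3, i.e. three hex characters per directory level


def hashed_path(object_id: str, levels: int = 1, root: str = "/data") -> PurePosixPath:
    """Place a file under `levels` hashed directories (hypothetical scheme)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    # Take three hex characters of the digest for each directory level.
    parts = [digest[3 * i: 3 * i + 3] for i in range(levels)]
    return PurePosixPath(root, *parts, object_id)


def files_per_directory(total_files: float, levels: int) -> float:
    """Average files per leaf directory, assuming the hash spreads evenly."""
    return total_files / FANOUT ** levels


print(hashed_path("alert-000001", levels=1))
print(hashed_path("alert-000001", levels=2))
print(round(files_per_directory(300e6, 1)))   # one level, today's 300 million files
print(round(files_per_directory(300e6, 2)))   # two levels, same files
```

With one level, 300 million files average about 73k per directory, which matches the point where the performance problems appeared.  A second level brings that down to a few tens of files per directory at the cost of about 16.8 million directories.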

Gareth noted that, like any science project, they are stretched for staff, so they want to stick with something they know or with existing services if possible.  Tom asked whether it was possible to aggregate files before storing them; while this is technically possible, Gareth felt it would be a worse solution than using a more appropriate backend storage.

It was generally agreed that the use case fits an object store very well (i.e. large numbers of small files, needing basically only GET and PUT commands).  Gareth's testing on ECDF Ceph S3 has shown that a single thread can do 50Hz and that with multiple threads they can get to 200Hz.  Tom suggested trying specific tools like RClone, which will be far better optimised than “boto” for uploading large numbers of files.  It was agreed to have a follow-up meeting to rerun the S3 upload tests and see whether they can be improved.  We will try to include Ian Johnson, as he has previous experience with RClone from working with PANOSC.
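For the follow-up tests, a small harness along these lines could be used to compare PUT rates at different thread counts.  This is only a sketch: `fake_put` is an in-memory stand-in with an invented 2ms delay, not a real S3 call, and the key names are made up.  In a real run it would be replaced by an S3 client call (for example boto3's `put_object`) against the ECDF endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_put_rate(put, items, threads):
    """Return the achieved PUTs per second for put(key, body) over items."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every upload and re-raises any worker exception.
        list(pool.map(lambda item: put(*item), items))
    elapsed = time.perf_counter() - start
    return len(items) / elapsed


# Stand-in for an object store; a real test would call an S3 client here.
store = {}


def fake_put(key, body):
    time.sleep(0.002)  # invented latency, not a measured value
    store[key] = body


# 200 hypothetical alerts of ~20 kB each, matching the stored event size.
items = [(f"alert-{i:06d}", b"x" * 20_000) for i in range(200)]

for threads in (1, 4, 16):
    rate = measure_put_rate(fake_put, items, threads)
    print(f"{threads:2d} threads: {rate:.0f} PUT/s")
```

Because the uploads are dominated by waiting on the network, threads are enough to overlap requests; the rates printed by the stand-in say nothing about what Ceph S3 will actually achieve.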

Alastair



########################################################################

This message was issued to members of www.jiscmail.ac.uk/GRIDPP-STORAGE, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/