Hi All

 

I have been talking to Roy Williams from ROE about storage for Lasair, and I have set up a meeting with them for next Friday, 12th May, in the 11am Technical Meeting slot. With CHEP on that week, and the topic being slightly off-piste for GridPP, I didn’t want to advertise it as a formal Technical Meeting. However, if anyone here is interested in offering advice, you are more than welcome to join.

 

The problem summary:

“Hi Alastair
I have been talking to George Beckett about object storage, and he said that if I really want to know, I should talk to you. I am confused about CephFS, Rados, S3 buckets, etc.

I run a system called Lasair (https://lasair-ztf.lsst.ac.uk) that has been collecting astronomical data for the last 4 years, with 1000s of image files arriving every day, and now we have about 300 million such images, each 20 kbyte. They need to be written quickly so as not to slow down the ingestion pipeline, but reading them is relatively rare -- eg. when somebody clicks on a webpage or runs the API.

Currently we have a CephFS on our openstack cloud. Filenames are hashed into 4096 directories, each of which now has 300 million/4096 = 73,000 files. The ingestion is getting very slow. We are transitioning to a new "double-hash" system, where each directory has its own 4096 sub-directories, meaning each will have 300 million/(4096^2) = 18 files. But I have a feeling this is just a patch on a gaping wound ....

 

May I talk with you about alternatives to CephFS?”
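
For anyone who wants to see the directory arithmetic in Roy's message laid out, here is a minimal sketch of a single-hash and a "double-hash" layout. It is my own illustration, not Lasair's code: the choice of SHA-256, the use of three hex digits per level (16^3 = 4096 directories), and the names hashed_path and files_per_directory are all assumptions.

```python
import hashlib

# 4096 directories per level, as in the message above (3 hex digits = 16**3).
FANOUT = 4096
HEX_DIGITS_PER_LEVEL = 3


def hashed_path(filename: str, levels: int = 2) -> str:
    """Build a relative path with `levels` hashed directory levels.

    Assumption: SHA-256 of the filename is used; the real system's
    hash function is not stated in the message.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Take consecutive 3-hex-digit slices of the digest, one per level.
    parts = [
        digest[i * HEX_DIGITS_PER_LEVEL:(i + 1) * HEX_DIGITS_PER_LEVEL]
        for i in range(levels)
    ]
    return "/".join(parts + [filename])


def files_per_directory(total_files: int, levels: int) -> float:
    """Average number of files per leaf directory for a uniform hash."""
    return total_files / FANOUT ** levels


if __name__ == "__main__":
    total = 300_000_000  # approximate image count quoted in the message
    print(hashed_path("example_image.fits", levels=1))
    print(hashed_path("example_image.fits", levels=2))
    print(round(files_per_directory(total, 1)))  # single hash: ~73,000
    print(round(files_per_directory(total, 2)))  # double hash: ~18
```

The averages match the figures quoted above (about 73,000 files per directory with one level, about 18 with two). The filename example_image.fits is hypothetical.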

 

They have set up the Zoom link:

https://us02web.zoom.us/j/84730675785?pwd=ejY0QUZvcytad3pZZ2xpZE03TFEzdz09

 

Alastair


