Hi,

So, I was looking into Rucio recently for other reasons, and I did
notice that the directory hashing algorithm looked like it would create
a *lot* of metadata in the form of directories. I had assumed that
ATLAS would clean up the directory trees on file removal, though
(which, as Alastair notes, they apparently don't).
For DPM, the cns_db tables which handle file metadata are well indexed,
so the existence of large numbers of empty, deeply nested directory
paths shouldn't actually significantly harm performance (although it
will of course contribute to database size on SE head nodes).
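
(As a rough way to gauge the scale on a DPM head node, something like
the following would count childless directories in cns_db. The table
and column names (Cns_file_metadata, fileid, parent_fileid, filemode)
are from memory of the DPM/LFC name server schema, and the connection
details are placeholders, so treat this as a sketch and verify against
your own database:)

    import MySQLdb  # credentials/host below are placeholders

    # Directories (filemode bit 16384 = S_IFDIR) that have no child rows.
    QUERY = """
        SELECT COUNT(*) FROM Cns_file_metadata d
         WHERE (d.filemode & 16384) <> 0
           AND NOT EXISTS (SELECT 1 FROM Cns_file_metadata c
                            WHERE c.parent_fileid = d.fileid)
    """

    conn = MySQLdb.connect(host='localhost', user='dpminfo', db='cns_db')
    cur = conn.cursor()
    cur.execute(QUERY)
    print('empty directories: %d' % cur.fetchone()[0])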

Sam

On 8 October 2013 09:42, Jens Jensen <[log in to unmask]> wrote:
> Resent on behalf of Alastair Dewhurst (see below).
>
>
> -------- Original Message --------
> Subject:        Storage list email.
> Date:   Mon, 7 Oct 2013 22:15:09 +0000
> From:   Alastair Dewhurst <[log in to unmask]>
> To:     [log in to unmask] <[log in to unmask]>
>
>
>
> Hi Jens
>
> My mail to JISCMAIL got rejected.  Could you resend and copy me in please.
>
> Alastair
>
>
>
> Subject: ATLAS directories under Rucio
>
> Hi
>
> Normally, I would speak to Brian Davies about this, but he is on
> holiday and then there is CHEP, so I am bringing this to your attention
> via Jens instead!  I apologise if this has been discussed previously.
>
> You may be aware that the ATLAS scratch disk space token at RAL (and
> several other sites) has been full.  There is a lot of dark data, and I
> believe Stephane Jezequel may have found some bugs in the ATLAS deletion
> process.  He is currently testing fixes on some French T2s.  At RAL, we
> are currently trying to do a manual dark data cleanup on scratch disk.
> The dump we got from Castor has 1.98 million files, while ATLAS
> accounting shows fewer than 450,000 files written!
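>
> (For anyone attempting something similar: the core of the comparison is
> just a set difference between the SE dump and a dump of the ATLAS
> catalogue.  A minimal Python sketch, assuming one path per line in each
> dump and that both dumps have been normalised to the same path
> convention; the file names here are made up:)
>
>     # Files present on the SE but unknown to the catalogue ("dark data").
>     def load(path):
>         with open(path) as f:
>             return set(line.strip() for line in f if line.strip())
>
>     on_disk = load('castor_dump.txt')           # hypothetical dump names
>     in_catalogue = load('atlas_catalogue_dump.txt')
>     dark = on_disk - in_catalogue
>
>     print('%d on disk, %d in catalogue, %d dark'
>           % (len(on_disk), len(in_catalogue), len(dark)))
>     with open('dark_files.txt', 'w') as out:
>         out.write('\n'.join(sorted(dark)) + '\n')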
>
> However, in addition to the lost files, there were also a large number
> of empty directories in our name server (I don't know how other SEs are
> configured).  In total there were 3.7 million empty directories, which
> can't exactly help performance.  What I found even more concerning was
> that 2.9 million of these empty directories were created with the Rucio
> naming convention, which ATLAS only started using in around June this
> year.  That's roughly 300 000 empty directories created per month.
>
> I investigated further and found the cause.  In Rucio, after the base
> site name, there is a scope, followed by two levels of directories named
> with a hex number taken from the first 4 digits of a hash of the file's
> scope and name.  Now, in a case like data disk, where the scope is
> something large like data12_8TeV, you can get millions of files stored
> in it, so it makes sense to have a ~65k directory structure (16^4 =
> 65,536 possible two-level combinations) to support them.  However, I
> have realised that there is a separate scope for every user.  Most
> users will probably write a few hundred, or, if they are very busy,
> maybe a few thousand output files.  These will then get written into
> the pseudo-random directory structure, so the chances are that every
> user file will end up in its own directory.  Then, 2 to 4 weeks later,
> the file will be cleaned up by ATLAS.  Certainly in Castor's case the
> directory is left behind, as the ATLAS deletion service does not know
> whether it is empty.
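>
> (To make the scheme concrete, here is a minimal sketch of the
> deterministic path function as I understand it: the hash is the MD5 of
> the 'scope:name' string, and its first four hex digits give the two
> directory levels.  The prefix, and exactly how user scopes map onto the
> path, are assumptions here:)
>
>     import hashlib
>
>     def rucio_path(scope, name, prefix='/atlas/rucio'):
>         # Two-level fan-out: 16^4 = 65,536 possible <XX>/<YY> pairs.
>         hstr = hashlib.md5(('%s:%s' % (scope, name)).encode('utf-8')).hexdigest()
>         return '%s/%s/%s/%s/%s' % (prefix, scope, hstr[0:2], hstr[2:4], name)
>
>     # A per-user scope gets its own sub-tree, so a few hundred files
>     # scatter over up to 65,536 directories, mostly one file each:
>     print(rucio_path('user.jdoe', 'output.0001.root'))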
>
> Now, while the Tier 1 certainly sees some extra activity, the number of
> analysis jobs (which primarily use scratch disk) is actually quite
> similar between it and the larger Tier 2s.  This is because only 5% of
> RAL's capacity is dedicated to analysis, while it is 50% for Tier 2s.
> It would therefore be reasonable to suppose that the rate of empty
> directory creation at Tier 2s wouldn't be much less than at RAL.  I am
> not a storage expert, so I don't know whether empty directories would
> have an effect on your SE.  For RAL, we will certainly need to think
> about monthly clean-ups of the directories.
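>
> (For the clean-up itself, a bottom-up walk that attempts to remove each
> directory, letting the removal fail where a directory is non-empty,
> would do on a POSIX-mounted view of the namespace; Castor or DPM would
> need the equivalent through their own name server tools.  A sketch,
> with the root path made up:)
>
>     import os
>
>     # Walk bottom-up so a parent emptied by this pass is visited after
>     # its children have gone.  os.rmdir only succeeds on empty
>     # directories, so it is safe to attempt it everywhere.
>     for dirpath, _dirs, _files in os.walk('/atlas/rucio/user', topdown=False):
>         try:
>             os.rmdir(dirpath)
>         except OSError:
>             pass  # not empty (or just written to); leave it alone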
>
> Alastair
>
>
>