(And, to follow up: be thankful we're not running an EOS storage
system. It's basically a copy of HDFS but with xrootd protocols, and
it stores all its metadata in memory for performance!)
On 8 October 2013 10:04, Sam Skipsey <[log in to unmask]> wrote:
> Hi,
>
> So, I was looking into Rucio recently for other reasons, and did
> notice that the hashing directory algorithm looked likely to create a
> *lot* of metadata in the form of directories. I had assumed that ATLAS
> would clean up the directory trees on file removal (which, as Alastair
> notes, they apparently don't).
> For DPM, though, the cns_db tables which handle file metadata are well
> indexed, so tons of empty, deeply nested directory paths shouldn't
> significantly harm performance (although they will, of course, add to
> the database size on SE head nodes).
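To illustrate Sam's point, here is a toy model in Python using an in-memory SQLite database. It is not DPM's MySQL cns_db: the table, column and index names are simplified and hypothetical. Path resolution walks one component at a time, and each step is an indexed lookup on (parent, name), so a large number of unrelated empty directories costs storage space but does not slow down the lookup of a real file.

```python
import sqlite3
from typing import Optional

# Hypothetical toy schema (not DPM's real one): each entry points to its parent.
def build_namespace(n_empty_dirs: int) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_metadata ("
        "fileid INTEGER PRIMARY KEY, parent_fileid INTEGER, name TEXT)"
    )
    # The index that makes each per-component lookup cheap.
    db.execute(
        "CREATE INDEX idx_parent_name ON file_metadata (parent_fileid, name)"
    )
    db.execute("INSERT INTO file_metadata VALUES (1, 0, '/')")  # root entry
    # Many empty directories hanging off the root.
    db.executemany(
        "INSERT INTO file_metadata (parent_fileid, name) VALUES (1, ?)",
        ((f"empty{i:07d}",) for i in range(n_empty_dirs)),
    )
    # One real directory containing one file.
    user_dir = db.execute(
        "INSERT INTO file_metadata (parent_fileid, name) VALUES (1, 'user')"
    ).lastrowid
    db.execute(
        "INSERT INTO file_metadata (parent_fileid, name) VALUES (?, 'out.root')",
        (user_dir,),
    )
    return db

LOOKUP = "SELECT fileid FROM file_metadata WHERE parent_fileid = ? AND name = ?"

def resolve(db: sqlite3.Connection, path: str) -> Optional[int]:
    """Resolve an absolute path one component at a time; None if missing."""
    fileid = 1  # start at the root
    for part in path.strip("/").split("/"):
        row = db.execute(LOOKUP, (fileid, part)).fetchone()
        if row is None:
            return None
        fileid = row[0]
    return fileid

def lookup_plan(db: sqlite3.Connection) -> str:
    """SQLite's query-plan text for a single component lookup."""
    rows = db.execute("EXPLAIN QUERY PLAN " + LOOKUP, (1, "user")).fetchall()
    return " ".join(str(row[-1]) for row in rows)

db = build_namespace(100_000)
print(resolve(db, "/user/out.root"))
print(lookup_plan(db))
```

The query plan names `idx_parent_name`, i.e. the lookup is an index search rather than a scan over all the empty entries; what the empty directories do cost is rows and index entries, which is the database-size point above.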
>
> Sam
>
> On 8 October 2013 09:42, Jens Jensen <[log in to unmask]> wrote:
>> Resent on behalf of Alastair Dewhurst (see below).
>>
>>
>> -------- Original Message --------
>> Subject: Storage list email.
>> Date: Mon, 7 Oct 2013 22:15:09 +0000
>> From: Alastair Dewhurst <[log in to unmask]>
>> To: [log in to unmask] <[log in to unmask]>
>>
>>
>>
>> Hi Jens
>>
>> My mail to JISCMAIL got rejected. Could you resend it and copy me in, please?
>>
>> Alastair
>>
>>
>>
>> *Subject: ATLAS directories under Rucio*
>>
>> Hi
>>
>> Normally, I would speak to Brian Davies about this, but he is on holiday
>> and then there is CHEP, so I am bringing this to your attention via Jens
>> instead! I apologise if this has been discussed previously.
>>
>> You may be aware that the ATLAS scratch disk space token at RAL (and
>> several other sites) has been full. There is a lot of dark data, and I
>> believe Stephane Jezequel may have found some bugs in the ATLAS deletion
>> process. He is currently testing fixes on some French T2s. At RAL, we
>> are currently trying to do a manual dark-data cleanup on scratch disk.
>> The dump we got from Castor has 1.98 million files, while ATLAS
>> accounting shows fewer than 450 000 files written!
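The manual cleanup described here boils down to two set differences between the storage dump and the experiment catalogue. A minimal Python sketch, assuming both dumps have already been reduced to one path per line (the dump file names in the usage comment are hypothetical; real Castor and ATLAS dumps have their own formats). If the accounting figure matches the catalogue, the dark set would hold at least 1.98 million - 450 000, i.e. about 1.53 million files. In practice, files newer than the catalogue snapshot should be excluded, since they may simply not have been registered yet.

```python
from typing import Iterable, Set, Tuple

def load_paths(dump_file: str) -> Set[str]:
    """Read a dump with one file path per line (hypothetical format)."""
    with open(dump_file, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}

def reconcile(storage: Iterable[str],
              catalogue: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare what the SE holds with what the experiment catalogue knows."""
    on_disk, known = set(storage), set(catalogue)
    dark = on_disk - known  # on disk but unknown to ATLAS: cleanup candidates
    lost = known - on_disk  # known to ATLAS but missing from the SE
    return dark, lost

# Usage, with hypothetical dump file names:
# dark, lost = reconcile(load_paths("castor_scratchdisk.txt"),
#                        load_paths("atlas_scratchdisk.txt"))
```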
>>
>> However, in addition to the dark files, there was also a large number of
>> empty directories in our name server (I don't know how other SEs are
>> configured). In total there were 3.7 million empty directories, which
>> can't exactly help performance. What I found even more concerning,
>> however, was that 2.9 million of these empty directories were created
>> with the Rucio naming convention, which ATLAS only started to use
>> around June this year. So that's roughly 300 000 empty directories
>> created a month.
>>
>> I investigated further and found the cause. In Rucio, after the base
>> site name, there is a scope followed by two levels of directories named
>> with hex numbers that correspond to the first four digits of the file's
>> checksum. In a case like data disk, where the scope is something large
>> like data12_8TeV, you can get millions of files stored in it, so it
>> makes sense to have a ~65k (16^4 = 65 536) directory structure to
>> support them. However, I have realised that there is a separate scope
>> for every user. Most users will probably write a few hundred, or if
>> they are very busy a few thousand, output files. These will then get
>> written to the pseudo-random directory structure, so the chances are
>> that every user file will be written to its own directory. Then, 2 to 4
>> weeks later, the file will be cleaned up by ATLAS. Certainly in Castor's
>> case the directory is left behind, as the ATLAS deletion service does
>> not know whether it is empty.
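A short Python sketch of why a small user scope ends up with roughly one directory per file. The email says the two levels come from the file's checksum; Rucio's deterministic path convention, as far as we know, takes the hex digits from an MD5 hash of the string `scope:name`, and this sketch assumes the latter (for the count it makes no difference, since either way the digits are effectively uniform). It ignores any other details of the real path function, and the scope and file names are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple

def hash_dirs(scope: str, name: str) -> Tuple[str, str]:
    """Two hash directory levels for a file, assuming MD5 of 'scope:name'."""
    digest = hashlib.md5(f"{scope}:{name}".encode("utf-8")).hexdigest()
    return digest[0:2], digest[2:4]  # 256 values per level, 65 536 leaves

def rucio_style_path(scope: str, name: str) -> str:
    """Assumed layout: scope/xx/yy/name."""
    d1, d2 = hash_dirs(scope, name)
    return f"{scope}/{d1}/{d2}/{name}"

def directories_created(scope: str, names: Iterable[str]) -> Tuple[int, int]:
    """Count distinct first-level and leaf directories a scope's files need."""
    level1, leaves = set(), set()
    for name in names:
        d1, d2 = hash_dirs(scope, name)
        level1.add(d1)
        leaves.add((d1, d2))
    return len(level1), len(leaves)

def expected_distinct(buckets: int, n_files: int) -> float:
    """Expected number of occupied buckets when files land uniformly at random."""
    return buckets * (1.0 - (1.0 - 1.0 / buckets) ** n_files)

# A hypothetical user scope writing 500 output files.
names = [f"user.jdoe.12345.{i:06d}.out.root" for i in range(500)]
print(directories_created("user.jdoe", names))
print(expected_distinct(256, 500), expected_distinct(65_536, 500))
```

For 500 files the expected count is about 498 leaf directories plus roughly 220 first-level ones, so nearly every file gets a directory of its own, and all of them are left empty once the files are deleted.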
>>
>> Now, while the Tier 1 certainly does some extra activity, the number of
>> analysis jobs (which primarily use scratch disk) is actually quite
>> similar between it and the larger Tier 2s. This is because only 5% of
>> RAL's capacity is dedicated to analysis, while it is 50% for Tier 2s. It
>> would therefore be reasonable to suppose that the rate of empty-directory
>> creation at Tier 2s wouldn't be much lower than at RAL. I am not a
>> storage expert, so I don't know whether empty directories would have an
>> effect on your SE. For RAL, we will certainly need to think about monthly
>> cleanups of the directories.
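One way such a periodic cleanup could look, sketched in Python under the assumption that the namespace can be walked like a POSIX filesystem (Castor's name server cannot, and would need its own tools; the function name and the `dry_run` default are our own choices). It walks bottom-up, so a directory containing only empty subdirectories is removed after them, and the top-level directory itself is always kept.

```python
import os
import tempfile
from typing import List

def prune_empty_dirs(root: str, dry_run: bool = True) -> List[str]:
    """Remove (or, in a dry run, just list) file-less directories under root."""
    removable = set()
    # topdown=False yields children before their parents.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # never remove the top of the tree
        entries = [os.path.join(dirpath, e) for e in os.listdir(dirpath)]
        # Empty, or containing only directories already marked for removal.
        if all(entry in removable for entry in entries):
            if not dry_run:
                try:
                    os.rmdir(dirpath)  # fails rather than delete contents
                except OSError:
                    continue  # something was written meanwhile; keep it
            removable.add(dirpath)
    return sorted(removable)

if __name__ == "__main__":
    # Demonstration on a throwaway tree with hypothetical hash directories.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "user.jdoe", "3f", "a1"))
        os.makedirs(os.path.join(top, "user.jdoe", "7c", "02"))
        open(os.path.join(top, "user.jdoe", "7c", "02", "out.root"), "w").close()
        print(prune_empty_dirs(top))                 # dry run: what would go
        print(prune_empty_dirs(top, dry_run=False))  # actually remove
```

Using `os.rmdir` rather than a recursive delete means a directory that gains a file between the check and the removal is left alone instead of being deleted with its contents.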
>>
>> Alastair