Hi Oxana,
I think the problem is (as you say) in the configuration here, and in more
than one way.
Firstly, the default installation (which most T2s use) makes it possible
for one experiment to completely block the activity at that site by
filling up the disk, even blocking software installation. This is
optimally bad! Yes, you can put them on different partitions but this
isn't the default and is not very pretty to manage as Chris Brew pointed
out, so what is needed here is something like quotas. Even when running
locally we have always had disk quotas for the different experiments to
stop this from happening, and I guess we need them now. Quotas would have
prevented the problems seen recently at these different sites (including
ours).
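For what it's worth, a minimal sketch of what per-experiment quotas could look like with standard Linux group quotas on an SE partition -- assuming quota support is compiled in and the experiments map onto Unix groups; the mount point, group names and limits below are purely illustrative, not a recommendation for any particular site:

```shell
# Sketch only: per-experiment group quotas on a hypothetical /storage
# partition.  Assumes the filesystem is mounted with the grpquota option
# and experiments map to the Unix groups atlas/cms/lhcb.
mount -o remount,grpquota /storage
quotacheck -cgm /storage        # build the group quota file
quotaon -g /storage             # switch quota enforcement on

# Soft and hard block limits (in 1 KB blocks); inode limits left at 0
# (unlimited).  Figures are examples only.
setquota -g atlas 900000000 1000000000 0 0 /storage   # ~1 TB hard cap
setquota -g cms   900000000 1000000000 0 0 /storage
setquota -g lhcb  450000000  500000000 0 0 /storage

repquota -g /storage            # report current usage against the limits
```

With something like this in place one experiment filling its allocation gets ENOSPC on its own writes, while the other groups (and software installation) carry on.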
The second point is perhaps more long term judging from James' reply to
the configuration question. We need both permanent and volatile (scratch)
areas. Permanent should mean "as far as humanly possible, we will not
lose this data". Volatile should mean "as far as humanly possible, we will
not lose this data for a period of (say) one week; then it is gone".
We also need the space reservation capability that James mentions.
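The one-week volatile policy is easy enough to enforce with a periodic mtime sweep. Here is a small self-contained demonstration -- the directory and file names are made up for the demo; in production the same find invocation would run nightly from cron against the real scratch area:

```shell
# Demo of a 7-day scratch sweep using a throwaway directory.
# In production the same find invocation would run from cron against
# the real volatile area (the path here is purely illustrative).
SCRATCH=/tmp/scratch-demo
mkdir -p "$SCRATCH"
touch -d '10 days ago' "$SCRATCH/stale.dat"   # pretend this is week-old data
touch "$SCRATCH/fresh.dat"                    # written today, must survive

# Delete regular files untouched for more than 7 days, then prune
# any directories emptied by the sweep.
find "$SCRATCH" -type f -mtime +7 -delete
find "$SCRATCH" -mindepth 1 -type d -empty -delete
```

After the sweep, stale.dat is gone and fresh.dat is untouched. Whether "volatile" should count from creation time or last access is a policy choice; mtime is just the simplest to implement.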
This is not to say that Atlas (CMS, LHCb etc) should not have a way of
clearing up old data. If it doesn't then these volumes will become full of
obsolete data. There needs to be some sort of way of determining what is
obsolete and what is not. However, this is an experiment software
management issue and is up to the experiment.
All the best,
david
On Mon, 17 Jan 2005, Oxana Smirnova wrote:
> Hi,
>
> Barry MacEvoy :
>> Dear Julio,
>>
>> This problem with ATLAS - a habitual offender
>> with regard to cleaning up after jobs ...
>
> Well, the point with clean-up is related to the "SE configuration"
> thread here. ATLAS jobs do not simply create files, they also register
> them in RLS and in a couple of ATLAS databases. Thus "cleaning up" is
> far from being trivial.
>
> I think ATLAS in future will try to avoid non-Tier1 sites for data
> writing. This might also imply that processing will be mostly
> concentrated at Tier1s or similar sites, to optimize w.r.t. transfer.
> Basically, we're forced to develop our own workload management and
> especially data management systems.
>
> Oxana
>