Hi,
I have only just had a chance to catch up with this discussion and, as I
guess that we are the site that Oxana is referring to as *abusing* the
atlas users, I thought that I would say a few things.
Firstly I largely agree with Oxana.
1. With 80+ sites it is impossible for somebody doing production (or any
user for that matter) to take into account individual site
characteristics. Besides the whole point is that they shouldn't have to.
If the info system says that the space is there for that experiment and
that it is permanent (not that anybody checks these as we have
established) then that is what the user should expect. That is part of
what is meant by supporting the atlas experiment. By asking atlas to
remove their data from our SE we were failing in our support of atlas.
Also I believe that "permanent" should mean the same for all disk based
SEs i.e. you know that there is a very small but finite risk of
corruption/loss (disks, even raid arrays, fail) but it will not be deleted in
an ad-hoc way without plenty of warning.
2. Yes, better data tools are needed. These need to provide the current
functionality in a more robust and rapid way as well as adding new
functionality such as the ability to reserve space for a job etc. This
is absolutely central to the future of using LCG.
3. There is absolutely no problem with a full SE. They are there to be
used not sit looking pretty in the rack. The problem that we had was that
we had a very standard configuration and a particularly small SE. The
standard installation meant that we had all the experiment areas in the
same partition with no quotas. This included the experiment software
directories as well as the data directories. I imagine that most sites are
similar to this. This is clearly a bad configuration as it means that for
example in our case we could not update the CMS software because the
partition was full of atlas data. Clearly what we wanted to do was to
reconfigure this and we wanted the atlas data to be moved off our site so
that we could do this. This is equivalent to asking a user, who you know
has space elsewhere, to move their data off a given disk set while you
work on it. When doing this I find that you always have to give the user a
deadline after which their data is not safe or else they don't move it.
Maybe I didn't take enough time to explain this in the email. The SE is
small because we had to return a couple of 3TB disk servers to the
suppliers; with them it would be a completely appropriately sized SE for
this installation. We are still waiting for their replacements.
By sometime on Monday we expect to be back up and functioning with the
software areas on different disks to the data areas and with quotas on the
data areas. Later in the week we will look at modifying the info-provider to
reflect this use of quotas. So what is in the info providers will be
misleading for a few days. However we have established that nobody
currently uses these anyway.
4. Communication. Everybody agrees that there needs to be better
communication between sysadmins and users. I am happy to subscribe to
whatever lists are needed (we are all on so many lists these days), but a
better way than this needs to be found. Even if it is just a general
LCG-user list that you could reasonably expect all users to subscribe to.
All the best,
david