Hi Tim,
Let me see if I can expand on it a bit without muddling it too much...
Suppose I am the admin and I have an SE, and I want to "take it off
the Grid". You are the VO, and I talk to you.
So I list the files inside the SE, and I get a list of files that
"belong" to CMS. They may have been written by PhEDEx or by
individual physicists, or by jobs running as you, or whatever.
The point is that I have no way of knowing who "owns" a file,
except that it is someone or something from CMS. Site admins cannot
know which catalogue (if any) owns the data. I can trace which DN
created the data, but that could be you srmcp'ing it in (i.e. not
catalogued), or a job, or a replication service (presumably catalogued),
or almost anything. Worse yet, in gLite it could have been created
by a service with a service certificate, and then I'm really stuffed.
So the best I can do as an admin is to ask you, "Here is a list
of files; what do you want to do with them?" Some are in your
catalogues, fine: you copy them somewhere safe if you don't
already have other copies. Some will have been squirrelled
away by individuals, but it is still up to *you* as a VO to
figure out what to do with them. (I am not sure I am even allowed
to tell you who created them, for data protection reasons.)
So John is right: the SE admin needs you, the VO, to know where your
catalogues are. And Stephen is right that you should be able to decide
what to do with the remaining files that are not in your catalogues;
all I can tell you is that I am going to delete them in 60 days (or
whatever is reasonable).
You may have methods to warn your users, or just copy the lot somewhere
else to be safe, or decide that you can lose them, etc. Your call.
Or you could tell me, "No, please don't delete them yet; I haven't figured
out who created them or why they were created."
Even with datasets (RefDB??), it still boils down to owning
individual files, because that's what the SE sees. However, files
in datasets are very likely to be fully registered card-carrying member
files of your VO (unless I have misunderstood something), so those at least
should be safe.
Er, I hope this makes it clearer...
Cheers,
--jens
-----Original Message-----
From: Testbed Support for GridPP member institutes
[mailto:[log in to unmask]]On Behalf Of Tim Barrass
Sent: 07 October 2005 18:18
To: [log in to unmask]
Subject: Re: Removing VOs files if not in catalogue (was Re: Storage
Types (was Re: [TB-SUPPORT] Reminder of the next UKI meeting)
Hi John,
OK, that's fine then; we definitely keep tabs on where our files are.
It sounded like the site admins were expecting to know which catalogue
to consult to find out which files were 'owned'. If that responsibility
remains in our hands, that's fine.
Thanks,
Tim
On 7 Oct 2005, at 18:04, Gordon, JC (John) wrote:
> Tim, the point is that YOU as the VO should know where your files are
> catalogued. Henry was saying we needed to support every physicist who
> squirreled away his own list of files. The site doesn't know where files
> are catalogued, but we hope you do.
>
> John
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes
>> [mailto:[log in to unmask]] On Behalf Of Tim Barrass
>> Sent: 07 October 2005 17:42
>> To: [log in to unmask]
>> Subject: Removing VOs files if not in catalogue (was Re:
>> Storage Types (was Re: [TB-SUPPORT] Reminder of the next UKI meeting)
>>
>>> It is of course true that anyone can upload files to an SE, but each
>>> file will always be tagged as belonging to a VO. Thus, if an SE is
>>> taken off the Grid, the admins should (at least in principle) contact
>>> the VO and say "Here is a list of your files; copy them off within
>>> 60 days or lose them". If users have uploaded data and it's not in the
>>> VO's catalogues, well, tough. The alternative is to trawl through the
>>> DNs of the owners, and somehow magically figure out the email address,
>>> and then contact them, and then wait for them to get back to you, and
>>> check that you have found the owners of all the files and and and.
>>
>> Hi Jens,
>>
>> This comment worries me slightly: could you clarify what you
>> mean by "in the VO's catalogues"? We register all our files
>> in a number of places. Specifically for CMS (as you know, I
>> think?) we register files for grid access via the PubDB
>> infrastructure. Does this count?
>>
>> In the future CMS will also be using a dataset location
>> service rather than file location: the specific location of
>> files will only be known once the job reaches the site, mapped
>> using a relatively independent local catalogue. Does this fit
>> into your scheme?
>>
>> Thanks,
>> Tim
>>