Hi
Alessandra: These would be T1/2 -> T2 data transfers requested by
users. It could in future be extended to T3 sites. Does that answer
your query?
Sam: There are two reasons. Firstly, users need to copy their data
somewhere safer once it is produced (on scratchdisk) if they want to
keep it, since scratchdisk deletes data automatically after 30 days.
Putting it all together at their home institute seems reasonable.
Secondly, at some point they will need to run on their data locally
to do fine tuning of their cuts/analysis and produce plots. You are
right that if they have produced a large amount of data they should
use tools such as prun, which gives them the ability to run
root/other scripts on their data on the grid, but at some point it
will be much faster to do this locally on one machine. Where we draw
the line is a matter of debate, but if a user can fit all their data
on their laptop, some of them will try it. I think Roger Jones said
in a talk that users are expected to download up to 10GB; if they
download ~100GB a day their rate will be throttled, and if they
download ~1TB a day they will really get throttled! So the case we
are looking at here is the heavy user (possibly because of a group
production role).
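As a rough sketch of the grid-side option, running a ROOT macro over
a dataset with prun might look like the following (the dataset and
macro names are made up for illustration):

  prun --exec "root -b -q 'macro.C(\"%IN\")'" \
       --inDS user.someuser.mydataset/ \
       --outDS user.someuser.myoutput/ \
       --nFilesPerJob 20

Here %IN is replaced by the list of input files assigned to each job,
so the cuts run next to the data rather than the data coming to the
user.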
I also think that for most people the total size of their files is
quite small (~a few GB); however, with real data they are producing
a lot of files, even if each is only of the order of a few kB. It is
not always possible for the user to merge their files on the grid
before downloading them, and in this case a dq2-get command is still
inefficient compared to a DaTRI request. If you check
http://www.hep.lancs.ac.uk/~love/ukdata/token/ATLASLOCALGROUPDISK/
it is clear that there is currently a lot of unused space.
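On the merging point: once the files have arrived at the local site,
a user with ROOT output can merge them in one go with hadd (the path
here is illustrative):

  hadd merged.root /data/atlaslocalgroupdisk/user.someuser.mydataset/*.root

which is far easier to do on one machine than on thousands of small
files scattered across the grid.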
When I said:
"The user would need access to the local mass storage"
I actually meant to add: via the local transport protocol. While I
assumed it was possible for this to be done, I thought it best to
ask in case there was something I had overlooked.
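To make that concrete, on a dCache site such as RALPP the copy out
of the storage element would use dcap; something like the following
(the door hostname and dataset path are guesses for illustration,
though 22125 is the usual dcap port):

  dccp dcap://dcap.pp.rl.ac.uk:22125/pnfs/pp.rl.ac.uk/data/atlas/atlasscratchdisk/user.someuser.mydataset/file.root \
       ./file.root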
Stephen: With regards to the space token, this is something I am
discussing with ADC. In the US, where they have many universities
that aren't Tier 2 sites, they set up a single space token, similar
to our localgroupdisk, to which they can then copy whatever data
they want. The site, rather than the US cloud, is responsible for
deleting stuff. Because a lot of our universities are Tier 2s, we
could choose to use the existing scratchdisk or localgroupdisk if we
wanted, or set up a new space token. There are advantages and
disadvantages to each, but it mostly comes down to who is in charge
of deleting stuff on the space token.
Alastair
On 5 Jul 2010, at 13:30, Sam Skipsey wrote:
> On 5 July 2010 12:26, Alastair Dewhurst <[log in to unmask]>
> wrote:
>> Hi
>>
>> The following is an idea at an early stage of development;
>> thoughts would be welcome.
>>
>> Currently, if ATLAS users wish to get data that has been produced
>> on the grid they use dq2-get (this basically just looks up where
>> ATLAS thinks the file is and does an lcg-cp command). This is fine
>> if the user has a small amount of data and a small number of
>> files. However, it is becoming apparent that users are producing
>> quite large sets of output files (~1000 files and ~100GB). This is
>> only likely to get worse as we get more real data. While dq2-get
>> is fine for small amounts of data, it becomes rather slow and
>> unreliable on this scale. It can take users days to download their
>> datasets and even longer to check to make sure that all the files
>> were downloaded correctly (and not duplicated!).
>
> Which is why the general grid data model is about moving compute to
> data whenever possible, rather than the converse.
> What, precisely, is the use case that *requires* users to move their
> data all to one location, effectively de-gridifying it?
> A lot of sites aren't going to be able to support many local users
> doing this and transferring considerable amounts of stuff to their
> localgroupdisk and scratchdisk tokens.
>
>>
>> The suggestion is to tell users who want to get large amounts of
>> user data to submit a DaTRI (data transfer) request to copy the
>> data to the Tier 2 site they actually work at. If the site was to
>> allow it, they could then access the files directly from the
>> storage element. For example, at RALPP I could request my dataset
>> be moved to scratchdisk and then, once it was there, access it by
>> looking in:
>> /pnfs/pp.rl.ac.uk/data/atlas/atlasscratchdisk/
>> I would use the local dCache protocol to copy it out.
>>
>> The DaTRI request has the advantages that it can be scheduled,
>> should be more efficient than a dq2-get command, and will
>> automatically retry failures. Of course, for this to work the
>> user would need to be able to access the local mass storage, and
>> it would be understandable if site admins didn't want this.
>
> Well... they'll always have access to the local mass storage via the
> GridFTP transport at least, anyway, surely? It's a Grid accessible
> resource, and thus you can talk to it (at least) via anything that
> speaks SRM...
>
> I suspect that what you mean is "access the local mass storage via its
> local transport protocol"; this almost certainly shouldn't be a
> significant problem for the reason Stephen mentioned already.
>
> I wonder if, in most cases, though, the problem would be better
> solved, in a more scalable manner, by simply... distributing work
> over the grid?
>
> Sam
>
>> However, I am also aware that at some sites local users already
>> have limited access to the storage elements, to give them
>> somewhere to store their data offline.
>>
>> Comments?
>>
>> Alastair
>>
>>