Answers inline...
On 5 Jul 2010, at 14:51, Alessandra Forti wrote:
> Hi Alastair,
>
> If you are looking at group production roles these should be fully
> integrated in the atlas framework and their output shouldn't go to
> scratch disk but to their group space tokens.
>
> If they are normal users they shouldn't produce so much output and
> if they do the jobs should still go to the data until the sample is
> small enough. For the final ntuples analysis they might want it at
> their Tier3 where they can run interactively (which is what you
> call "locally on 1 machine"). Tier3 don't have mandatory grid
> enabled storage.
>
> cheers
> alessandra
This is not really true. A lot of group production that is done is no
different from user analysis. It's just done by someone on behalf of
a physics group who uses a production certificate to increase their
priority. It will be written to scratchdisk and then they will
decide if they want to copy it to a groupdisk.
On 5 Jul 2010, at 14:52, Sam Skipsey wrote:
> On 5 July 2010 14:34, Alastair Dewhurst <[log in to unmask]>
> wrote:
>> Hi
>>
>> Alessandra: These would be T1/2 -> T2 data transfers requested by
>> users. It could in future be extended to T3 sites. Does that answer
>> your query?
>>
>> Sam: There are two reasons. Firstly, users need to copy their data
>> once it is produced (on scratchdisk) somewhere safer if they want to
>> keep it (scratchdisk deletes data automatically after 30 days).
>> Putting it all together at their home institute seems reasonable.
>
> Indeed, I know about the expiry time for scratchdisk - in fact, it
> will probably be less than 30 days in future. I'm not convinced that
> the correct "semi-archival" destination is always the home institute,
> however. I suppose it depends on how likely they are to want to use
> the data in future, and in what way.
>
>> Secondly, at some point they will need to run on their data locally
>> to fine-tune their cuts/analysis and produce plots. You are right
>> that if they have produced a large amount of data they should use
>> tools such as prun, which will allow them to run root/other scripts
>> on their data, but at some point it will be much faster to do this
>> locally on 1 machine.
>
> Indeed. However, the trend in the wider world is for *more* remote
> distribution of data, not more consolidation. It would be simpler to
> provide one way of doing fine tuning - via prun, etc - that scales to
> large datasets (which seem likely to be the increasing case) than two
> methods, one of which is only usable locally and will break for users
> who graduate to more data.
>
All users are encouraged to send their jobs to the data until the data
is small enough. However, we have defined small enough as 10GB a
day. As an example: I tried to download datasets that were around
5GB in size. It took 3 days using dq2-get. A datri request took 24
hours. dq2-get is very inefficient compared to a datri request for
moving data, so if you want anything more than a trivial amount of
data a datri request is best.
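
As a rough back-of-envelope check of the figures above, the sketch below turns them into average transfer rates. It assumes the transfer totalled about 5GB (the message does not say whether 5GB was per dataset or overall) and uses decimal units; the function name is purely illustrative and is not part of dq2 or datri.

```python
def effective_rate_kb_per_s(size_gb: float, hours: float) -> float:
    """Average transfer rate in kB/s, using decimal units (1 GB = 1e6 kB)."""
    return size_gb * 1e6 / (hours * 3600)

# Assumed figures taken from the message: ~5GB moved in 3 days vs 24 hours.
dq2_get_rate = effective_rate_kb_per_s(5, 3 * 24)
datri_rate = effective_rate_kb_per_s(5, 24)

print(f"dq2-get: {dq2_get_rate:.1f} kB/s")
print(f"datri:   {datri_rate:.1f} kB/s")
print(f"ratio:   {datri_rate / dq2_get_rate:.1f}x")
```

Under those assumptions both rates are only tens of kB/s, with the datri request roughly three times faster than dq2-get.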
>> Where we draw the line is a matter of debate, but if a user can fit
>> all their data on their laptop, some of them will try it. I think
>> Roger Jones said in a talk that:
>> Users are expected to download up to 10GB; if they download ~100GB a
>> day their rate will be throttled and if they download ~1TB a day they
>> will get throttled! So the case we are looking at here is the heavy
>> user (possibly because of a group production role).
>>
>
> Right. And heavy users should, for the good of everyone else, do their
> work on large datasets in a sensible, civilised manner - not by
> forcing it all into one local area.
>
>> I also think that for most people the total size of their files is
>> quite small (~a few GB); however, with real data they are producing a
>> lot of files, even if they are only of the order of a few kb each.
>> It's not always possible for the user to merge their files on the
>> grid before downloading them, and in this case a dq2-get command is
>> still inefficient compared to a datri request.
>
> Small file transfers are inefficient via lcg-cp anyway, if they're on
> the order of kb per file. (But this is outside the scope of the
> original topic.)
They are even less efficient via dq2-get though, so it's still an
improvement.
>
>> If you check
>> http://www.hep.lancs.ac.uk/~love/ukdata/token/ATLASLOCALGROUPDISK/
>> it's clear that there is currently a lot of unused space.
>>
>> When I said:
>> "The user would need access to the local mass storage"
>> I was actually meaning to add "via the local transport protocol".
>> While I assumed it was possible for this to be done, I thought it
>> best to ask in case there was something I had overlooked.
>>
>
> No, I think that's fairly unobjectionable.
> That said, I still think it would be *preferable* for the user to tune
> their data using their local CE, even if all the data is local and
> accessible via laptop etc.
>
> Sam
>
>
>>
>> Stephen: With regard to the space token, this is something I am
>> discussing with ADC. In the US, where they have many universities
>> that aren't tier 2 sites, they set up a single space token similar to
>> our localgroupdisk, to which they can then copy whatever data they
>> want. The site rather than the US cloud is responsible for deleting
>> stuff. Because a lot of our universities are Tier 2s we could choose
>> to use the existing scratchdisk or localgroupdisk if we wanted, or
>> set up a new space token. There are advantages and disadvantages in
>> each but it mostly comes down to who is in charge of deleting stuff
>> on the space token.
>>
>> Alastair
>>
>>
>>
>> On 5 Jul 2010, at 13:30, Sam Skipsey wrote:
>>
>>> On 5 July 2010 12:26, Alastair Dewhurst
>>> <[log in to unmask]> wrote:
>>>>
>>>> Hi
>>>>
>>>> The following is an idea in the early stage of development;
>>>> thoughts would be welcome.
>>>>
>>>> Currently if ATLAS users wish to get data that has been produced on
>>>> the grid they use dq2-get (this basically just looks up where ATLAS
>>>> thinks the file is and does an lcg-cp command). This is fine if the
>>>> user has a small amount of data and a small number of files.
>>>> However, it is becoming apparent that users are producing quite
>>>> large sets of output files (~1000 files and ~100GB). This is only
>>>> likely to get worse as we get more real data. While dq2-get is fine
>>>> for small amounts of data it becomes rather slow and unreliable on
>>>> this scale. It can take users days to download their datasets and
>>>> even longer to check that all the files were downloaded correctly
>>>> (and not duplicated!).
>>>
>>> Which is why the general grid data model is about moving compute to
>>> data whenever possible, rather than the converse.
>>> What, precisely, is the use case that *requires* users to move their
>>> data all to one location, effectively de-gridifying it?
>>> A lot of sites aren't going to be able to support many local users
>>> doing this and transferring considerable amounts of stuff to their
>>> localgroupdisk and scratchdisk tokens.
>>>
>>>>
>>>> The suggestion is to tell users who want to get large amounts of
>>>> user data to submit a datri (data transfer) request to copy the data
>>>> to the tier 2 site they actually work at. If the site were to allow
>>>> it they could then access the files directly from the storage
>>>> element. For example, at RALPP I could request my dataset be moved
>>>> to scratchdisk and then, once it was there, access it by looking in:
>>>> /pnfs/pp.rl.ac.uk/data/atlas/atlasscratchdisk/
>>>> I would use the local dcache protocol to copy it out.
>>>>
>>>> The datri request has the advantage that it can be scheduled, should
>>>> be more efficient than a dq2-get command, and will automatically
>>>> retry failures. Of course, for this to work the user would need to
>>>> be able to access the local mass storage, and it would be
>>>> understandable if site admins didn't want this.
>>>
>>> Well... they'll always have access to the local mass storage via the
>>> GridFTP transport at least, anyway, surely? It's a Grid accessible
>>> resource, and thus you can talk to it (at least) via anything that
>>> speaks SRM...
>>>
>>> I suspect that what you mean is "access the local mass storage
>>> via its
>>> local transport protocol"; this almost certainly shouldn't be a
>>> significant problem for the reason Stephen mentioned already.
>>>
>>> I wonder if, in most cases, though, the problem would be solved
>>> in a more scalable manner by simply... distributing work over the
>>> grid?
>>>
>>> Sam
>>>
>>>> However, I am also aware that at some sites, local users already
>>>> have limited access to the storage elements to give them somewhere
>>>> to store their data offline.
>>>>
>>>> Comments?
>>>>
>>>> Alastair
>>>>
>>>>
>>
>>