> Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Ewan MacMahon said:
> > The question is what to do about it, and so far we seem to have:
> > - Don't run many of these jobs on one node,
> > - SSDs (ha; as if),
> > - Go back to direct rfio access to the SEs, but get it right
> > this time.
>
> More than one disk per WN?
>
Possible, but probably not going to happen. Retrofitting anything is a
pain in the neck, it costs money, a lot of WNs don't have many bays
(the Twins only have two, for example), and it wouldn't help all that
much. A pair of disks would be better than one, but only a bit, and even
assuming you could fit more than that in a node you start getting
towards needing RAID cards, and at that point you're back to SSD money.
> xrootd?
>
Very possible, but someone needs to pick up what Greig was doing before
he left. For most of the UK an xrootd-enabled DPM would probably be a
fairly simple upgrade.
> lustre et al?
>
I suspect that a well-configured Lustre will be a good thing, but I
don't see a way of migrating the existing DPM sites to that in time to
make much difference; at least not for the upcoming run.
We were discussing this briefly at the storage meeting on Wednesday, and
I think one upshot was that it would be good to make an effort to get
all the DPM sites configured to use very small RFIO buffers, then
run a hammercloud test using direct access and see what happens - AIUI a
lot of the previous attempts were run using larger buffers (including
the default), which chewed up excessive network bandwidth.
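For reference, the sort of client-side change that would mean on the WNs
is roughly the below; the exact directive name and a sensible value would
need checking against the DPM/rfio documentation before anyone rolls it
out:

  # /etc/shift.conf on the worker nodes (directive and value to be confirmed)
  # Shrink the rfio client read buffer so direct-access reads don't drag
  # large read-ahead chunks over the network for every small read.
  RFIO IOBUFSIZE 4096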
Ewan