Sam Skipsey wrote:
> 2009/6/9 Ewan MacMahon <[log in to unmask]>:
>>> -----Original Message-----
>>> From: Testbed Support for GridPP member institutes [mailto:TB-
>>>
>>> Gentlepersons,
>> <huge snip>
>>> Of course, this would be even more useful if other sites (UK for
>>> starters) could do something similar, so we could compare data across
>>> storage and cluster implementations too.
>>>
>> It sounds like you're having a similar experience to us, but you're a
>> bit further ahead; I'd expect that we'll be following shortly behind.
>>
>> One thing I don't understand is quite what the difference between the
>> current batch of WMS jobs and those we've seen in previous hammercloud
>> tests is - we're seeing completely different usage patterns with the
>> bottleneck being very definitely the DPM disk servers (and their network
>> links), whereas before we were being limited by the rate of
>> authorisations
>> going through the DPM head node. Is this just the result of the recent
>> packing together of data into fewer larger files, or something else?
>>
>
> Mostly the former. The ratio of transfer time to processing time is
> much better with the merged AODs.
Unfortunately the ratio of data processing to shifting data around on the
LAN or disk is much worse: files on the WNs no longer fit in the rfio
buffers or the node page cache, so we're limited by LAN bandwidth (rfio)
or by disk IOPS rather than RAM latency (file stager).
The main limit we're seeing at Liverpool (at about 100 rfio connections
on each server, for a maximum of ~700 connections) is just plain
bandwidth (we have turned the rfio buffers down to 32/64MB to keep RAM
usage on the pools sensible).
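For anyone wondering why the buffers had to come down, here's a quick
back-of-the-envelope sketch (Python, purely illustrative; both inputs
are the figures quoted above, nothing is measured here):

    # RAM taken up by rfio buffers on one pool node.
    connections_per_server = 100   # observed rfio connections per pool node
    for buffer_mb in (32, 64):     # the turned-down rfio buffer sizes
        total_gb = connections_per_server * buffer_mb / 1024
        print(f"{buffer_mb}MB buffers x {connections_per_server} connections "
              f"= ~{total_gb:.1f}GB of RAM for rfio buffers alone")

Even at 32MB that's ~3GB of RAM per node going on buffers alone.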
The rfio processes are sitting around so much because, with 100 rfio
processes sharing 350MB/s of bandwidth on a pool, each process gets a
maximum of only 3.5MB/s. With these big files that's a drop in the ocean
(roughly 12 rfio connections can saturate one of our 3Gb/s pools), hence
efficiencies are through the floor.
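To make that arithmetic explicit (again a throwaway sketch; the 30MB/s
single-stream rate is an assumption backed out of the "roughly 12
connections" figure above, not a measurement):

    # Per-process share of a saturated pool link, and how few
    # unconstrained streams it would take to fill that link.
    pool_bandwidth_mbs = 350   # ~3Gb/s pool link expressed in MB/s
    rfio_connections = 100     # concurrent rfio processes on the node
    print(f"~{pool_bandwidth_mbs / rfio_connections:.1f}MB/s "
          f"per process")                                           # ~3.5

    single_stream_mbs = 30     # ASSUMED rate of one unconstrained stream
    print(f"~{pool_bandwidth_mbs / single_stream_mbs:.0f} streams "
          f"saturate the link")                                     # ~12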
At the same time we've got local user analysis going on against these
same saturated pool nodes; it uses file stager and is getting far more
useful work done.[1] If we're reading all of each file, why are we using
rfio at all, when AFAICT file stager is miles more efficient for that
workflow with files of this size (smaller files too, IIRC) and the
bandwidth available at sites? Are the STEP09 tests using, or going to
use, file stager (or is our usage skewed by our software install
problems)?
John
[1] rfio and file stager jobs run in parallel on the same cluster; the
file stager jobs had finished almost before the rfio jobs had started.
--
Dr John Bland, Systems Administrator
Room 220, Oliver Lodge
Particle Physics Group, University of Liverpool
Mail: [log in to unmask]
Tel : 0151 794 2911
"I canna change the laws of physics, Captain!"