The load average on the RHUL DPM head node went up to about 10 between
10:00 and 12:00; it is quiet again now.
Duncan
Peter Love wrote:
> Can others also check their DPM head nodes? We're seeing the srmv1
> daemon going nuts with Johannes' requests, and I'm wondering if this is
> widespread. It appears to be our bottleneck. I'm a little concerned
> because it is causing the srmv2.2 daemon to reject connections.
>
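> For anyone who wants a quick look at their own head node, this is
> roughly the check I'm running (a sketch; it assumes the default DPM
> ports, srmv1 on 8443 and srmv2.2 on 8446 -- adjust if yours differ):
>
>   import subprocess
>
>   # 1/5/15-minute load averages straight from the kernel
>   print("loadavg:", open("/proc/loadavg").read().split()[:3])
>
>   # Count ESTABLISHED connections per SRM daemon port; the local
>   # address is the fourth column of `netstat -tn` output
>   out = subprocess.run(["netstat", "-tn"],
>                        capture_output=True, text=True).stdout
>   for name, port in (("srmv1", 8443), ("srmv2.2", 8446)):
>       n = sum(1 for line in out.splitlines()
>               if "ESTABLISHED" in line
>               and line.split()[3].endswith(":%d" % port))
>       print(name, "established connections:", n)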
>
> 2008/11/25 Graeme Stewart <[log in to unmask]>:
>> Hi Greig
>>
>> I just looked, and the jobs have very poor CPU efficiency (15-25%).
>> Yes, the jobs were reading directly using rfio.
>>
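>> As a back-of-the-envelope (hypothetical numbers for one job, just to
>> show the calculation): CPU efficiency is CPU time over wall time, and
>> for these rfio reads whatever is left over is essentially time
>> stalled on I/O.
>>
>>   cpu_time = 1800.0   # seconds of CPU used (from the batch system log)
>>   wall_time = 9000.0  # seconds of wall clock for the same job
>>   eff = cpu_time / wall_time
>>   print("CPU efficiency: %.0f%%" % (100 * eff))          # -> 20%
>>   print("stalled on I/O: ~%.0f%%" % (100 * (1 - eff)))   # -> ~80%
>>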
>> Events/sec is one of the outputs you'll see in the final analysis.
>>
>> Although the DPM disk servers were cruising (low load, excellent data
>> output rates), the head node was suffering very high CPU load. This is
>> surprising, as the head node should only be contacted for the open step,
>> after which it hands off to the disk server.
>>
>> Puzzling...
>>
>> Graeme
>>
>> On Tue, Nov 25, 2008 at 3:33 PM, Greig A. Cowan <[log in to unmask]> wrote:
>>> Hi Graeme,
>>>
>>> Do you have numbers for how CPU efficient these analysis jobs were? It will
>>> be interesting to see how IO-bound they were. You're using rfio, right?
>>>
>>> Putting things into a physics perspective, it would also be interesting to
>>> know how many events were processed by each analysis job per unit time.
>>>
>>> Anyway, it looks like it's been a good exercise so far, and the DPM disk
>>> servers don't look all that loaded, going by your ganglia plots.
>>>
>>> Cheers,
>>> Greig
>>>
>>> Graeme Stewart wrote, On 25/11/08 13:11:
>>>> Brian,
>>>>
>>>> Glasgow is here:
>>>>
>>>>
>>>> http://svr031.gla.scotgrid.ac.uk/ganglia/?c=DPM%20Storage&m=&r=hour&s=by%20hostname&hc=4
>>>>
>>>> Preliminary results:
>>>>
>>>> "We had 40 jobs (just 40!) running on the
>>>> cluster before lunch sucking data out of our DPM at ~600MB/s, which is
>>>> 15MB/job (75Hz for 200kB AOD????).
>>>>
>>>> Currently we're running 85 jobs and hitting 1GB/s from our storage,
>>>> which is about the limit (9 servers x 1Gb/s).
>>>>
>>>> This means we have saturated our I/O capacity with a cluster that is
>>>> only 15% full of analysis jobs.
>>>>
>>>> I think we need more network cards and bigger switches. I am astonished.
>>>>
>>>> Graeme"
>>>>
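>>>> Sanity-checking those numbers (a sketch; the 200kB AOD event size and
>>>> the 1Gb/s NIC per disk server are the figures quoted above):
>>>>
>>>>   MB = 1e6
>>>>   per_job = 600 * MB / 40          # 600MB/s across 40 jobs
>>>>   events_per_s = per_job / 200e3   # 200kB per AOD event
>>>>   print("%.0f MB/s/job -> %.0f Hz" % (per_job / MB, events_per_s))
>>>>   # -> 15 MB/s/job -> 75 Hz
>>>>
>>>>   ceiling = 9 * 1e9 / 8            # 9 servers x 1Gb/s, in bytes/s
>>>>   print("storage ceiling: ~%.2f GB/s" % (ceiling / 1e9))  # ~1.12 GB/s
>>>>   # 85 jobs at ~1GB/s is already at that ceiling, with the farm only
>>>>   # ~15% occupied by analysis jobs.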
>>>>
>>>> On Tue, Nov 25, 2008 at 9:37 AM, Davies, BGE (Brian)
>>>> <[log in to unmask]> wrote:
>>>>
>>>>> I have been collecting ganglia endpoints for the sites that publish
>>>>> them, so as to be able to look at their loads.
>>>>> I have found all but Liverpool and RHUL for today's tests.
>>>>> Does anyone have a link to these? (If they are already in the GridPP
>>>>> wiki then I cannot find them...)
>>>>> Brian
>>>>>
>>>>> -----Original Message-----
>>>>> From: Testbed Support for GridPP member institutes
>>>>> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
>>>>> Sent: 24 November 2008 21:27
>>>>> To: [log in to unmask]
>>>>> Subject: Re: Analysis challenge in the UK tomorrow
>>>>>
>>>>> On Mon, Nov 24, 2008 at 2:44 PM, Graeme Stewart
>>>>> <[log in to unmask]> wrote:
>>>>>
>>>>>> Dear All
>>>>>>
>>>>>> We intend to start an ATLAS analysis challenge tomorrow at the
>>>>>> following UK sites:
>>>>>>
>>>>>> UKI-LT2-RHUL
>>>>>> UKI-NORTHGRID-LANCS-HEP
>>>>>> UKI-NORTHGRID-LIV-HEP
>>>>>> UKI-NORTHGRID-SHEF-HEP
>>>>>> UKI-SCOTGRID-GLASGOW
>>>>>> UKI-SOUTHGRID-OX-HEP
>>>>>> UKI-SOUTHGRID-RALPP
>>>>>>
>>>>>> This will involve the submission of several hundred 'real' ATLAS
>>>>>> analysis jobs via the WMS. We would kindly ask the sites to keep an
>>>>>> eye on their systems during this test and report any problems they
>>>>>> see. In particular, we should like you to be alert to saturation of
>>>>>> the network between your storage and the worker nodes. If you can grab
>>>>>> any ganglia plots of activity, or any other interesting metrics from
>>>>>> your side, we would be grateful.
>>>>>>
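>>>>>> If you would rather have numbers than screenshots, gmond will dump
>>>>>> its current state as XML on TCP port 8649 (the ganglia default), so
>>>>>> something like the sketch below can pull the bytes_in/bytes_out
>>>>>> rates for every host in a cluster. The hostname is a placeholder --
>>>>>> point it at your own gmond.
>>>>>>
>>>>>>   import socket
>>>>>>   import xml.etree.ElementTree as ET
>>>>>>
>>>>>>   # gmond closes the connection after dumping its XML state
>>>>>>   sock = socket.create_connection(("gmond.example.ac.uk", 8649))
>>>>>>   chunks = []
>>>>>>   while True:
>>>>>>       data = sock.recv(65536)
>>>>>>       if not data:
>>>>>>           break
>>>>>>       chunks.append(data)
>>>>>>   sock.close()
>>>>>>
>>>>>>   root = ET.fromstring(b"".join(chunks))
>>>>>>   for host in root.iter("HOST"):
>>>>>>       rates = {m.get("NAME"): m.get("VAL")
>>>>>>                for m in host.iter("METRIC")
>>>>>>                if m.get("NAME") in ("bytes_in", "bytes_out")}
>>>>>>       print(host.get("NAME"), rates)
>>>>>>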
>>>>>> The jobs should be submitted in the morning (probably about 10am) but
>>>>>> I will send another alert when this actually happens.
>>>>>>
>>>>> Hi
>>>>>
>>>>> The jobs are set to go at 9am tomorrow (UK time), so gulp down that
>>>>> coffee quickly :-)
>>>>>
>>>>> Dan has set up some trial monitoring here:
>>>>>
>>>>> http://gangarobot.cern.ch/st/
>>>>>
>>>>> where results will be posted as the jobs finish.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Graeme
>>>>>
>>>>> --
>>>>> Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
>>>>> Department of Physics and Astronomy, University of Glasgow, Scotland
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
>> Department of Physics and Astronomy, University of Glasgow, Scotland
>>