Can others also check their DPM head node? We're seeing the srmv1 daemon
going nuts with Johannes' requests, and I'm wondering if this is widespread.
It appears to be our bottleneck. I'm a little concerned because it is
causing the srmv2.2 daemon to reject connections.
2008/11/25 Graeme Stewart <[log in to unmask]>:
> Hi Greig
>
> I just looked and the jobs are very poor in CPU efficiency (15-25%).
> Yes, the jobs were reading directly using rfio.
>
> Event/sec is one of the outputs you'll see in the final analysis.
>
> Although the DPM servers were cruising (low load, excellent data
> output rates), the headnode was suffering very high CPU load. This is
> surprising as the headnode should only be contacted for the open step
> and it hands off to the disk server.
>
> Puzzling...
>
> Graeme
>
> On Tue, Nov 25, 2008 at 3:33 PM, Greig A. Cowan <[log in to unmask]> wrote:
>> Hi Graeme,
>>
>> Do you have numbers for how CPU efficient these analysis jobs were? It will
>> be interesting to see how IO-bound they were. You're using rfio, right?
>>
>> Putting things into a physics perspective, it would also be interesting to
>> know how many events were processed by each analysis job per unit time.
>>
>> Anyway, looks like it's been a good exercise so far and the DPM disk servers
>> don't look all that loaded going by your ganglia.
>>
>> Cheers,
>> Greig
>>
>> Graeme Stewart wrote, On 25/11/08 13:11:
>>>
>>> Brian,
>>>
>>> Glasgow is here:
>>>
>>>
>>> http://svr031.gla.scotgrid.ac.uk/ganglia/?c=DPM%20Storage&m=&r=hour&s=by%20hostname&hc=4
>>>
>>> Preliminary results:
>>>
>>> "We had 40 jobs (just 40!) running on the
>>> cluster before lunch sucking data out of our DPM at ~600MB/s, which is
>>> 15MB/s per job (75Hz for 200kB AOD????).
>>>
>>> Currently we're running 85 jobs and hitting 1GB/s from our storage,
>>> which is about the limit (9 servers x 1Gb).
>>>
>>> This means we have saturated our i/o capacity with a cluster which is
>>> 15% full of analysis jobs.
>>>
>>> I think we need more network cards and bigger switches. I am astonished.
>>>
>>> Graeme"
>>>
>>>
>>> On Tue, Nov 25, 2008 at 9:37 AM, Davies, BGE (Brian)
>>> <[log in to unmask]> wrote:
>>>
>>>>
>>>> I have been collecting ganglia endpoints for those sites which publish
>>>> so as to be able to look at loads.
>>>> I have found all but Liverpool and RHUL for today's tests.
>>>> Does anyone have a link to these? (If they are already in the gridpp
>>>> wiki then I cannot find them...)
>>>> Brian
>>>>
>>>> -----Original Message-----
>>>> From: Testbed Support for GridPP member institutes
>>>> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
>>>> Sent: 24 November 2008 21:27
>>>> To: [log in to unmask]
>>>> Subject: Re: Analysis challenge in the UK tomorrow
>>>>
>>>> On Mon, Nov 24, 2008 at 2:44 PM, Graeme Stewart
>>>> <[log in to unmask]> wrote:
>>>>
>>>>>
>>>>> Dear All
>>>>>
>>>>> We intend to start an ATLAS analysis challenge tomorrow at the
>>>>> following UK sites:
>>>>>
>>>>> UKI-LT2-RHUL
>>>>> UKI-NORTHGRID-LANCS-HEP
>>>>> UKI-NORTHGRID-LIV-HEP
>>>>> UKI-NORTHGRID-SHEF-HEP
>>>>> UKI-SCOTGRID-GLASGOW
>>>>> UKI-SOUTHGRID-OX-HEP
>>>>> UKI-SOUTHGRID-RALPP
>>>>>
>>>>> This will involve the submission of several hundred 'real' ATLAS
>>>>> analysis jobs via the WMS. We would kindly ask the sites to keep an
>>>>> eye on their systems during this test and report any problems they
>>>>> see. In particular we should like you to be alert for saturation of
>>>>> the network between your storage and the worker nodes. If you can grab
>>>>> any ganglia plots of activity or any other interesting metrics from
>>>>> your side we would be grateful.
>>>>>
>>>>> The jobs should be submitted in the morning (probably about 10am) but
>>>>> I will send another alert when this actually happens.
>>>>>
>>>>
>>>> Hi
>>>>
>>>> The jobs are set to go at 9am tomorrow (UK time), so gulp down that
>>>> coffee quickly :-)
>>>>
>>>> Dan has set up some trial monitoring here:
>>>>
>>>> http://gangarobot.cern.ch/st/
>>>>
>>>> where results will be posted as the jobs finish.
>>>>
>>>> Cheers
>>>>
>>>> Graeme
>>>>
>>>> --
>>>> Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
>>>> Department of Physics and Astronomy, University of Glasgow, Scotland
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>
>
>
> --
> Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
> Department of Physics and Astronomy, University of Glasgow, Scotland
>
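For anyone wanting to sanity-check the preliminary Glasgow numbers quoted further down the thread, here is a rough sketch of the arithmetic. It assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 kB) and that "9 servers x 1Gb" means nine disk servers with one 1 Gbit/s link each; the function names are made up for illustration and are not part of any DPM or ATLAS tool.

```python
# Back-of-envelope check of the quoted figures (decimal units assumed).
KB_PER_MB = 1000
MBIT_PER_GBIT = 1000
BITS_PER_BYTE = 8


def per_job_rate_mb_s(total_mb_s, n_jobs):
    """Average read rate per job, in MB/s."""
    return total_mb_s / n_jobs


def event_rate_hz(job_mb_s, event_size_kb):
    """Events read per second by one job, given the size of one event."""
    return job_mb_s * KB_PER_MB / event_size_kb


def nic_limit_mb_s(n_servers, link_gbit_s):
    """Aggregate wire-speed ceiling of the disk servers, in MB/s."""
    return n_servers * link_gbit_s * MBIT_PER_GBIT / BITS_PER_BYTE


# Before lunch: 40 jobs pulling ~600 MB/s, 200 kB per AOD event.
morning_job_rate = per_job_rate_mb_s(600, 40)
morning_event_rate = event_rate_hz(morning_job_rate, 200)

# Later: 85 jobs pulling ~1 GB/s against 9 servers x 1 Gbit/s.
afternoon_job_rate = per_job_rate_mb_s(1000, 85)
ceiling = nic_limit_mb_s(9, 1)

print(f"morning:   {morning_job_rate:.1f} MB/s per job, {morning_event_rate:.0f} Hz")
print(f"afternoon: {afternoon_job_rate:.1f} MB/s per job")
print(f"ceiling:   {ceiling:.0f} MB/s across all disk servers")
```

On these assumptions the wire-speed ceiling comes out a little above 1 GB/s, which fits the remark that 1 GB/s is "about the limit", and the per-job rate falls once 85 jobs share the links.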