Ahh, you are right. I hadn't realised the monitoring was that detailed.
If you look at the plots for the UK sites with 20 jobs/site you will see
that on average, the jobs were 61.5% efficient. There is quite a spread
of efficiency depending on the site. It would be good to compare network
topologies and storage hardware for each of these sites.
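As a quick aside, the throughput figures quoted further down the thread (40 jobs pulling ~600MB/s from the DPM, and a ceiling of 9 storage servers with 1Gb NICs) hang together arithmetically. A minimal sanity-check sketch, taking the 200kB AOD event size from Graeme's quoted message:

```python
# Sanity check of the aggregate-throughput figures quoted in the thread.
jobs = 40
aggregate_mb_s = 600.0                    # ~600MB/s read out of the DPM
event_size_kb = 200.0                     # assumed AOD event size (from Graeme's mail)

per_job_mb_s = aggregate_mb_s / jobs                    # MB/s seen by each job
event_rate_hz = per_job_mb_s * 1000.0 / event_size_kb   # events processed per second

servers, nic_gbit = 9, 1.0
ceiling_mb_s = servers * nic_gbit * 1000.0 / 8.0        # 9 x 1Gb NICs, in MB/s

print(per_job_mb_s, event_rate_hz, ceiling_mb_s)        # 15.0 75.0 1125.0
```

So 15MB/s per job and 75Hz match the numbers below, and ~1.1GB/s aggregate is indeed roughly the "1GB/s from our storage" limit Graeme reports.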
Greig
Gordon, JC (John) wrote, On 25/11/08 14:41:
> ..... or rather, it would if there were any data there:-(
>
>
>> -----Original Message-----
>> From: Gordon, JC (John)
>> Sent: 25 November 2008 14:40
>> To: [log in to unmask]
>> Subject: RE: Analysis challenge in the UK tomorrow
>>
>> Greig, the gangarobot link Graeme sent shows job efficiency plots per
>> site.
>>
>>
>>> -----Original Message-----
>>> From: Testbed Support for GridPP member institutes [mailto:TB-
>>> [log in to unmask]] On Behalf Of Greig A. Cowan
>>> Sent: 25 November 2008 14:33
>>> To: [log in to unmask]
>>> Subject: Re: Analysis challenge in the UK tomorrow
>>>
>>> Hi Graeme,
>>>
>>> Do you have numbers for how CPU efficient these analysis jobs were? It
>>> will be interesting to see how IO-bound they were. You're using rfio,
>>> right?
>>>
>>> Putting things into a physics perspective, it would also be interesting
>>> to know how many events were processed by each analysis job per unit
>>> time.
>>>
>>> Anyway, looks like it's been a good exercise so far and the DPM disk
>>> servers don't look all that loaded going by your ganglia.
>>>
>>> Cheers,
>>> Greig
>>>
>>> Graeme Stewart wrote, On 25/11/08 13:11:
>>>
>>>> Brian,
>>>>
>>>> Glasgow is here:
>>>>
>>>>
>>>> http://svr031.gla.scotgrid.ac.uk/ganglia/?c=DPM%20Storage&m=&r=hour&s=by%20hostname&hc=4
>>>
>>>> Preliminary results:
>>>>
>>>> "We had 40 jobs (just 40!) running on the
>>>> cluster before lunch sucking data out of our DPM at ~600MB/s, which is
>>>> 15MB/job (75Hz for 200kB AOD????).
>>>>
>>>> Currently we're running 85 jobs and hitting 1GB/s from our storage,
>>>> which is about the limit (9 servers x 1Gb).
>>>>
>>>> This means we have saturated our i/o capacity with a cluster which is
>>>> 15% full of analysis jobs.
>>>>
>>>> I think we need more network cards and bigger switches. I am astonished.
>>>>
>>>> Graeme"
>>>>
>>>>
>>>> On Tue, Nov 25, 2008 at 9:37 AM, Davies, BGE (Brian)
>>>> <[log in to unmask]> wrote:
>>>>
>>>>
>>>>> I have been collecting ganglia endpoints for those sites which publish
>>>>> so as to be able to look at loads.
>>>>> I have found all but Liverpool and RHUL for today's tests.
>>>>> Does anyone have a link to these? (If they are already in the gridpp
>>>>> wiki then I cannot find them...)
>>>>> Brian
>>>>>
>>>>> -----Original Message-----
>>>>> From: Testbed Support for GridPP member institutes
>>>>> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
>>>>> Sent: 24 November 2008 21:27
>>>>> To: [log in to unmask]
>>>>> Subject: Re: Analysis challenge in the UK tomorrow
>>>>>
>>>>> On Mon, Nov 24, 2008 at 2:44 PM, Graeme Stewart
>>>>> <[log in to unmask]> wrote:
>>>>>
>>>>>
>>>>>> Dear All
>>>>>>
>>>>>> We intend to start an ATLAS analysis challenge tomorrow at the
>>>>>> following UK sites:
>>>>>>
>>>>>> UKI-LT2-RHUL
>>>>>> UKI-NORTHGRID-LANCS-HEP
>>>>>> UKI-NORTHGRID-LIV-HEP
>>>>>> UKI-NORTHGRID-SHEF-HEP
>>>>>> UKI-SCOTGRID-GLASGOW
>>>>>> UKI-SOUTHGRID-OX-HEP
>>>>>> UKI-SOUTHGRID-RALPP
>>>>>>
>>>>>> This will involve the submission of several hundred 'real' ATLAS
>>>>>> analysis jobs via the WMS. We would kindly ask the sites to keep an
>>>>>> eye on their systems during this test and report any problems they
>>>>>> see. In particular we should like you to be alert for saturation of
>>>>>> the network between your storage and the worker nodes. If you can
>>>>>> grab any ganglia plots of activity or any other interesting metrics
>>>>>> from your side we would be grateful.
>>>>>>
>>>>>> The jobs should be submitted in the morning (probably about 10am) but
>>>>>> I will send another alert when this actually happens.
>>>>>>
>>>>>>
>>>>> Hi
>>>>>
>>>>> The jobs are set to go at 9am tomorrow (UK time), so gulp down that
>>>>> coffee quickly :-)
>>>>>
>>>>> Dan has setup some trial monitoring here:
>>>>>
>>>>> http://gangarobot.cern.ch/st/
>>>>>
>>>>> where results will be posted as the jobs finish.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Graeme
>>>>>
>>>>> --
>>>>> Dr Graeme Stewart          http://www.physics.gla.ac.uk/~graeme/
>>>>> Department of Physics and Astronomy, University of Glasgow, Scotland
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>