Another issue which needs to be looked into is the amount of memory
available. When ATLAS files were small, the entire file would be
cached in memory, so disk IO was greatly reduced. Now that the
input files are large and the number of jobs is increasing, this is
no longer the case, which is why we are now seeing local system disk
IO wait. This is why this analysis is a calculation that needs to be
done for each site. We can probably come up with generalisations about
the factors which might affect a site's performance, but at times only
the site itself can work out which of these factors affect it the
most.
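
As a rough illustration of the per-site calculation, here is a minimal
sketch in Python. It assumes that each job reads its own distinct input
file and that all RAM not used by the jobs themselves is available as
page cache; both are simplifications. The function name, its parameters
and the example figures are hypothetical, not measurements from any site.

```python
def inputs_fit_in_cache(ram_gb, jobs, job_memory_gb, input_file_gb):
    """Return True if the RAM left after job memory can hold every job's input file.

    Assumes one distinct input file per job and that all leftover RAM
    is usable as page cache (a simplification).
    """
    cache_gb = ram_gb - jobs * job_memory_gb   # RAM left over for caching
    needed_gb = jobs * input_file_gb           # total size of files being read
    return cache_gb >= needed_gb


if __name__ == "__main__":
    # Illustrative figures only: a 16 GB node running 8 jobs of 1.5 GB each.
    print(inputs_fit_in_cache(16, 8, 1.5, 0.1))  # small input files
    print(inputs_fit_in_cache(16, 8, 1.5, 2.0))  # large input files
    print(inputs_fit_in_cache(16, 1, 1.5, 2.0))  # large files, one job per node
```

With these made-up figures the small files fit in cache, the large
files do not when all 8 slots are busy, and they fit again when the
node is limited to one job.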
Brian
2009/8/10 Sam Skipsey <[log in to unmask]>:
> Ah, I'm glad (in a way) to see that you didn't know any cleverer way
> of doing this - we were looking at doing the same at Glasgow, but the
> horror of maintaining a maui.cfg with 300-odd NODECFG lines in it
> rather put us off.
> That said, we've been Learning Clever Things about maui recently, so...
>
> Sam
>
> 2009/8/7 John Bland <[log in to unmask]>:
>> Ewan MacMahon wrote:
>>>>
>>>> -----Original Message-----
>>>> From: Testbed Support for GridPP member institutes [mailto:TB-
>>>> [log in to unmask]] On Behalf Of John Bland
>>>>
>>>> Liverpool has only a small number of multi core nodes atm so we've gone
>>>> about things a little more accurately. We have restricted atlas pilot
>>>> jobs to only run on our multi core nodes, beginning with 1 job per node
>>>> (maui rule for each individual node).
>>>>
>>> That sounds clever - AIUI most of us have just been limiting the number
>>> over the whole site and trusting to randomness. Would you mind posting the
>>> relevant bits of your maui config?
>>
>> No cleverness involved, I'm afraid, we only have 7 of these nodes on the
>> Tier2 so manually limit each node to X jobs with
>>
>> # Reservation for Hammercloud test
>> #
>> SRCFG[hc_test_02] HOSTLIST=r16
>> SRCFG[hc_test_02] GROUPLIST=atlaspil
>> SRCFG[hc_test_02] PERIOD=INFINITY
>> QOSCFG[hc_test_02only] QFLAGS=USERESERVED:hc_test_02
>> GROUPCFG[atlaspil] QDEF=hc_test_02only
>> NODECFG[r16-n01.ph.liv.ac.uk] MAXJOB=1
>> NODECFG[r16-n02.ph.liv.ac.uk] MAXJOB=1
>> NODECFG[r16-n03.ph.liv.ac.uk] MAXJOB=1
>> NODECFG[r16-n04.ph.liv.ac.uk] MAXJOB=1
>> NODECFG[r16-n05.ph.liv.ac.uk] MAXJOB=1
>> NODECFG[r16-n06.ph.liv.ac.uk] MAXJOB=1
>> NODECFG[r16-n07.ph.liv.ac.uk] MAXJOB=1
>>
>> Then update value of MAXJOB every Y hours and restart maui.
>>
>> John
>>
>> --
>> Dr John Bland, Systems Administrator
>> Room 220, Oliver Lodge
>> Particle Physics Group, University of Liverpool
>> Mail: [log in to unmask]
>> Tel : 0151 794 2911
>> "I canna change the laws of physics, Captain!"
>>
>
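
For sites with many more nodes, the per-node NODECFG lines quoted above
could be generated rather than maintained by hand. A minimal sketch in
Python, assuming hostnames follow the r16-nNN.ph.liv.ac.uk pattern shown
in John's config; the function name and its parameters are hypothetical
and not part of maui.

```python
def nodecfg_lines(prefix, count, max_job, domain="ph.liv.ac.uk"):
    """Return one maui NODECFG line per node, numbered n01..nNN.

    The hostname pattern <prefix>-nNN.<domain> is an assumption taken
    from the quoted config; adjust it to match the local naming scheme.
    """
    return [
        f"NODECFG[{prefix}-n{i:02d}.{domain}] MAXJOB={max_job}"
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Reproduce the seven lines from the quoted config with MAXJOB=1.
    print("\n".join(nodecfg_lines("r16", 7, 1)))
```

The output can be pasted into maui.cfg in place of the hand-written
lines; changing max_job and re-running covers the "update MAXJOB every
Y hours" step, after which maui still needs restarting as John describes.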