To work, the ops group must appear only in the new pool; otherwise it picks
the first pool available in the BDII, which last night was still the
atlas_pool. For some reason I always thought that groups could write to
any pool, which is why we never used more than one pool.
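For reference, a sketch of what the group-restricted pool setup looks like with the standard DPM admin tools. The pool and fs names are the ones from this thread; the exact option spellings may differ between DPM versions, so check against your installed man pages before running anything.

```shell
# Create a pool restricted to the ops group only, so ops writes
# can never land in atlas_pool (sketch; verify options locally).
dpm-addpool --poolname ops_pool --def_filesize 50M --group ops

# Attach the small dedicated file system to that pool.
dpm-addfs --poolname ops_pool \
  --server se12.tier2.hep.manchester.ac.uk --fs /raid/ops

# Check that GIDS on ops_pool now lists only the ops group id,
# and that atlas_pool no longer accepts ops.
dpm-qryconf
```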
I've inserted a slide about this in my talk for today.
cheers
alessandra
On 21/02/2012 14:21, Alessandra Forti wrote:
> Except we don't have a "spare" file system. We need to install the fs
> node on purpose for this using an old machine or I need to use the
> head node which I'm reluctant to do.
>
> cheers
> alessandra
>
> On 21/02/2012 14:18, John Bland wrote:
>> Hi,
>>
>> Take any spare FS (I mean anything with more than a few gig on it,
>> certainly not 60TB colossi) on a pool node or head node. Add it to a
>> new pool which is marked only for ops/sgmops. This is all you should
>> need to do, unless this bug is far worse than I think it is.
>>
>> There are certainly more elegant solutions but this should fix the
>> problem in 5mins.
>>
>> John
>>
>> On 21/02/2012 13:14, Alessandra Forti wrote:
>>> The only people who can feign surprise are those who don't listen or
>>> who
>>> forget.
>>>
>>> We never had more than one pool because, as Kashif points out, the
>>> writing is random anyway. I'm not even sure the proposed solution is
>>> real or whether it works, because Glasgow has file systems four times
>>> smaller and a larger common area. In fact, adding a new pool and then
>>> adding a file system adds the whole 60TB to the new pool, which means
>>> removing it from the atlas pool. We can reinstall one of the old Dell
>>> machines (~480GB) as a DPM fs, but I'm not going to sacrifice more
>>> than that for this.
>>>
>>> POOL atlas_pool DEFSIZE 20.00T GC_START_THRESH 0 GC_STOP_THRESH 0
>>> DEF_LIFETIME 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h
>>> FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GIDS 0 S_TYPE P
>>> MIG_POLICY none RET_POLICY R
>>> CAPACITY 604.95T FREE 0 ( 0.0%)
>>> [.....]
>>> se12.tier2.hep.manchester.ac.uk /raid CAPACITY 54.49T FREE 16.38T ( 30.1%)
>>> [.....]
>>> POOL ops_pool DEFSIZE 50.00G GC_START_THRESH 0 GC_STOP_THRESH 0
>>> DEF_LIFETIME 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h
>>> FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GIDS 104 S_TYPE -
>>> MIG_POLICY none RET_POLICY R
>>> CAPACITY 54.49T FREE 16.38T ( 30.1%)
>>> se12.tier2.hep.manchester.ac.uk /raid/ops CAPACITY 54.49T FREE 16.38T ( 30.1%)
>>>
>>> BTW sites were still accused of "cheating" at the ops TEG for using
>>> reservations to make the ops tests pass when clusters are full.
>>>
>>> cheers
>>> alessandra
>>>
>>> On 21/02/2012 12:08, Daniela Bauer wrote:
>>>> But the ops tests have been around for *ages* and the consequences
>>>> known, so I don't think it'll suit us well to feign surprise right
>>>> now. Just give ops 500 GB and be done with it.
>>>>
>>>> Daniela
>>>>
>>>> On 21 February 2012 12:05, Sam Skipsey<[log in to unmask]> wrote:
>>>>>
>>>>> On 21 February 2012 11:45, Stephen Burke<[log in to unmask]>
>>>>> wrote:
>>>>>> Testbed Support for GridPP member institutes [mailto:TB-
>>>>>>> [log in to unmask]] On Behalf Of John Gordon said:
>>>>>>> If this has been a long-standing DPM issue then I will ask to
>>>>>>> have this
>>>>>>> test (SRMput ?) removed from the SRMV2 set of tests so that it
>>>>>>> isn't
>>>>>>> included in availability.
>>>>>> Even if there really was no free space for ops, does that make
>>>>>> the SE
>>>>>> unavailable? Any VO may fill up its space, that doesn't mean the
>>>>>> site is
>>>>>> broken. Probably the intention is that the test is just supposed
>>>>>> to verify
>>>>>> the functionality and no-one has considered the possibility of it
>>>>>> being
>>>>>> full. (CE tests are similar if the queues are full - there I
>>>>>> think most
>>>>>> sites do have an explicit reservation just to let the ops tests
>>>>>> run.)
>>>>>>
>>>>> This is a valid point, and what I was getting at with my nagios test
>>>>> comment: the test doesn't test if the storage is available, it
>>>>> tests if ops
>>>>> can write to the storage. (Now, obviously, there's a point at
>>>>> which you have
>>>>> to consider that a test has to test *something*...). ATLAS,
>>>>> meanwhile, can
>>>>> happily write to the storage; and even ops tests are happy talking
>>>>> to the
>>>>> storage, and it is responding in a reasonable and sane way.
>>>>>
>>>>> I note that Manchester is an almost entirely ATLAS site. It seems
>>>>> reasonable
>>>>> that their availability be determined by their being available for
>>>>> the
>>>>> entities that they are supposed to be supporting in the main, surely?
>>>>>
>>>>> Sam
>>>>>
>>>>>> Stephen
>>>>>
>>>>
>>>>
>>
>>