Except we don't have a "spare" file system. We would need to install an fs
node specifically for this on an old machine, or I would need to use the head
node, which I'm reluctant to do.

cheers
alessandra

On 21/02/2012 14:18, John Bland wrote:
> Hi,
>
> Take any spare FS (I mean anything with more than a few gig on it, 
> certainly not 60TB colossi) on a pool node or head node. Add it to a 
> new pool which is marked only for ops/sgmops. This is all you should 
> need to do, unless this bug is far worse than I think it is.
>
> There are certainly more elegant solutions but this should fix the 
> problem in 5mins.
>
> John
>
> On 21/02/2012 13:14, Alessandra Forti wrote:
>> The only people who can feign surprise are those who don't listen or who
>> forget.
>>
>> We never had more than one pool because, as Kashif points out, the writing
>> is random anyway. I'm not even sure if the proposed solution is real or if
>> it works, because Glasgow has 4 times smaller fs and a larger common area.
>> In fact, adding a new pool and adding a file system adds the whole 60TB to
>> the new pool, which means removing it from the atlas pool. We can reinstall
>> one of the old DELLs (~480GB) as a DPM fs, but I'm not going to sacrifice
>> more than that to this.
>>
>> POOL atlas_pool DEFSIZE 20.00T GC_START_THRESH 0 GC_STOP_THRESH 0
>> DEF_LIFETIME 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h
>> FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GIDS 0 S_TYPE P
>> MIG_POLICY none RET_POLICY R
>> CAPACITY 604.95T FREE 0 ( 0.0%)
>> [.....]
>> se12.tier2.hep.manchester.ac.uk /raid CAPACITY 54.49T FREE 16.38T ( 30.1%)
>> [.....]
>> POOL ops_pool DEFSIZE 50.00G GC_START_THRESH 0 GC_STOP_THRESH 0
>> DEF_LIFETIME 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h
>> FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GIDS 104 S_TYPE -
>> MIG_POLICY none RET_POLICY R
>> CAPACITY 54.49T FREE 16.38T ( 30.1%)
>> se12.tier2.hep.manchester.ac.uk /raid/ops CAPACITY 54.49T FREE 16.38T ( 30.1%)
>>
>> BTW sites were still accused of "cheating" at the ops TEG for using
>> reservations to make ops tests pass when clusters are full.
>>
>> cheers
>> alessandra
>>
>> On 21/02/2012 12:08, Daniela Bauer wrote:
>>> But the ops tests have been around for *ages* and the consequences
>>> known, so I don't think it'll suit us well to feign surprise right
>>> now. Just give ops 500 GB and be done with it.
>>>
>>> Daniela
>>>
>>> On 21 February 2012 12:05, Sam Skipsey<[log in to unmask]> wrote:
>>>>
>>>> On 21 February 2012 11:45, Stephen Burke<[log in to unmask]> wrote:
>>>>> Testbed Support for GridPP member institutes [mailto:TB-[log in to unmask]] On Behalf Of John Gordon said:
>>>>>> If this has been a long-standing DPM issue then I will ask to have this
>>>>>> test (SRMput ?) removed from the SRMV2 set of tests so that it isn't
>>>>>> included in availability.
>>>>> Even if there really was no free space for ops, does that make the SE
>>>>> unavailable? Any VO may fill up its space, that doesn't mean the site is
>>>>> broken. Probably the intention is that the test is just supposed to verify
>>>>> the functionality and no-one has considered the possibility of it being
>>>>> full. (CE tests are similar if the queues are full - there I think most
>>>>> sites do have an explicit reservation just to let the ops tests run.)
>>>>>
>>>> This is a valid point, and what I was getting at with my nagios test
>>>> comment: the test doesn't test if the storage is available, it tests if ops
>>>> can write to the storage. (Now, obviously, there's a point at which you have
>>>> to consider that a test has to test *something*...). ATLAS, meanwhile, can
>>>> happily write to the storage; and even ops tests are happy talking to the
>>>> storage, and it is responding in a reasonable and sane way.
>>>>
>>>> I note that Manchester is an almost entirely ATLAS site. It seems reasonable
>>>> that their availability be determined by their being available for the
>>>> entities that they are supposed to be supporting in the main, surely?
>>>>
>>>> Sam
>>>>
>>>>> Stephen
>>>>
>>>
>>>
>
>