Hi,
Take any spare FS (I mean anything with more than a few gig on it, certainly
not 60TB colossi) on a pool node or head node. Add it to a new pool which is
marked only for ops/sgmops. This is all you should need to do, unless this
bug is far worse than I think it is.
There are certainly more elegant solutions, but this should fix the problem
in 5 minutes.
John
On 21/02/2012 13:14, Alessandra Forti wrote:
> The only people who can feign surprise are those who don't listen or who
> forget.
>
> We never had more than one pool because, as Kashif points out, the writing is
> random anyway. I'm not even sure the proposed solution is real, or that it
> works, because Glasgow has file systems 4 times smaller and a larger common
> area. In fact, creating a new pool and adding a file system to it moves the
> whole 60TB into the new pool, which means removing it from the atlas pool. We
> can reinstall one of the old DELLs (~480GB) as a DPM fs, but I'm not going to
> sacrifice more than that to this.
>
> POOL atlas_pool DEFSIZE 20.00T GC_START_THRESH 0 GC_STOP_THRESH 0
> DEF_LIFETIME 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h
> FSS_POLICY maxfreespace GC_POLICY lru RS_POLICY fifo GIDS 0 S_TYPE P
> MIG_POLICY none RET_POLICY R
> CAPACITY 604.95T FREE 0 ( 0.0%)
> [.....]
> se12.tier2.hep.manchester.ac.uk /raid CAPACITY 54.49T FREE 16.38T ( 30.1%)
> [.....]
> POOL ops_pool DEFSIZE 50.00G GC_START_THRESH 0 GC_STOP_THRESH 0 DEF_LIFETIME
> 7.0d DEFPINTIME 2.0h MAX_LIFETIME 1.0m MAXPINTIME 12.0h FSS_POLICY
> maxfreespace GC_POLICY lru RS_POLICY fifo GIDS 104 S_TYPE - MIG_POLICY none
> RET_POLICY R
> CAPACITY 54.49T FREE 16.38T ( 30.1%)
> se12.tier2.hep.manchester.ac.uk /raid/ops CAPACITY 54.49T FREE 16.38T ( 30.1%)
>
> BTW, sites were still accused of "cheating" at the ops TEG for using
> reservations to make ops tests pass when clusters are full.
>
> cheers
> alessandra
>
> On 21/02/2012 12:08, Daniela Bauer wrote:
>> But the ops tests have been around for *ages* and the consequences
>> known, so I don't think it'll suit us well to feign surprise right
>> now. Just give ops 500 GB and be done with it.
>>
>> Daniela
>>
>> On 21 February 2012 12:05, Sam Skipsey<[log in to unmask]> wrote:
>>>
>>> On 21 February 2012 11:45, Stephen Burke<[log in to unmask]> wrote:
>>>> Testbed Support for GridPP member institutes
>>>> [mailto:TB-[log in to unmask]] On Behalf Of John Gordon said:
>>>>> If this has been a long-standing DPM issue then I will ask to have this
>>>>> test (SRMput ?) removed from the SRMV2 set of tests so that it isn't
>>>>> included in availability.
>>>> Even if there really was no free space for ops, does that make the SE
>>>> unavailable? Any VO may fill up its space, that doesn't mean the site is
>>>> broken. Probably the intention is that the test is just supposed to verify
>>>> the functionality and no-one has considered the possibility of it being
>>>> full. (CE tests are similar if the queues are full - there I think most
>>>> sites do have an explicit reservation just to let the ops tests run.)
>>>>
>>> This is a valid point, and what I was getting at with my nagios test
>>> comment: the test doesn't test if the storage is available, it tests if ops
>>> can write to the storage. (Now, obviously, there's a point at which you have
>>> to consider that a test has to test *something*...). ATLAS, meanwhile, can
>>> happily write to the storage; and even ops tests are happy talking to the
>>> storage, and it is responding in a reasonable and sane way.
>>>
>>> I note that Manchester is an almost entirely ATLAS site. It seems reasonable
>>> that their availability be determined by their being available for the
>>> entities that they are supposed to be supporting in the main, surely?
>>>
>>> Sam
>>>
>>>> Stephen
>>>
>>
>>
--
John Bland [log in to unmask]
System Administrator office: 220
High Energy Physics Division tel (int): 42911
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2911
University of Liverpool http://www.liv.ac.uk/physics/hep/
"I canna change the laws of physics, Captain!"