I would however get COD involved now (i.e. escalate the ticket),
before we get another "ROD not functioning" ticket.
Cheers,
Daniela
On 21 February 2012 11:41, Kashif Mohammad <[log in to unmask]> wrote:
> I am not in favour of John B's suggestion of reserving some space on the head node or a pool node. When the OPS test copies a file, it goes to a random one of the available pool nodes. Our intention should not be just to pass the OPS test but to check the functionality of the resource. I have seen many cases where the nagios SRM test failed because the host certificate on one of the pool nodes had expired, or a CA certificate had not been updated on one of the nodes. This test is very useful for detecting this kind of problem. I think the underlying problem is a bug in DPM, and it should be fixed. We can think of a workaround until the bug is fixed, but the workaround should not become a permanent feature.
>
> Kashif
>
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:[log in to unmask]] On Behalf Of John Gordon
> Sent: 21 February 2012 11:11
> To: [log in to unmask]
> Subject: Re: Ticket summary - 20th Feb 12
>
> If this has been a long-standing DPM issue then I will ask to have this test (SRMput ?) removed from the SRMV2 set of tests so that it isn't included in availability. If it turns out that sites are being asked to make space available exclusively for OPS (Does Stephen's suggestion work?) then we should have that out in the open.
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-
>> [log in to unmask]] On Behalf Of Daniela Bauer
>> Sent: 21 February 2012 10:44
>> To: [log in to unmask]
>> Subject: Re: Ticket summary - 20th Feb 12
>>
>> The availability is unavoidable (let's face it, it won't go up much,
>> even if the site has 100% next week), I don't think they will change
>> the code just for us, but you could ask.
>>
>> Cheers,
>> Daniela
>>
>> On 21 February 2012 10:37, John Gordon <[log in to unmask]> wrote:
>> > ... and Manchester will have zero availability for this month.
>> >
>> >> -----Original Message-----
>> >> From: Testbed Support for GridPP member institutes [mailto:TB-
>> >> [log in to unmask]] On Behalf Of Daniela Bauer
>> >> Sent: 21 February 2012 10:06
>> >> To: [log in to unmask]
>> >> Subject: Re: Ticket summary - 20th Feb 12
>> >>
>> >> I (or whoever is on ROD duty) can close the ticket as unsolvable -
>> >> that will get us around the 30 day admin mess.
>> >>
>> >> The alarms will then come back though.
>> >>
>> >> Daniela
>> >>
>> >> On 21 February 2012 09:52, John Gordon <[log in to unmask]> wrote:
>> >> > Sam, Alessandra, thanks for this info. Can you point me at any DPM tickets
>> >> > in GGUS or Savannah about this bug?
>> >> >
>> >> >
>> >> >
>> >> > John
>> >> >
>> >> >
>> >> >
>> >> > From: Testbed Support for GridPP member institutes
>> >> > [mailto:[log in to unmask]] On Behalf Of Sam Skipsey
>> >> > Sent: 21 February 2012 07:55
>> >> >
>> >> >
>> >> > To: [log in to unmask]
>> >> > Subject: Re: Ticket summary - 20th Feb 12
>> >> >
>> >> >
>> >> >
>> >> > Hi John
>> >> >
>> >> > On 20 February 2012 23:25, John Gordon <[log in to unmask]> wrote:
>> >> >
>> >> > Alessandra, is your 20TB general use actually full? - you say it is little
>> >> > used - or is the ops job requiring more than 20TB free?
>> >> >
>> >> >
>> >> >
>> >> > I should elucidate: Alessandra has some disk servers set Read only or
>> >> > disabled. The disk servers which are still available have free space on
>> >> > them, and can be written to.
>> >> >
>> >> >
>> >> >
>> >> > Due to a quirk of DPM's space publishing, however, the "lost" space from
>> >> > filesystems that are read only is subtracted first from space outside of
>> >> > spacetokens (and then from spacetokens in a slightly weird manner).
>> >> >
>> >> > This can lead to DPM publishing that there is "no free space" and causing
>> >> > tests to fail, even though there is, in fact, space free on all the disks
>> >> > present which would be writable for that VO.
>> >> >
>> >> >
>> >> >
>> >> > This is a known issue that everyone with a DPM has probably noticed, and
>> >> > which Alessandra and I (amongst others) have repeatedly brought up as an
>> >> > improvement needed to DPM publishing. There have been some improvements
>> >> > made, but never a fix to the central issue described here.
>> >> >
>> >> >
>> >> >
>> >> > It seems a little unreasonable that Manchester should be marked 0%
>> >> > available due to a DPM quirk and an overly excitable nagios test...
>> >> >
>> >> >
>> >> >
>> >> > Sam
>> >> >
>> >> >
>> >> >
>> >> > I'm happy to defend you and refuse to suspend you after 30 days but only if
>> >> > we have explored all avenues
>> >> >
>> >> >
>> >> > John
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Testbed Support for GridPP member institutes [mailto:TB-
>> >> >> [log in to unmask]] On Behalf Of Alessandra Forti
>> >> >> Sent: 20 February 2012 20:01
>> >> >> To: [log in to unmask]
>> >> >> Subject: Re: Ticket summary - 20th Feb 12
>> >> >>
>> >> >
>> >> >> On 20/02/2012 14:56, Daniela Bauer wrote:
>> >> >> >>> MANCHESTER:
>> >> >> >>> https://ggus.eu/ws/ticket_info.php?ticket=78776
>> >> >> >>> Ops tests failing due to "lack of space". Kashif mentions ticketing
>> >> >> >>> the dpm developers or somehow figuring out a workaround. Sounds like
>> >> >> >>> we need to refer it to the storage group.
>> >> >> > This is reasonably urgent, as a ticket from the ROD dashboard cannot
>> >> >> > be extended beyond 30 days or so (i.e. during my ROD duty, sigh)
>> >> >> > without drawing the ire of COD. Manchester currently shows at 0%
>> >> >> > availability/reliability this month, which almost certainly requires
>> >> >> > some kind of formal explanation at the end of the month. It would be
>> >> >> > good if the site made some kind of statement in the ticket; right now
>> >> >> > it mainly consists of ROD entries, making it look abandoned.
>> >> >> >
>> >> >> > Cheers,
>> >> >> > Daniela
>> >> >> >
>> >> >> The statement was made this morning, on top of a number of other times in
>> >> >> the past two years.
>> >> >>
>> >> >> cheers
>> >> >> alessandra
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> -----------------------------------------------------------
>> >> [log in to unmask]
>> >> HEP Group/Physics Dep
>> >> Imperial College
>> >> Tel: +44-(0)20-75947810
>> >> http://www.hep.ph.ic.ac.uk/~dbauer/