The only thing I could see that was different between Glasgow and
Durham was that Glasgow's servers were using certificates signed by
the older CA, and Durham's by the new one. These will have different
CRLs, which might explain the difference in behaviour.
But as Phil said, we were changing things on the CE and then it
suddenly started to work. What exactly caused the problems to disappear
we can't say (nor even whether it was anything we did...).
Cheers
Graeme
On Tue, May 20, 2008 at 6:51 PM, Kelsey, DP (David) <[log in to unmask]> wrote:
> Graeme, Simon,
>
> Is it now understood why Durham and RHUL were experiencing problems and
> how it was fixed?
>
> Dave
>
>
> ------------------------------------------------
> Dr David Kelsey
> Particle Physics Department
> Rutherford Appleton Laboratory
> Chilton, DIDCOT, OX11 0QX, UK
>
> e-mail: [log in to unmask]
> Tel: [+44](0)1235 445746 (direct)
> Fax: [+44](0)1235 446733
> ------------------------------------------------
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes
>> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
>> Sent: 19 May 2008 23:29
>> To: [log in to unmask]
>> Subject: Re: New LCG CA release 1.21: breaks site
>>
>> Hi John
>>
>> That was a RAID failure on their SE - not related.
>>
>> Having forced a CRL update across the Durham cluster, they are
>> still failing SAM tests, so we don't seem to be out of the
>> woods yet...
>>
>> g
>>
>> On Mon, May 19, 2008 at 11:11 PM, Gordon, JC (John)
>> <[log in to unmask]> wrote:
>> > Graeme, do we know that this was CA-related? Durham were failing
>> > overnight on Sunday too.
>> >
>> > John
>> >
>> >> -----Original Message-----
>> >> From: Testbed Support for GridPP member institutes
>> >> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
>> >> Sent: 19 May 2008 22:00
>> >> To: [log in to unmask]
>> >> Subject: Re: New LCG CA release 1.21: breaks site
>> >>
>> >> On Mon, May 19, 2008 at 8:54 PM, Jensen, J (Jens)
>> >> <[log in to unmask]> wrote:
>> >> > Hi Graeme,
>> >> >
>> >> > I know for the Moz NSS bug, it is because as part of the SSL
>> >> > negotiation, the server (or client, doesn't matter) sends its trusted
>> >> > certificates to the peer saying "look, this is my cert" and the peer
>> >> > says "wot? I thought it looked like this?"
>> >> >
>> >> > But OpenSSL and stuff derived from OpenSSL do not work like this;
>> >> > they may or may not send intermediate certificates in the negotiation,
>> >> > but all that matters is that the trust chain can be built, which of
>> >> > course it can be either way.
>> >> >
>> >> > Maybe it's something more obvious, like CRLs that haven't been
>> >> > refreshed when you install the 1.21 release. You folk in Glasgow have
>> >> > probably been Good Eggs(tm) as usual and refreshed your CRLs.
>> >>
>> >> I upgraded one UI first (not our main one) and checked that fetch-crl
>> >> worked, to make sure there was nothing basically wrong with the CA
>> >> release. Then, after I had upgraded the CE, I refreshed the CRLs by
>> >> hand. Because of the way our site infrastructure works, all the other
>> >> machines then copy their CRLs from the CE (via a simple mirror, no
>> >> complicated SSL thingamybobs...).
>> >>
>> >> I can actually tell when Durham broke from the ATLAS pilot submission
>> >> logs:
>> >>
>> >> http://svr017.gla.scotgrid.ac.uk/factory/logs/2008-05-19/ce01.dur.scotgrid.ac.uk_2119_jobmanager-lcgpbs-q3d/SubmissionLog
>> >>
>> >> I should say they broke for my submission before I had touched
>> >> anything at Glasgow re. the update.
>> >>
>> >> I now see a very weird effect. I can globus-job-run from one Glasgow
>> >> UI to Durham OK, but not from the other...
>> >>
>> >> g
>> >>
>> >
>>
>