Last night at RHUL I dutifully (foolishly?) updated both CA rpms and
CRLs. I have been failing tests ever since, so my cluster spent all last
night and this morning in the woods. I'm hoping someone will find the
way out soon.
Simon
Graeme Stewart wrote:
> Hi John
>
> That was a RAID failure on their SE - not related.
>
> Having forced a CRL update across the Durham cluster they are still
> failing SAM tests, so we don't seem to be out of the woods yet...
>
> g
>
> On Mon, May 19, 2008 at 11:11 PM, Gordon, JC (John) <[log in to unmask]> wrote:
>> Graeme, do we know that this was CA related? Durham were failing
>> overnight Sunday too.
>>
>> John
>>
>>> -----Original Message-----
>>> From: Testbed Support for GridPP member institutes
>>> [mailto:[log in to unmask]] On Behalf Of Graeme Stewart
>>> Sent: 19 May 2008 22:00
>>> To: [log in to unmask]
>>> Subject: Re: New LCG CA release 1.21: breaks site
>>>
>>> On Mon, May 19, 2008 at 8:54 PM, Jensen, J (Jens)
>>> <[log in to unmask]> wrote:
>>>> Hi Graeme,
>>>>
>>>> I know for the Moz NSS bug, it is because as part of the SSL
>>>> negotiation, the server (or client, doesn't matter) sends its trusted
>>>> certificates to the peer saying "look this is my cert" and the peer
>>>> says "wot? I thought it looked like this?"
>>>> But OpenSSL and stuff derived from OpenSSL do not work like this;
>>>> they may or may not send intermediate certificates in the negotiation
>>>> but all that matters is that the trust chain can be built, which of
>>>> course it can be either way.
>>>> Maybe it's something more obvious. Like CRLs that haven't been
>>>> refreshed when you install the 1.21 release. You folk in Glasgow have
>>>> probably been Good Eggs(tm) as usual and refreshed your CRLs.
>>>
>>> I upgraded one UI first (not our main one) and checked that
>>> fetch-crl worked - so that there was nothing basically wrong
>>> with the CA release. Then, after I had upgraded the CE I
>>> refreshed the CRLs by hand. Because of the way our site
>>> infrastructure works all the other machines then copy their
>>> CRLs from the CE (via a simple mirror - no complicated SSL
>>> thingamybobs...).
>>>
>>> I can actually tell when Durham broke from the ATLAS pilot
>>> submission logs:
>>>
>>> http://svr017.gla.scotgrid.ac.uk/factory/logs/2008-05-19/ce01.dur.scotgrid.ac.uk_2119_jobmanager-lcgpbs-q3d/SubmissionLog
>>>
>>> I should say they broke for my submission before I had
>>> touched anything at Glasgow re. the update.
>>>
>>> I now see a very weird effect. I can globus-job-run from one
>>> Glasgow UI to Durham OK, but not from the other...
>>>
>>> g
>>>