On 10/13/2011 05:58 PM, Maarten Litmaath wrote:
> Hi John,
>
>> For the last week or two we have had intermittent problems with (ATLAS)
>> jobs failing with what seem to be LFC connection problems.
>>
>> Some logfile extracts follow:
>>
>> """
>> Failed to get LFC replicas: -1 (lfc_getreplicas failed with: 2704, Bad
>> magic number)
>> """
>>
>> and
>>
>> """
>> 13 Oct 07:27:03| lcgcpSiteMov| !!WARNING!!2990!! LFC setup and mkdir
>> failed. Status=256 Output=LFC_HOST=atlas-lfc-fzk.gridka.de
>> send2nsd: NS002 - send error : _Csec_recv_token: Received magic: 30e1301
>> expecting ca03
>> send2nsd: NS002 - send error : _Csec_recv_token: Received magic: 30e1301
>> expecting ca03
>> """
>>
>> We've tested locally and so far I cannot recreate the problem.
>>
>> I've done lfc-ls and lfc-mkdir, and I've run lfc-ls basically
>> simultaneously on all of our nodes, and I didn't see the problem.
>>
>> I just set CSEC_TRACE=1 on a bunch of our nodes to see if we can catch
>> the problem and get more info.
>>
>> Google turned up some logfiles etc. where the same problem
>> popped up, but nothing resembling a fix.
>>
>> Has anyone seen this or does anyone have a hint for us?
> There would be at least the following possible causes:
>
> 1. LFC client-server handshake timeout e.g. due to overload on client,
> server or network.
>
> 2. Network HW problem, whereby some packets occasionally get corrupted.
>
> 3. OS bug.
Hi Maarten,
Thanks for the tips.
I think the LFC server is OK; as far as I know, nobody else is seeing this.
It's intermittent and not restricted to certain nodes, so I guess it's not an OS bug.
I can't currently rule out a handshake or network problem, so we'll just
keep trying to reproduce it.
I'll see if we can double-check the network while we're at it.
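In case it's useful to anyone following along, here is a rough sketch of the
sort of loop that could be used to keep trying. The probe() helper and its
parameters are made up for illustration; only CSEC_TRACE=1 and the "magic"
text in the error come from the logs above. It just reruns a client command
and keeps the output of any attempt that fails or mentions the magic number.

```python
import os
import subprocess
import sys
import time


def probe(cmd, attempts=100, delay=0.0, pattern="magic", timeout=60):
    """Run cmd repeatedly and return (attempt, returncode, output) for each failure.

    An attempt counts as a failure if the command exits non-zero, times out,
    or prints text containing `pattern` (case-insensitive).
    """
    # CSEC_TRACE=1 is the setting mentioned in the thread; everything else
    # in the environment is inherited unchanged (including LFC_HOST, if set).
    env = dict(os.environ, CSEC_TRACE="1")
    failures = []
    for i in range(attempts):
        try:
            res = subprocess.run(
                cmd,
                env=env,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # keep trace and error text together
                text=True,
                timeout=timeout,
            )
            rc, out = res.returncode, res.stdout
        except subprocess.TimeoutExpired:
            rc, out = None, "timed out after %s s" % timeout
        if rc != 0 or pattern in out.lower():
            failures.append((i, rc, out))
        time.sleep(delay)
    return failures
```

It would be called as something like probe(["lfc-ls", "/grid/atlas"],
attempts=500, delay=1) with LFC_HOST already exported; the path is only a
placeholder. Logging a timestamp with each failure would make it easier to
compare against the network monitoring.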
cheers
john
--
+------------------------------------------------------------+
|Dr. John Alan Kennedy       Rechenzentrum Garching (RZG)    |
|Mail: [log in to unmask]    Boltzmannstrasse 2              |
|Phone: +49 89 3299 2694     85748 Garching                  |
|Fax: +49 89 3299 1301                                       |
+------------------------------------------------------------+