Might have had something to do with auth LDAP server failures, leading to
auth problems on the RB, leading to auth problems when trying to get the
proxy ... ?
JT
Maarten Litmaath said:
> Jose del Peso wrote:
>
>> Dear all,
>>
>> I tried to submit a job this morning and it failed. The following
>> output was obtained:
>>
>> _________________________________________________________
>>
>> [delpeso@grid010 atlas-simula]$ edg-job-submit --vo atlas testJob_SW.jdl
>>
>> Selected Virtual Organisation name (from --vo option): atlas
>> Connecting to host lxn1188.cern.ch, port 7772
>> Logging to host lxn1188.cern.ch, port 9002
>> **** Error: API_NATIVE_ERROR ****
>> Error while calling the "edg_wll_RegisterJobSync" native api
>> Unable to Register the Job:
>> https://lxn1188.cern.ch:9000/XN2EtYRRsFLZ-Tsan94D7w
>> to the LB logger at: lxn1188.cern.ch:9002
>> SSL Error (sslv3 alert handshake failure)
>
> Indeed, it was an error that is new to us: the edg-wl-logd somehow got
> an invalid proxy at the time its proxy file was renewed at 08:26.
> Exactly 6 hours later the problem went away as the proxy was again
> renewed.
>
> Unfortunately we did not catch the very proxy used during those 6 hours,
> but we did change two things:
>
> - the renewal job now runs every 5 minutes;
>
> - it copies each proxy to an area for later analysis.
>
> The RB uses a disk server to hold most of the WP1 state information,
> which means the proxy happens to sit on an NFS file system; we have had
> problems with a few other state files when they were on NFS, but not
> with any of the service proxies. We looked into the code and have not
> yet seen how it might fail in this respect.
>
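For illustration only, the two changes described above (a frequent renewal
job that also keeps a copy of each proxy for later analysis) could be
sketched roughly as follows. This is not the actual edg-wl-logd renewal job,
which is not shown in this thread; the function names, the archive layout
and the file naming are all hypothetical. The write-to-temp-then-rename step
is a common precaution for files that other processes read while they are
being replaced; whether a partially written file played any role in this
incident is not known.

```python
import os
import shutil
import tempfile
import time


def archive_proxy(proxy_path, archive_dir, now=None):
    """Copy the current proxy into archive_dir under a timestamped name.

    Hypothetical helper: the archive layout is an assumption, not the
    scheme used on the RB.
    """
    os.makedirs(archive_dir, exist_ok=True)
    # UTC timestamp, one-second resolution (enough for a 5-minute job).
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    dest = os.path.join(archive_dir, "proxy." + stamp)
    shutil.copy2(proxy_path, dest)
    os.chmod(dest, 0o600)  # proxies hold a private key; keep them private
    return dest


def install_proxy(new_proxy_bytes, proxy_path):
    """Write the renewed proxy to a temp file, then rename it into place.

    The temp file lives in the same directory as the target so that the
    final rename stays within one file system.
    """
    directory = os.path.dirname(os.path.abspath(proxy_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".proxy.tmp.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_proxy_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data out before renaming
        os.chmod(tmp_path, 0o600)
        os.replace(tmp_path, proxy_path)
    except BaseException:
        # Do not leave a stray temp file behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A cron entry running every 5 minutes would then call install_proxy() with
the freshly obtained proxy and archive_proxy() on the result, so that the
exact proxy in use at any given time can be inspected afterwards.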