Dear colleagues,
I received a report that LHCb jobs have been failing at our site since yesterday.
After some digging in the logs, I found indications that the CRL for our
CA, Datagrid-fr, was not downloaded on some of my WNs:
edg-fetch-crl: [2005/02/02-14:54:15] could not download a valid file from 'http://igc.services.cnrs.fr/cgi-bin/loadcrl?CA=CNRS-Projets&format=PEM'
edg-fetch-crl: [2005/02/02-15:03:57] could not download a valid file from 'http://igc.services.cnrs.fr/cgi-bin/loadcrl?CA=Datagrid-fr&format=PEM'
edg-fetch-crl: [2005/02/02-15:10:58] could not download a valid file from 'http://igc.services.cnrs.fr/cgi-bin/loadcrl?CA=CNRS&format=PEM'
They were downloaded later on, and I also fetched them manually to be on the safe side.
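
To see quickly which CAs are affected on a given WN, the failure lines can be pulled out of the edg-fetch-crl log. The following is only a rough sketch: it assumes the log lines look exactly like the three quoted above, and the function name failed_cas is made up for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches the "could not download" lines as they appear in the log excerpt above.
LOG_LINE = re.compile(
    r"edg-fetch-crl: \[(?P<stamp>[^\]]+)\] "
    r"could not download a valid file from '(?P<url>[^']+)'"
)

def failed_cas(log_text):
    """Return (timestamp, CA name) pairs for every failed CRL download."""
    failures = []
    for match in LOG_LINE.finditer(log_text):
        # The CA name is carried in the CA= query parameter of the URL.
        query = parse_qs(urlparse(match.group("url")).query)
        ca = query.get("CA", ["unknown"])[0]
        failures.append((match.group("stamp"), ca))
    return failures
```

Feeding it the three lines above would give one entry each for CNRS-Projets, Datagrid-fr and CNRS.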
However, I remember a previous incident in which the CRL was issued with some delay and then,
because of high web server load and synchronised crontab entries,
did not get downloaded at every site. That caused authentication problems for many users and sites using
Datagrid-fr certificates.
Then I tried the following command:
globus-job-run myCEhost/jobmanager-torque \
    /opt/globus/bin/globus-url-copy \
    file:///etc/group gsiftp://someoneElsesRBhost/tmp/junk
with two values for the RBhost:
lxn1188.cern.ch and lcgrb02.ifae.es.
With lxn1188 the command succeeded as I expected, with no output, and
with lcgrb02.ifae.es I got:
error: globus_l_ftp_control_send_cmd_cb: gss_init_sec_context failed
GSS failure:
GSS Major Status: Authentication Failed
GSS Minor Status Error Chain:
init_sec_context.c:171: gss_init_sec_context: SSLv3 handshake problems
globus_i_gsi_gss_utils.c:881: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
globus_i_gsi_gss_utils.c:854: globus_i_gsi_gss_handshake: SSLv3 handshake problems: Couldn't do ssl handshake
OpenSSL Error: s3_clnt.c:840: in library: SSL routines, function SSL3_GET_SERVER_CERTIFICATE: certificate verify failed
globus_gsi_callback.c:351: globus_i_gsi_callback_handshake_callback: Could not verify credential
globus_gsi_callback.c:477: globus_i_gsi_callback_cred_verify: Could not verify credential
globus_gsi_callback.c:769: globus_i_gsi_callback_check_revoked: Invalid CRL: The available CRL has expired
To me this means that the CRL for Datagrid-fr held at lcgrb02.ifae.es has expired and
the new CRL has not been downloaded there yet.
This creates a serious authentication issue, which cannot be resolved by the affected sites or users.
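
A site admin can check locally whether an installed CRL has passed its nextUpdate time. The sketch below wraps the standard "openssl crl -noout -nextupdate" command. It assumes openssl is on the PATH, that the CRL file is in PEM format, and that month names are parsed in the C/English locale. The function names are my own, for illustration only.

```python
import subprocess
from datetime import datetime, timezone

def parse_next_update(openssl_output):
    """Parse a line such as 'nextUpdate=Feb  2 14:00:00 2005 GMT'."""
    for line in openssl_output.splitlines():
        if line.startswith("nextUpdate="):
            # Collapse the double space openssl prints before single-digit days.
            value = " ".join(line[len("nextUpdate="):].split())
            parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
            return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("no nextUpdate field found")

def crl_expired(crl_path, now=None):
    """Return True if the CRL at crl_path is past its nextUpdate time."""
    result = subprocess.run(
        ["openssl", "crl", "-in", crl_path, "-noout", "-nextupdate"],
        check=True, capture_output=True, text=True,
    )
    now = now or datetime.now(timezone.utc)
    return parse_next_update(result.stdout) < now
```

On a typical installation the CRLs would be the *.r0 files under /etc/grid-security/certificates, but that location is an assumption here and should be checked against the local setup.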
Emanouil Atanassov