After some reports from ATLAS testers, I did a systematic scan of some SE
functionality at all LCG sites.
The test went like this: I copied and registered a file to the CERN SE
and then asked the replica manager to replicate it to each existing SE.
I am in the dteam VO, so there should be no problem with authorization.
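The scan loop can be pictured roughly as follows. This is only a sketch: the `replicate` callable is a hypothetical stand-in for whatever replica manager command is actually used, and it is assumed to raise an exception when the replication to a given SE fails.

```python
def scan_storage_elements(hosts, replicate):
    """Try to replicate the test file to each SE.

    `replicate` is a hypothetical callable taking one SE host name and
    raising an exception on failure. Returns (ok_hosts, errors_by_host).
    """
    ok_hosts = []
    errors_by_host = {}
    for host in hosts:
        try:
            replicate(host)
        except Exception as exc:
            # Keep the error text so it can be reported per site.
            errors_by_host[host] = str(exc)
        else:
            ok_hosts.append(host)
    return ok_hosts, errors_by_host
```

Collecting the error text per host is what makes it possible to group the failing sites by error type, as done in the rest of this message.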
The following sites correctly completed the operation:
KFKI-Budapest grid100.kfki.hu
CERN adc0021.cern.ch
Taipei lcg00103.grid.sinica.edu.tw
Tokyo dgse0.icepp.s.u-tokyo.ac.jp
BNL atlasgrid03.usatlas.bnl.gov
RAL lcgse01.gridpp.rl.ac.uk
CIEMAT-Madrid lcg03.ciemat.es
INFN-CNAF wn-02-30-a.cr.cnaf.infn.it
IFIC-Valencia loki03.ific.uv.es
IFAE-Barcelona grid-s1.ifae.es
UAM-Madrid grid004.ft.uam.es
USC-Santiago lcg-se.usc.cesga.es
UB-Barcelona lcg-se.ecm.ub.es
With the other sites I had a few different errors, which are reported below.
==============================================================
hik-lcg-se.fzk.de
GridFTP: mkdir operation failed. the server sent an error response: 550
550 /flatfiles/SE00/dteam/generated/2003/11: Permission denied.
golias26.farm.particle.cz
GridFTP: mkdir operation failed. the server sent an error response: 550
550 /flatfiles/SE00/dteam/generated: Permission denied.
These two sites (FZK-Karlsruhe and Prague) have probably set the wrong
access mode on the /flatfiles/SE00 directories on the SE. The correct
settings should be:
drwxrwxr-x 2 root alice 4096 Oct 17 16:43 alice
drwxrwxr-x 3 root atlas 4096 Nov 10 23:02 atlas
drwxrwxr-x 2 root cms 4096 Oct 17 16:43 cms
drwxr-xr-x 2 root root 4096 Oct 17 16:43 data
drwxrwxr-x 2 root dteam 4096 Nov 10 19:19 dteam
drwxrwxr-x 3 root lhcb 4096 Nov 10 21:56 lhcb
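For site managers who want to verify this quickly, here is a minimal Python sketch that checks the VO directories against the mode shown in the listing above (drwxrwxr-x, i.e. 775). The function names are mine, and the sketch only looks at the permission bits: it does not check that the group owner matches the VO, and the "data" directory (which is 755 in the listing) is deliberately left out.

```python
import os
import stat

# Mode of the VO directories in the listing above: drwxrwxr-x
EXPECTED_VO_MODE = 0o775

def vo_dir_problems(path, expected_mode=EXPECTED_VO_MODE):
    """Return a list of problems found for one VO directory (empty if fine)."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return [f"{path}: cannot stat ({exc.strerror})"]
    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path}: not a directory")
    # Compare only the rwx permission bits for user, group and others.
    mode = stat.S_IMODE(st.st_mode) & 0o777
    if mode != expected_mode:
        problems.append(f"{path}: mode is {mode:o}, expected {expected_mode:o}")
    return problems

def check_storage_root(root="/flatfiles/SE00",
                       vos=("alice", "atlas", "cms", "dteam", "lhcb")):
    """Check every VO directory under the storage root."""
    problems = []
    for vo in vos:
        problems.extend(vo_dir_problems(os.path.join(root, vo)))
    return problems
```

Running `check_storage_root()` on the SE returns an empty list when all five VO directories have mode 775, and one line per problem otherwise.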
The message from Karlsruhe is a bit puzzling, though: the
"generated/2003" dir appears to be there (or to have been created), but
then the system is not able to create the subdir "11". Peer, can you
please take a detailed look at the settings and see if you can understand
what the real status of the filesystem is?
==============================================================
lhc03.sinp.msu.ru
GridFTP: existDir operation failed. the server sent an error response:
530 530 No local mapping for Globus ID
It looks like the SE in Moscow has the correct
/etc/grid-security/gridmapdir directory but the corresponding virtual
accounts have not been created. Lev, can you please take a look in this
direction and let me know what you see?
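One way to test this hypothesis is sketched below. It assumes the usual gridmapdir layout, in which each pool account is represented by a file named after the account and entries starting with "%" are URL-encoded DN links rather than account names. The function names are mine, and the layout assumption should be checked against the actual directory.

```python
import os
import pwd

def account_exists(name):
    """True if a local Unix account with this name exists."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False

def missing_pool_accounts(gridmapdir="/etc/grid-security/gridmapdir",
                          exists=account_exists):
    """List gridmapdir entries that have no matching local account."""
    missing = []
    for entry in sorted(os.listdir(gridmapdir)):
        # Assumption: entries starting with '%' are encoded DN links,
        # not account names, so they are skipped.
        if entry.startswith("%"):
            continue
        if not exists(entry):
            missing.append(entry)
    return missing
```

A non-empty result would be consistent with the "No local mapping for Globus ID" error, although other causes are possible.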
==============================================================
zeus03.cyf-kr.edu.pl
GridFTP: existDir operation failed. the server sent an error response:
501 501-FTPD GSSAPI error: GSS Major Status: General failure
501-FTPD GSSAPI error: GSS Minor Status Error Chain:
501-FTPD GSSAPI error:
501-FTPD GSSAPI error: acquire_cred.c:125: gss_acquire_cred: Error with
GSI credential
501-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:1298:
globus_i_gsi_gss_cred_read: Error with gss credential handle
501-FTPD GSSAPI error: globus_gsi_credential.c:552:
globus_gsi_cred_read: Error with credential: The host credential:
/etc/grid-security/hostcert.pem
501-FTPD GSSAPI error: with subject:
/C=PL/O=GRID/O=Cyfronet/CN=zeus03.cyf-kr.edu.pl
501-FTPD GSSAPI error: has expired 70357 minutes ago.
501 FTPD GSSAPI error: acquiring credentials
All node certificates at the Cyfronet-Krakow site must have expired as I
cannot even globus-job-run a command on the CE. I think this is related
to the fact that this site is still at the (now very old) LCG1-1_0_1
release of the LCG software. Could the site managers there update the
site and the node certificates, please? If this problem persists and I
do not get any feedback from the site, I'll have to remove it from the
information system so that it will not cause problems for LCG users.
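To put the "expired 70357 minutes ago" figure into perspective, the following small Python sketch extracts the number from the error text and converts it to days, hours and minutes. The function names are mine, and the regular expression only assumes the wording seen in the message above.

```python
import re
from datetime import timedelta

def expired_minutes(error_text):
    """Extract N from 'has expired N minutes ago', or None if absent."""
    match = re.search(r"has expired (\d+) minutes ago", error_text)
    return int(match.group(1)) if match else None

def describe_minutes(minutes):
    """Render a number of minutes as days, hours and minutes."""
    delta = timedelta(minutes=minutes)
    hours, remainder = divmod(delta.seconds, 3600)
    return f"{delta.days} days, {hours} hours, {remainder // 60} minutes"
```

For the value reported above, `describe_minutes(70357)` gives "48 days, 20 hours, 37 minutes", so the host certificate had already been expired for about seven weeks at the time of the test.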
==============================================================
hotdog48.fnal.gov
GridFTP: existDir operation failed. the server sent an error response:
501 501-FTPD GSSAPI error: GSS Major Status: General failure
501-FTPD GSSAPI error: GSS Minor Status Error Chain:
501-FTPD GSSAPI error:
501-FTPD GSSAPI error: acquire_cred.c:125: gss_acquire_cred: Error with
GSI credential
501-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:1311:
globus_i_gsi_gss_cred_read: Error with gss credential handle
501-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:1520:
globus_i_gsi_gss_create_cred: Error with gss credential handle
501-FTPD GSSAPI error: globus_i_gsi_gss_utils.c:2177:
globus_i_gsi_gssapi_init_ssl_context: Error with openssl: Couldn't set
the private key to be used for the SSL context
501-FTPD GSSAPI error: OpenSSL Error: x509_cmp.c:383: in library: x509
certificate routines, function X509_check_private_key: key values mismatch
501 FTPD GSSAPI error: acquiring credentials
It looks like the private key and the host certificate on the SE at
FNAL do not match. I get the same message if I do a globus-url-copy to
that node. On the other hand, if I send a command to the CE with
globus-job-run, it works fine, so this must be an SE-specific problem.
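A common way to confirm a mismatch of this kind is to compare the modulus of the certificate with that of the private key. The Python sketch below does this by calling the openssl command line tool. It assumes an RSA key, that openssl is on the PATH, and that the key is at /etc/grid-security/hostkey.pem (the conventional location, but not one confirmed by the error output above). The function names are mine.

```python
import subprocess

def openssl_modulus(kind, path):
    """Return the 'Modulus=...' line printed by openssl.

    kind is "x509" for a certificate or "rsa" for an RSA private key.
    """
    result = subprocess.run(
        ["openssl", kind, "-noout", "-modulus", "-in", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

def moduli_match(cert_modulus, key_modulus):
    """True if both strings are the same well-formed 'Modulus=...' line."""
    return cert_modulus.startswith("Modulus=") and cert_modulus == key_modulus

def host_credentials_match(cert="/etc/grid-security/hostcert.pem",
                           key="/etc/grid-security/hostkey.pem"):
    """Compare the host certificate and key (key path is an assumption)."""
    return moduli_match(openssl_modulus("x509", cert),
                        openssl_modulus("rsa", key))
```

If `host_credentials_match()` returns False when run as root on the SE, the certificate and key were not issued as a pair, which would be consistent with the "key values mismatch" error above.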
==============================================================
Could the managers of the sites with problems take a look? Please do not
hesitate to contact me if you have problems fixing your SE.
Cheers
Emanuele
--
/------------------- Emanuele Leonardi -------------------\
| eMail: [log in to unmask] - Tel.: +41-22-7674066 |
| IT division - Bat.31 2-012 - CERN - CH-1211 Geneva 23 |
\---------------------------------------------------------/