-------- Original Message --------
Great! We got rid of the junk (look at the last two jobs in
here now, click the green squares to see stdout/err)
http://na62.gla.ac.uk/index.php?task=jobs&site=LIV&nev=300
Thanks a lot! Best regards,
Dan -- Dan PROTOPOPESCU, University of Glasgow,
-------- End of Original Message --------
Hi All,
Re: msg above.
We had a request to fix a stream of error messages that were coming
out of the Liverpool cluster into the output of NA62 jobs.
In short, the errors occurred because some of the *.0 files in
/etc/grid-security/certificates had no related *.r0 file. There
is a long explanation of that given below. We "fixed it" here by
applying a script that is also given below. There are other
fixes (also below) which may be better (mine is hacky).
Anyway, that's all I'm saying, else I'll make a lot of work
for myself. Obviously I'll assist anyone who wants to do
something similar. I'm not sure how to progress this: I am told it
is "fixed" in newer versions (e.g. SL6) of the WN software, so it
may not be worth making a big deal out of it.
As well as the script, and the explanation, I'm also attaching
another fix, given by David Groep, that uses a different
approach. I hope this helps someone.
Cheers,
Steve
--- THE HACKY SCRIPT ---
[root@imageserver trunk]# cat ./modules/lcg/files/r0fix.sh
#!/bin/bash
# Give every CA hash link (<stem>.0) a matching CRL (<stem>.r0) by linking
# a missing .r0 to the .r0 that exists for the CA's other hash stem.
certdir=/etc/grid-security/certificates
if [ ! -d "$certdir" ]; then
  echo "No $certdir dir"
  exit 2
fi
cd "$certdir" || exit 2
# Go over each real CA pem
for pem in *.pem; do
  [ -e "$pem" ] || continue
  # Find the two .0 links to this pem file, and get the hash stems.
  # The list is reset for every pem so no stem leaks in from the previous CA.
  stems=()
  for link in *.0; do
    [ -L "$link" ] || continue
    if [ "$(basename "$(readlink "$link")")" = "$pem" ]; then
      stems+=("${link%.0}")
    fi
  done
  # Only act when exactly two hash links point at this pem
  [ "${#stems[@]}" -eq 2 ] || continue
  # The logic is this: if the r0 for one stem does not exist, yet the r0
  # for the other stem does, then make a link from the missing r0 file
  # to the present one.
  if [ ! -e "${stems[0]}.r0" ] && [ -e "${stems[1]}.r0" ]; then
    ln -s "${stems[1]}.r0" "${stems[0]}.r0"
  elif [ ! -e "${stems[1]}.r0" ] && [ -e "${stems[0]}.r0" ]; then
    ln -s "${stems[0]}.r0" "${stems[1]}.r0"
  fi
done
--- THE LONG EXPLANATION ---
The framework on a node requires each CA certificate pem
file to be accompanied by a link, e.g. 1a2bc3d4.0. The name of the
link is computed by hashing the subject name of the
CA certificate (not the whole pem file). It lets the software
look up a CA by its subject name quickly. Each CA package
has the hash link built in, so the link is put on
when the package is installed.
Anyway, periodically, revocation files are downloaded
for each CA. By convention they are given the same stem as
the hash link, but with the extension .r0, e.g. 1a2bc3d4.r0
They match, so it is easy to load the .0 file and apply its
revocations from the matching .r0 file. That's all fine and dandy.
BTW: That tomcat lib you mentioned _requires_ both .0 and .r0 files
or it complains.
But at some point (between 0.9 and 1.0), openssl changed the
way it computes hashes (i.e. from an MD5-based hash to a
SHA1-based one). So any site may use a different set of
hash values from some other site, depending on the version of
openssl used locally. That situation sucked.
So the managers of the CA releases packaged BOTH links
(MD5 and SHA1) into the rpm, e.g. 1a2bc3d4.0 and (say)
d5c6b7a8.0. Now, whatever version of openssl you have, the
links needed are right there.
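
To see the two names for a given CA, something like the following should do. It is only a sketch: the function name ca_hash_stems is made up for illustration, and it assumes an openssl of 1.0.0 or later, since older versions do not have the -subject_hash_old option.

```shell
# Print the two hash stems for one CA certificate: the current
# (SHA1-based) subject hash first, then the pre-1.0.0 (MD5-based) one.
# Assumes OpenSSL 1.0.0 or later, which provides -subject_hash_old.
ca_hash_stems() {
    local pem="$1"
    openssl x509 -noout -subject_hash -in "$pem" || return 1
    openssl x509 -noout -subject_hash_old -in "$pem" || return 1
}
```

Calling ca_hash_stems on one of the pem files in /etc/grid-security/certificates should print two 8-character hex stems, which ought to match the two .0 links the package installed for that CA.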
But here's the problem: a program called fetch-crl periodically
brings down the new revocation files. It uses the local version
of openssl to compute the name for the file, thus it
matches _one of_ the two hash links put in by the package,
e.g. either 1a2bc3d4.r0 or d5c6b7a8.r0. The other link
does not receive an associated revocation file, because the local
openssl does not have the logic to compute the other name.
Hence, your "trustmanager Tomcat connector" comes across
a .0 link with no associated .r0 revocation list. And it throws
an exception etc.
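
A quick way to check whether a node is affected is to list the .0 links that have no .r0 next to them. This is a minimal sketch, not something we run in production; the function name find_missing_r0 is made up, and the directory argument defaults to the usual location.

```shell
# List every hash link <stem>.0 that has no matching <stem>.r0 file.
# Prints one stem per line; prints nothing if every link has its CRL.
find_missing_r0() {
    local dir="${1:-/etc/grid-security/certificates}"
    local link stem
    for link in "$dir"/*.0; do
        [ -e "$link" ] || continue   # skip an unmatched glob or a dangling link
        stem="${link%.0}"
        if [ ! -e "$stem.r0" ]; then
            echo "${stem##*/}"
        fi
    done
}
```

Running find_missing_r0 with no argument on a worker node prints the stems that would trigger the complaint; empty output means every hash link has a revocation file.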
There are many solutions. I am told that the problem will
go when SL6 is used. I'm CC'ing David Groep; he knows
about that. Here at Liverpool, for now, I'm simply "guessing"
the name of the missing link from the evidence lying around
in the /etc/grid-security/certificates directory. Once I guess
it, I just put a link in to the good .r0 file from the missing one.
It should work. The script that does the "guessing" is given above.
--- ANOTHER FIX, MAYBE BETTER? ---
> From David Groep
Hi Stephen, Maarten,
Writing out both hashes is actually the *default behaviour* of fetch-crl3
*provided* the openssl version used by fetch-crl is 1.0.0 or above.
This is the case on RHEL6 and above, for instance.
On the command line, type:
/usr/bin/openssl version
and if it says "1.0.x" you should be fine. If you see 1.0.0 and
still lack dual hash, read on.
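
If you want to do that check from a script rather than by eye, a rough sketch follows. The function name supports_dual_hash is made up, and it assumes the usual "OpenSSL <version> <date>" layout of the version string, looking only at the major number.

```shell
# Succeeds (returns 0) when an "openssl version" string reports 1.0.0
# or later, fails (returns 1) for 0.9.x, and returns 2 if the string
# cannot be parsed. Only the major version number is examined.
supports_dual_hash() {
    local version major
    version=$(echo "$1" | awk '{print $2}')   # e.g. "0.9.8e-fips-rhel5"
    major="${version%%.*}"
    case "$major" in
        ''|*[!0-9]*) return 2 ;;              # no numeric major version found
    esac
    [ "$major" -ge 1 ]
}
```

For example: supports_dual_hash "$(/usr/bin/openssl version)" && echo "dual hash should be available"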
If your system default openssl is too old, install another instance
and set the openssl command used by fetch-crl to point to this new 1.0.0
binary, for example in the config file /etc/fetch-crl.conf:
openssl=/usr/local/bin/openssl1
Also, make sure the default fetch-crl output formats have not been re-set
to something different from the default. Look out for the config options
"formats" (when set, this should include "openssl") and "opensslmode"
(if set, it must be set to "dual").
The options described above are available in all versions of fetch-crl3.
Cheers,
DavidG.