Hello,
Based on my experience with updates 23 and 24, I recommend that nobody
upgrade their site until all the current problems have been
resolved. In particular, do not upgrade your DPM!
However, do install the latest lcg-voms.cern.ch!
If you do wish to upgrade some components, below is my
experience of what works and what does not, along with some of the
changes to be aware of.
The new VO style support in yaim seems to work fine.
Yaim now provides support for software manager and production pool
accounts (e.g. atlassgm01, atlassgm02, ...). Unfortunately, this new
feature either does not work or is fiddly, so you should keep your old
users.conf and group.conf files.
Depending on which version of gLite you upgrade from, you'll need to
apply some of the following changes to your site-info.def:
Add and set the SITE_SUPPORT_EMAIL variable.
There is a new way to define queues, so you will need to add something
along these lines:
ALICE_GROUP_ENABLE="alice"
ATLAS_GROUP_ENABLE="atlas"
...
SHORT_GROUP_ENABLE="atlas alice babar biomed cms dteam hone ilc lhcb ngs ops zeus calice"
if you have a queue assigned to each VO and a short queue for all
VOs. The VO_${VO}_QUEUES variables are no longer used; leaving
them in place does not crash yaim, as you would expect.
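
For sites with many VOs, the per-VO lines can be generated rather than typed by hand. This is only a sketch: it assumes that site-info.def is sourced by a Bourne-compatible shell, that a VOS variable holds the VO list, and that each per-VO queue is named after its VO with the variable prefix being the upper-cased queue name, as in the lines above.

```shell
# Sketch for site-info.def. VOS is assumed to hold the list of supported VOs.
VOS="atlas alice babar biomed cms dteam hone ilc lhcb ngs ops zeus calice"

# One queue per VO: defines ALICE_GROUP_ENABLE="alice",
# ATLAS_GROUP_ENABLE="atlas", and so on for every VO in the list.
for vo in $VOS; do
    queue=$(printf '%s' "$vo" | tr '[:lower:]' '[:upper:]')
    eval "${queue}_GROUP_ENABLE=\"\$vo\""
done

# The shared short queue accepts every VO.
SHORT_GROUP_ENABLE="$VOS"
```

Writing the lines out explicitly, as above, is just as valid and easier to read when there are only a few VOs.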
There is a new, harmless YAIM_LOGGING_LEVEL variable. I found
that setting it to NONE was equivalent to setting it to WARNING.
If you read the release notes carefully, you'll find out that there is
a new way to run yaim. One can now configure a CE as follows:
/opt/glite/yaim/bin/yaim -c -s /root/yaim-conf/site-info.def -n CE
Once the configuration is over, yaim waits for a CTRL-C, so
you'd better not use this on workers. Shortly after running yaim on my
CE as above, my gatekeeper ended up in a locked state:
$ service globus-gatekeeper status
edg-gatekeeper dead but subsys locked
$
The gatekeeper ended up in the same state after I reran yaim.
I have no idea what caused this, and I had never
encountered this problem in the past, so it is something to watch out
for and to investigate.
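
On Red Hat style init systems, "dead but subsys locked" generally means that no process is running while a lock file is still present under /var/lock/subsys. The sketch below reproduces that check in simplified form so the state can be spotted from a script. The function name subsys_state is made up for illustration, the lock file name edg-gatekeeper is an assumption taken from the status output above, and the real init helper also looks at pid files, which this sketch ignores.

```shell
# subsys_state NAME [LOCKDIR]
# Prints "running", "dead but subsys locked" or "stopped".
# Hypothetical helper: a simplified version of the init-script status check.
subsys_state() {
    name=$1
    lockdir=${2:-/var/lock/subsys}
    if pgrep -x "$name" >/dev/null 2>&1; then
        # A process with exactly this name exists.
        echo "running"
    elif [ -e "$lockdir/$name" ]; then
        # No process, but the lock file was left behind.
        echo "dead but subsys locked"
    else
        echo "stopped"
    fi
}
```

Calling subsys_state edg-gatekeeper after each yaim run would show whether the lock file was left behind again. This only detects the state; it says nothing about why the gatekeeper died.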
There is an ugly bug (or rather a design flaw) in config_sw_dir. When yaim
runs on workers, it tries to do a recursive chmod on your software
area. In my case, this resulted in a "permission denied" error message
for every single file in my 50+ GB /software directory :( Now imagine
big sites starting this automatically on all their workers! For more on
this, see the "NOTICE - VOBOX vs. VO software area ownership" thread in
ROLLOUT. I really think an EGEE broadcast should have been issued, not
simply an email to ROLLOUT.
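
Before letting yaim loose on a worker, it may help to estimate how many files such a recursive chmod would trip over. The sketch below counts the entries under a directory that are not owned by the calling user. The function name count_foreign is made up for illustration, and ownership is only a rough proxy: on a root-squashed NFS mount (my assumption about the typical cause) chmod can fail even for root.

```shell
# count_foreign DIR
# Prints the number of entries under DIR not owned by the calling user.
# Hypothetical helper; it only reads metadata and changes nothing.
count_foreign() {
    find "$1" ! -user "$(id -u)" 2>/dev/null | wc -l | tr -d ' '
}
```

Running count_foreign /software as the user that yaim runs under gives an idea of how many error messages to expect; a large number is a good reason to sort out ownership first.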
You can upgrade your VOBOX provided you have version 3.0.1-15 of yaim
(from release 24; the one from release 23 crashed) and you take the
precautions mentioned in Maarten's email.
Once you do upgrade DPM, do not forget to change the port in
BDII_SE_URL. The new DPM uses a BDII rather than a GRIS.
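
As an illustration of the kind of change involved, here is a sketch of the relevant site-info.def line. The values are assumptions, not taken from the release notes: 2135 and 2170 are the ports conventionally used by a GRIS and a BDII respectively, the base DNs shown are the usual ones for each, and dpm.example.org is a placeholder host. Check them against your own installation.

```shell
# Placeholder host name; use your own DPM head node.
DPM_HOST="dpm.example.org"

# Old style, DPM publishing through a GRIS (assumed port 2135):
# BDII_SE_URL="ldap://$DPM_HOST:2135/mds-vo-name=local,o=grid"

# New style, DPM publishing through a BDII (assumed port 2170):
BDII_SE_URL="ldap://$DPM_HOST:2170/mds-vo-name=resource,o=grid"
```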
I'm aware there is a middleware certification process, some testing at
CERN and (unfortunately limited) testing on the PPS, but when I
see the kind of horrors that reach production sites, I'm led to say
that the certification and testing are totally ineffective, highly
superficial or even nonexistent. I share the blame for the lack of testing
on the PPS, but we have very little time to perform any tests at all! There
are plans to do more testing in the PPS, but I still wish we had
more than one day for this, in particular when major new
functionality is introduced, and that more time passed before
releases go from the PPS to production.
Yves