Hi Marcus,
> Is there anywhere a documentation how to change the
> current system to include the new LSST changes?
Sort of, but it's very dependent on each site. It's a moving target, so
I'm afraid I have to give you a massive answer and I hope it helps.
Please feel free to ask more about this sort of thing.
Judging by your email, you are new to GridPP and are working as an admin
at the Edinburgh grid site? I'll show some of what's available (at the
bottom), but first I'll briefly explain why things are the way they are.
Then we can discuss how to move forward at your site.
The majority of sites run gLite/UMD [1] middle-ware. This has changed
its name several times, and is presently called UMD. It's the same
thing, but many generations on. The middle-ware receives job
instructions from some user, sets them running on a worker node and gets
the output back to the user. A key part of this is authentication and
authorization (security)[3], which looks at the id of the user, and
tells what he's allowed to do. It's implemented in many different ways,
and the UMD attempts to encompass all of them, using an integration tool
called YAIM [2], which is no longer maintained. In addition, new
technologies have emerged outside UMD that YAIM doesn't cover at all,
and in any case everything is moving to a cloud-based model where YAIM
is irrelevant.
So, security. All of the above is supported by Grid Security
Infrastructure (GSI), which looks at the id of the user, and tells what
he's allowed to do. GSI rests on the notions of encryption, digital
signatures, certificates and so on[4]. A user obtains a long-term grid
certificate and uses it to join a virtual organization (VO) [4][5]. The
certificate is used as a root from which to create the short-term
proxies that accompany the job as evidence of ID. These proxies are
verified by the middle-ware, as explained in the user guide [1]. The
component for verifying proxies is a VOMS Server[6]. VOs often have a
set of these. At GridPP, we maintain a list of Approved VOs[4], which
sites can support. As well as the name of the VO, we maintain a copy of
its VOMS Server records which identify the VOMS Servers. These are in
YAIM format so that they can be used by a site to find, connect to and
use a VOMS Server. These records are periodically extracted from the
Operations Portal[7] and formatted by a tool [8]. When you get notified
by a message in TB_SUPPORT, it means that GridPP has picked up a change
to the Operations Portal VOMS records for a VO. If you support that VO,
you must update your site so it can use the updated VOMS Server.
Now we need to talk about the configuration of a cluster. But there is a
large number of technologies out there, and each combination can be
configured in many different ways. This is a big problem that YAIM
partially solves. Since there are so many different ways to set things
up, I'll talk about one of the clusters we have here at Liverpool in the
hope that it is comparable to the setup at your site. The cluster uses
CREAM (CE) and TORQUE (batch server) and is thus representative of a
typical grid site. You'll have to look carefully for the differences
yourself and get back to me about them. Each site will likely have
variations on this type of procedure for changing a supported VO. Adding
a new VO is _much_ harder than changing an existing one, but we can talk
about that later.
METHOD 1 - A FULL YAIM RUN
At Liverpool, we use Puppet and YAIM to configure our CREAM/TORQUE
cluster. Using Ewan's phrase, our cluster is “YAIM hardened”. It means
we can run YAIM in full on any system in the cluster whenever we want,
in particular whenever the VOMS Server records change. So the easiest
way to bring our site in line with new VOMS Server records is as
follows.
On our Puppet server:
# cd /root/svn/puppet/trunk
# cd ./modules/emi-common/files/vo.d
Edit the VO's config file in the vo.d directory, then roll out the
changes in puppet to all our service and worker nodes.
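Before rolling out, it can be worth a quick sanity check that the edited
file still defines the two variables that matter for this kind of change.
The sketch below is only an illustration: check_vo_file is a name I made
up, and it assumes the vo.d files are plain shell-style VAR=value
assignments containing VOMSES and VOMS_CA_DN.

```shell
# Hypothetical helper: report which of the VOMS-related variables are
# missing from a vo.d file. Assumes shell-style "VAR=value" lines.
check_vo_file() {
    cvf_file=$1
    cvf_rc=0
    for cvf_var in VOMSES VOMS_CA_DN; do
        # A variable counts as present if a line starts with "VAR=".
        if ! grep -q "^${cvf_var}=" "$cvf_file"; then
            echo "missing ${cvf_var} in ${cvf_file}"
            cvf_rc=1
        fi
    done
    return "$cvf_rc"
}
```

For example, run "check_vo_file ./lsst" from inside the vo.d directory
(assuming the file is named after the VO); it prints nothing and returns
0 if both variables are there.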
Once the changes have been rolled out everywhere, we run these commands
on the various node types.
WN      - /opt/glite/yaim/bin/yaim -c -s /root/glitecfg/site-info.def -n WN -n TORQUE_client -n GLEXEC_wn
TORQUE  - /opt/glite/yaim/bin/yaim -c -s /root/glitecfg/site-info.def -n TORQUE_server -n TORQUE_utils
CE      - /opt/glite/yaim/bin/yaim -c -s /root/glitecfg/site-info.def -n creamCE -n TORQUE_utils
ARGUS   - /opt/glite/yaim/bin/yaim -c -s /root/glitecfg/site-info.def -n ARGUS_server
DPMHEAD - /opt/glite/yaim/bin/yaim -c -s /root/glitecfg/site-info.def -n emi_dpm_mysql
DPMDISK - /opt/glite/yaim/bin/yaim -c -s /root/glitecfg/site-info.def -n emi_dpm_disk
APEL    - N/A
BDII    - N/A
Once that is done, the site will be up to date, but this is using a
sledgehammer to crack a nut. Note also that your setup may differ a lot
from this. Check /opt/glite/yaim/log/yaimlog to see the actual commands
that were previously run at your site (grep for “command:”). Also ask
the previous admin for the procedure used at your site, as it all
depends on the original setup. In particular, don't do anything until
you understand the repercussions.
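To make the yaimlog check a bit more concrete, here is a rough sketch.
The function name yaim_past_commands is made up, and I'm assuming only
that each YAIM invocation leaves a line containing “command:” in the
log; the exact layout of the rest of the line may differ at your site.

```shell
# Hypothetical helper: summarise the distinct YAIM command lines found
# in a YAIM log, with a count of how often each one was run.
yaim_past_commands() {
    ypc_log=${1:-/opt/glite/yaim/log/yaimlog}
    grep 'command:' "$ypc_log" |
        sed 's/^.*command:[[:space:]]*//' |   # drop everything up to "command:"
        sort | uniq -c
}
```

Running yaim_past_commands with no argument on a WN, a CE and so on
should show you which node types and options that machine was originally
configured with.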
METHOD 2 - BY HAND
You could ignore YAIM and do it by hand, by knowing how YAIM distributes
the information within each server type. For this change, this would be
done as follows for each applicable server in your cluster (WNs, TORQUE,
CEs, ARGUS, DPMHEAD, DPMDISKs).
# for f in $(find /etc/vomses -name "lsst*"); do vi "$f"; done
# for f in $(find /etc/grid-security/vomsdir/lsst/ -type f); do vi "$f"; done
Using (e.g.) vi, sync each file by hand with the new values published in
the Approved VOs list [4]. Obviously, for a cluster with more than a few
nodes, this is a lot of work.
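If you do go the manual route, it is worth seeing exactly which files
are involved on a node (and saving copies) before opening anything in
an editor. A minimal sketch, assuming the same two locations as in the
find commands above; list_vo_voms_files is a made-up name, and the
optional second argument exists only so the function can be tried out
against a scratch directory.

```shell
# Hypothetical helper: list the per-VO VOMS files on this node.
#   $1 = VO name, e.g. lsst
#   $2 = optional alternative filesystem root (for dry runs and testing)
list_vo_voms_files() {
    lvf_vo=$1
    lvf_root=${2:-}
    {
        find "$lvf_root/etc/vomses" -type f -name "${lvf_vo}*" 2>/dev/null
        find "$lvf_root/etc/grid-security/vomsdir/$lvf_vo" -type f 2>/dev/null
    } | sort
}
```

For instance, "list_vo_voms_files lsst" prints the files to be synced,
and you can feed that list to cp or tar to keep a copy of the old
versions before editing.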
METHOD 3 - A PARTIAL YAIM RUN
A full YAIM run makes a lot of changes. It's possible, and quite common,
to alter a site's VOMS records by running YAIM one function at a time.
This is a good option for a simple VOMS record change involving just
VOMSES and VOMS_CA_DN; e.g. the LSST changes could be accommodated on
your servers with the following sequence.
For each applicable server in your cluster (WNs, TORQUE, CEs, ARGUS,
DPMHEAD, DPMDISKs), run this command.
/opt/glite/yaim/bin/yaim -r -s /root/glitecfg/site-info.def -n BLAH -f config_vomsdir -f config_vomses
This does the equivalent of the manual work above. (Aside: obviously,
there is no BLAH node type; you should put the right node type in the
command, but I have been too lazy to spell each one out.)
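To save typing, something like the sketch below can build the partial
run for whichever node types a given server uses. It only prints the
command, so you can review it before running anything. The function
name partial_yaim_cmd is made up, the paths are the Liverpool ones from
this mail, and I am assuming the node types are the same ones you would
pass to a full run on that server (e.g. creamCE and TORQUE_utils on a
CE).

```shell
# Hypothetical helper: print (not run) the partial YAIM command for the
# node types given as arguments.
partial_yaim_cmd() {
    pyc_cmd="/opt/glite/yaim/bin/yaim -r -s /root/glitecfg/site-info.def"
    for pyc_type in "$@"; do
        pyc_cmd="$pyc_cmd -n $pyc_type"      # one -n per node type
    done
    echo "$pyc_cmd -f config_vomsdir -f config_vomses"
}
```

For example, "partial_yaim_cmd creamCE TORQUE_utils" prints the command
for a CE; check it against what yaimlog says was used at your site, then
paste it into a root shell.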
Cheers,
Steve
[1] gLite User Guide https://edms.cern.ch/file/722398/1.4/gLite-3-UserGuide.pdf
[2] YAIM Guide https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400
[3] Grid Security Infrastructure https://en.wikipedia.org/wiki/Grid_Security_Infrastructure
[4] GridPP Approved VOs https://www.gridpp.ac.uk/wiki/GridPP_approved_VOs
[5] Grid User Crash Course https://www.gridpp.ac.uk/wiki/Grid_user_crash_course
[6] VOMS http://italiangrid.github.io/voms/
[7] Operations Portal http://operations-portal.egi.eu/
[8] VomsSnooper Tools https://www.gridpp.ac.uk/wiki/VomsSnooper_Tools