Dear all,
a brief picture:
* SAM depends on lcg-CE and MON services, so let's call these "grid-required"
* lcg-CE and glite-MON targets are officially supported only under gLite v3.1
* gLite v3.1 requires an SL*4 OS, which has a *less-than-desired patch level*
I used these two URLs as references for the statements above:
http://glite.web.cern.ch/glite/packages/R3.1/
http://glite.web.cern.ch/glite/packages/R3.2/
However, the following seems to suggest that some related work is ongoing:
"Some libraries have been added to LD_LIBRARY_PATH and ld.so.conf to be able
to configure lcg CE and cream CE in SL5." (found in lcg-CE v3.1.39-0 notes)
http://glite.web.cern.ch/glite/packages/R3.1/deployment/lcg-CE/3.1.39-0/lcg-CE-3.1.39-0-update.html
So,
are there going to be gLite v3.2 updates for the lcg-CE and MON packages?
OR,
could we have a compatibility statement for those i386 binaries under 64-bit?
I take the opportunity to ask the admins of the following two CEs (lcg-CEs on
el5/64-bit?) for their real-world experience:
pg.ihepa.ufl.edu & ceprod00.hep.ph.ic.ac.uk
At least the 2nd one really does look like an lcg-CE on el5/64-bit.
(If someone could pass on the message, please do so.)
ps.
See below for some evidence for the claim above about the unpatched OS.
(Debugging these problems takes *hours* on our side; I hope we can avoid that.)
ps2.
I'd like to hear from other people what the "supported" configuration is for
running lcg-CE and CREAM CE together (including the LRMS: which one, and how
exactly). AFAICT, many sites have had trouble getting this working, so
if someone can provide a URL with real-world experience, that would be very nice!
tia,
Fotis
===
(this part is FYI only and is addressed to fellow sysadmins)
SL*4 is less than fully supported, with some OS packages stuck at old versions;
this seems to apply to both the i386 and x86_64 versions.
E.g. look at: http://linuxsoft.cern.ch/cern/slc4X/i386/yum/updates
Even if the fault lies upstream in RHEL4, I don't think that is an excuse!
As an example, the xargs command as supplied and running right *now*
(as of March 2010) on nearly every lcg-CE is a well-known problematic version,
and the current rpm, findutils-4.1.20-7.el4.3, has not received a yum update
for a couple of years.
I'll spare you the details and just provide a one-liner that annoyed us
(in the 1st case it spawns only 2 of the 5 child processes):
[root@ce01 ~]# echo wn08 wn10 wn16 wn21 wn45|xargs -n1|xargs -n1 --replace strace ssh {} uptime 2>&1 |grep '"wn'
execve("/usr/bin/ssh", ["ssh", "wn08", "uptime"], [/* 69 vars */]) = 0
read(4, "wn16\nwn21\nwn45\n", 16384) = 15
execve("/usr/bin/ssh", ["ssh", "wn10", "uptime"], [/* 69 vars */]) = 0
[root@ce01 ~]#
[root@ce01 ~]# xargs --version
GNU xargs version 4.1.20
[root@ce01 ~]#
[root@ce01 ~]#
# Now let's use a binary recompiled locally from the latest sources
[root@ce01 ~]# echo wn08 wn10 wn16 wn21 wn45|xargs -n1|/root/fotis/xargs/findutils-4.4.2/xargs/xargs -n1 --replace strace ssh {} uptime 2>&1 |grep '"wn'
execve("/usr/bin/ssh", ["ssh", "wn08", "uptime"], [/* 69 vars */]) = 0
execve("/usr/bin/ssh", ["ssh", "wn10", "uptime"], [/* 69 vars */]) = 0
execve("/usr/bin/ssh", ["ssh", "wn16", "uptime"], [/* 69 vars */]) = 0
execve("/usr/bin/ssh", ["ssh", "wn21", "uptime"], [/* 69 vars */]) = 0
execve("/usr/bin/ssh", ["ssh", "wn45", "uptime"], [/* 69 vars */]) = 0
[root@ce01 ~]#
[root@ce01 ~]# /root/fotis/xargs/findutils-4.4.2/xargs/xargs --version
xargs (GNU findutils) 4.4.2
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Built using GNU gnulib version e5573b1bad88bfabcda181b9e0125fb0c52b7d3b
[root@ce01 ~]#
(I can also confirm that the bug does not seem to manifest itself under SL*5.)
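A guess at the mechanism, going only by the read(4, ...) line in the first
trace: the child ssh appears to have read the remaining host names from the
stdin it shares with xargs. Assuming that is what happens, the same effect can
be reproduced without xargs or ssh at all. In the sketch below, cat is only a
stand-in for ssh, a while-read loop stands in for xargs, and the function names
are made up for illustration.

```shell
#!/bin/sh
# Stand-in for the xargs/ssh case: the loop body inherits the loop's stdin.

# The child (cat, standing in for ssh) drains the remaining host names,
# so the loop only ever sees the first one.
greedy_child() {
    printf 'wn08\nwn10\nwn16\n' | while read -r host; do
        cat >/dev/null
        echo "$host"
    done
}

# Same loop, but the child's stdin is /dev/null, so every host is visited.
detached_child() {
    printf 'wn08\nwn10\nwn16\n' | while read -r host; do
        cat </dev/null >/dev/null
        echo "$host"
    done
}

greedy_child
detached_child
```

With ssh itself, the -n option redirects stdin from /dev/null, which corresponds
to the second loop (e.g. "xargs -n1 --replace ssh -n {} uptime"). Whether that
sidesteps the problem with findutils 4.1.20 has not been tested here.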
btw.
Some statistics on the findutils rpms installed on CEs available to the dteam VO:
[gef@ui01 ~]$ sort /tmp/xargs_rpm_versions|uniq -c
5 findutils-4.1.20-7.el4.1
1 findutils-4.1.20-7.el4.1.x86_64
283 findutils-4.1.20-7.el4.3
11 findutils-4.1.20-7.el4.3.i386
51 findutils-4.1.20-7.el4.3.x86_64
2 findutils-4.1.7-9
6 findutils-4.1.7-9.1.SL
2 findutils-4.2.27-6.el5
[gef@ui01 ~]$
btw-and-fyi: The offending command used to collect the above info was:
time cat CEs |xargs -P33 -n1 --replace globus-job-run {} -m 5 /bin/rpm -qf /usr/bin/xargs|tee /tmp/xargs_rpm_versions
The 2 sites with a patched findutils-4.2.27-6.el5 run a RHEL5 variant!
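The tally above mixes rpm names with and without an architecture suffix, so
the same package version is counted on several lines. A minimal sketch for
folding them together, assuming the only suffixes present are .i386 and .x86_64
as in the list above (tally_versions is a made-up helper name, and the three
input lines are sample data, not the real survey):

```shell
#!/bin/sh
# Hypothetical helper: strip a trailing .i386/.x86_64 from each rpm name,
# then count identical names. The awk step only normalises uniq's padding.
tally_versions() {
    sed -e 's/\.i386$//' -e 's/\.x86_64$//' | sort | uniq -c | awk '{print $1, $2}'
}

# Example with three sample lines
printf '%s\n' \
    findutils-4.1.20-7.el4.3 \
    findutils-4.1.20-7.el4.3.i386 \
    findutils-4.2.27-6.el5 | tally_versions
```

On the collected data this would be run as "tally_versions < /tmp/xargs_rpm_versions".
Note that it deliberately discards the architecture, so it only answers the
question "which findutils version", not "which build".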