So UK pressure to improve CREAM for SGE has had an unfortunate side-effect.
One option would be to push for this version to be included in UMD but ask only SGE sites to upgrade; another would be for SGE sites to take the release from the EMI repository to get a working version. I think the latter is safer, as it would prevent non-SGE sites from being accidentally broken when people don't read the documentation (that could never happen, could it?).
Are SGE sites happy with the latest EMI release? Has anyone not tried it yet?
John
-----Original Message-----
From: [log in to unmask] On Behalf Of Tiziana Ferrari
Sent: 16 December 2011 14:24
To: NGI Operations Centre managers
Subject: [Noc-managers] CREAM and UMD 1.4.0: your feedback needed
Dear all,
as discussed at the last operations meeting, the latest version of CREAM
was planned for release in UMD 1.4.0 (due next Monday, 19 December),
as it is particularly needed by SGE sites. Unfortunately, a serious
bug affecting CREAM (the BLAH bupdater for PBS and LSF) was detected
today.
The bug is documented at:
http://wiki.italiangrid.it/twiki/bin/view/CREAM/KnownIssues#Memory_leak_in_bupdater
"Version 1.16.3 of BLAH is affected by a quite critical memory leak in
the bupdater component for LSF and PBS. Because of that the usage of
memory of the bupdater process will keep increasing till when it
crashes/it is killed by OOM. It is then automatically restarted by blah.
The problem concerns PBS and LSF, but not SGE.
For LSF and PBS, the workaround is to configure the blparser using the
old method:
http://wiki.italiangrid.it/twiki/bin/view/CREAM/SystemAdministratorGuideForEMI1#1_2_4_Choose_the_BLAH_BLparser_d
Relevant bug: https://savannah.cern.ch/bugs/index.php?89859"
The workaround for PBS/LSF consists of the following steps:
- explicitly set a yaim variable
- configure the cream-ce using yaim
- configure the blparser using yaim
- restart tomcat
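The first step might look like the site-info.def fragment below. This is a sketch, not the official procedure: it assumes the yaim variable in question is BLPARSER_WITH_UPDATER_NOTIFIER, as described in the "Choose the BLAH BLparser deployment" section of the EMI-1 CREAM administrator guide linked above. Please check that guide before applying it.

```shell
# site-info.def fragment (sketch).
# Assumption: BLPARSER_WITH_UPDATER_NOTIFIER is the yaim variable that
# selects the blparser flavour, per the EMI-1 CREAM sysadmin guide.
# "false" selects the old BLParser instead of the new BUpdater/BNotifier,
# which sidesteps the bupdater memory leak in BLAH 1.16.3 (PBS/LSF only).
BLPARSER_WITH_UPDATER_NOTIFIER=false
```

The remaining steps (running yaim for the cream-ce and the blparser, then restarting tomcat) depend on the node types and functions given in the guide, so they are not reproduced here.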
We need to decide today whether to include CREAM, with the documented
workaround, in UMD 1.4.0.
My proposal is *not* to include it, even though verification and staged
rollout were OK for SGE, and to release it with a fixed EMI.blah in UMD
1.5.0 (30 Jan 2012). The motivation is to avoid the risk of disrupting
small and medium-sized sites running PBS, for which an upgraded CREAM
would require a careful workaround.
Please let me know if you have any concerns about this proposal.
Thank you in advance
Tiziana
--
Tiziana Ferrari
EGI.eu Operations
Science Park 140, 1098 XG Amsterdam, NL
m: 0031 (0)6 3037 2691
_______________________________________________
Noc-managers mailing list
https://mailman.egi.eu/mailman/listinfo/noc-managers