Hi Steve,
Steve Fisher wrote:
> On Mon, May 02, 2005 at 05:28:46PM +0200, Jeff Templon wrote:
>> ... I love the information model, the python
>>interface, the sql queries, etc but it's time to stop producing new code
>>bases and concentrate on making the system work.
>
> This is not really a rollout issue, however I must correct the
> suggestion that the R-GMA version in LCG2_4_0 is a new code base - it
> is a bug fix and we are continuing to work on the other reported
> bugs. It looks different because of the wrappers - but these are
> rather small and simple.
I agree that from the /developers/ point of view these wrappers may be
simple, but deploying them is not so simple at all. It was probably
not just for marketing reasons that the major version was bumped
to 4.0 :-)
A quick comparison does show a *lot* of differences:
* the 3.4.x release worked happily without any special jar files
in /var/tomcat4/common/endorsed/, but the new version requires all kinds
of XML magic libraries there...
* the 3.4.x release has individually recognisable classes in WEB-INF, but
the new version comes (and stays) as a single ".war" file unless you
do something about it to make it work...
* the one and only magic command that usually got R-GMA working again,
$EDGLOCATION/libexec/edg-rgma-restart-all, is just gone in 4.0 ...
* the "emailContact" string, which told you who the admin was for a service
(from /opt/edg/sbin/edg-rgma-publish-service), and which was finally
beginning to be set correctly following Emanouil Atanassov's mail in March,
is still there but now simply cannot be set anymore :-) The new
"servicetool" just does not support it ...
* we got a whole new set of dependencies on SWIG and xerces-c, which needed
to be satisfied by a dedicated install (edg-essentials-cpp-1.1.1-1)...
I got hit by all of these problems when upgrading from LCG2.3.1 to 2.4.0,
and spent a lot of time hunting down these new issues on the wikis and
roll-out lists (and even via tcpdump in the case of the "common/endorsed/"
problem). And I started off with a working instance of R-GMA 3.4.x.
These differences may be minor for the developers, but they bother the
deployment (at least here) greatly, and effectively constitute what I would,
like Jeff, call a "new codebase". The wrappers are part of the code too.
I would like to see R-GMA working, as I, like Jeff, like the SQL interface and
the model. But let's keep the bug fixes simpler next time.
Cheers,
DavidG.
>
> Steve
>
>
>
>> J "time to catch a train otherwise i'd continue ranting" T
>>
>>David Groep wrote:
>>
>>>Hi all,
>>>
>>>Jiri Kosina wrote:
>>>
>>>>I have just heard from Marcin Radecki that R-GMA service is going to be
>>>>considered as critical test in SFT quite soon.
>>>>...
>>>>I personally think that making the service, which fails on 31 sites,
>>>>which are healthy when not counting R-GMA as not a very lucky step.
>>>
>>>I'd like to concur, since it has been virtually impossible to get a
>>>stable R-GMA service which remains working for more than one release.
>>>And indeed, while the changeover to the new-but-not-yet-the-latest R-GMA
>>>version 4.0 may have solved some of the old problems, it did introduce
>>>a lot of new failure modes...
>>>
>>>Can we wait in making R-GMA a critical component till the product has
>>>shown to be stable (only minor changes, and no radical new code bases
>>>please) for at least one release and deployment at a large scale?
>>>It would lighten the stress of several admins and give the R-GMA
>>>people time to stabilise the (deployment of) the current release.
>>>
>>>And please, no migrations to version 5+ in the mean time :-)
>>>
>>>
>>>
>>>>This, among other things, means (when looking at the latest report),
>>>>that 31 sites, if I count correctly, would be marked 'CT', even if
>>>>everything else, apart from R-GMA, was tested as OK.
>>>>
>>>>I have posted many times questions regarding non-working R-GMA setup
>>>>(I am using YAIM on RH 7.3 on both farms I manage, and I _never_ got
>>>>R-GMA working, neither did I receive any helpful advice) (for example
>>>>here:
>>>>http://www.listserv.rl.ac.uk/cgi-bin/webadmin?A2=ind0502&L=LCG-ROLLOUT&P=R14255&I=-3
>>>>- with no reply, and more can be found).
>>>>
>>>>I personally think that making the service, which fails on 31 sites,
>>>>which are healthy when not counting R-GMA as not a very lucky step.
>>>>
>>>>Thanks for any comment,
>>>>
>>>
>>>
--
David Groep
** National Institute for Nuclear and High Energy Physics, PDP/Grid group **
** Room: H1.56 Phone: +31 20 5922179, PObox 41882, NL-1009DB Amsterdam NL **