> We use kickstart and puppet for these things, and
> it would take N days to port it from that to some other,
> manual "build system".
Uhm, I found something like that here, and it was
semi-reasonably set up and I have slightly updated it, but it has
a number of built-in assumptions about versions of middleware and
OS, so given that I am under pressure to just get it done, I
have been asked to take some shortcuts.
Since I already have an APEL SL5 image I am going to clone that
and change host-specific bits and then install a different set
of middleware packages.
> how long it will take to
> build the hardware/OS/network layers for the other systems. And
> it will also give you a clue about the process for the other
> node types. So, you could use the BDII to (partly) calibrate the
> process for the other systems.
Ah yes, that's a good point and I am a bit ahead here as I have
already done that last year with APEL SL5.
I guess I should probably do the BDII first, then perhaps
TORQUE_server (IIRC I can take down the gLite 3.1 one for a
while and jobs will just continue running), and probably I can
just declare a short downtime for the disk nodes.
What worries me is CREAM and, less so, the DPM, and bringing over
the state of the latter, and reproducing *exactly* the
gridmapdirs and userdirs etc. and all those little details.
> You can have two site BDII's at
> once, no problem, and you can compare their contents to make
> sure they are functionally identical.
That's a good point (that you can have two site BDIIs is useful
information), I guess that then I might install a second site
BDII in the APEL VM.
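
To make the "compare their contents" step concrete, here is a rough
sketch of how the two dumps could be diffed. It assumes each site BDII
has already been dumped to an LDIF file (for instance with ldapsearch
against port 2170 and base o=grid, which as far as I know is the usual
BDII setup). The function names and the ignore_attrs parameter are
made up for illustration, and base64 "dn::" lines are not handled.

```python
import sys


def parse_ldif(text):
    """Parse LDIF text into {lowercased dn: set of 'attr: value' lines}."""
    # Unfold continuation lines: LDIF folds long lines with one leading space.
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    entries = {}
    dn = None
    for line in unfolded:
        if not line.strip():
            dn = None  # a blank line ends the current entry
            continue
        if line.startswith("#"):
            continue  # comment line
        if line.lower().startswith("dn:"):
            dn = line[3:].strip().lower()
            entries[dn] = set()
        elif dn is not None:
            entries[dn].add(line.strip())
    return entries


def diff_entries(old, new, ignore_attrs=()):
    """Return (dns only in old, dns only in new, dns whose attributes differ).

    ignore_attrs (hypothetical parameter) names attributes to skip,
    e.g. dynamic state values that legitimately differ between dumps.
    """
    ignore = tuple(a.lower() + ":" for a in ignore_attrs)

    def keep(lines):
        return {l for l in lines if not l.lower().startswith(ignore)}

    only_old = sorted(set(old) - set(new))
    only_new = sorted(set(new) - set(old))
    changed = sorted(
        dn for dn in set(old) & set(new) if keep(old[dn]) != keep(new[dn])
    )
    return only_old, only_new, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        result = diff_entries(parse_ldif(f_old.read()), parse_ldif(f_new.read()))
    for label, dns in zip(("only in old", "only in new", "changed"), result):
        for dn in dns:
            print(f"{label}: {dn}")
```

Run with the two dump files as arguments, it prints one line per
entry that is missing from either side or whose attributes differ.
Attributes that change from minute to minute (job counts and the
like) would probably need to go into ignore_attrs, otherwise every
comparison will report them.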
> With respect to build times for CREAM, and TORQUE_server; these
> are hard to estimate. It was necessary here at Liverpool to
> build a CREAM, and TORQUE_server and an lcg-CE in a period of
> downtime a year or so ago. It was a week of early starts and
> late nights before all that was working properly together.
Uh I was afraid of that :-).
> The "Big Bang" approach
> (that you are talking about) was fearsome to me, and I'd always
> try to take an incremental approach if I could.
Same here, if I could :-).
> The Big Bang
> approach is basically the same as a total disaster recovery,
> [ ... ]
That's actually the silver lining: it could be good practice.
> In short - incremental is the best way, but if you must go "Big
> Bang", then have lots of tea/coffee on tap - you'll need it.
The Institute here already has excellent (and much needed and
used :->) caffeination facilities :-).
Also, I have already had some useful hints from the helpful
ScotGrid staff and blog (and the other regional blogs).