At RHUL we have a system provided by Compusys. The master node has the
master copy of the disk image for the worker nodes. Updates are first
applied to this, e.g. by chrooting to it. Part of the normal boot sequence
for worker nodes is to rsync against this image, so rebooting the worker
nodes brings them up to date. Is this similar to what you have and
dislike, Paul?
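
For illustration only, the boot-time sync could be sketched roughly as
below. This is not the Compusys script: the host name, image path and
exclude list are made-up placeholders, and it assumes rsync is run on the
worker node and pulls from the master. With --delete, anything that must
stay node-specific (and pseudo-filesystems such as /proc) has to be
excluded.

```python
def build_sync_command(master, image_dir, excludes=(), dry_run=False):
    """Build an rsync command that pulls the master image onto this node.

    All arguments are hypothetical examples, not real site settings.
    """
    # -a preserves permissions, ownership, times and symlinks;
    # --delete removes local files that no longer exist in the image.
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Show what would change without touching the local disk.
        cmd.append("--dry-run")
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    # A trailing slash on the source copies the directory contents,
    # so the image root maps onto the local root.
    cmd.append(f"{master}:{image_dir.rstrip('/')}/")
    cmd.append("/")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it.
    print(" ".join(build_sync_command(
        "master", "/images/worker",
        excludes=["/proc", "/sys", "/dev", "/etc/hostname"],
        dry_run=True)))
```

Passing dry_run=True first is a cheap way to see what a reboot would
change before committing to it.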
I find that this works well enough, although it is sometimes inconvenient
to reboot the worker nodes. For example, if a lot of jobs are running we
have to schedule downtime and drain the queues in order to reboot the
worker nodes, which can take the site down for a day or two. I think this
will look bad in the service stats if done too often. For minor changes
(few files), one can use commands (from Compusys but probably everyone has
something similar, based on rsh/rcp and a list of nodes) to copy these
files to all the worker nodes. But it would be nice to have an automated
procedure to take nodes out a few at a time as jobs finish to reboot them,
a sort of rolling reboot, so the site stays up. It would need to keep
track of which node had which version of what on it. I have heard that
Quattor is meant to do this sort of thing.
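
To make the rolling reboot idea concrete, here is a rough sketch of the
bookkeeping for one pass. It is not how Quattor or any batch system does
it. The Node fields, the function name and the max_offline limit are all
invented for the example, and actually closing a node to new jobs and
rebooting it would be done through whatever the local batch system
provides.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    image_version: int      # version of the image the node last synced
    running_jobs: int       # jobs currently running on the node
    draining: bool = False  # True if the node accepts no new jobs


def plan_rolling_reboot(nodes, target_version, max_offline):
    """Return (to_reboot, to_drain) node names for one pass.

    At most max_offline nodes are closed to new jobs at any time,
    so the rest of the site keeps running.
    """
    outdated = [n for n in nodes if n.image_version < target_version]
    draining = [n for n in outdated if n.draining]
    # Draining nodes whose jobs have all finished can be rebooted now.
    to_reboot = [n.name for n in draining if n.running_jobs == 0]
    # Nodes already draining use up part of the offline budget.
    budget = max_offline - len(draining)
    # Prefer the least busy nodes, since they will empty soonest.
    candidates = sorted((n for n in outdated if not n.draining),
                        key=lambda n: n.running_jobs)
    to_drain = [n.name for n in candidates[:max(budget, 0)]]
    return to_reboot, to_drain


if __name__ == "__main__":
    farm = [Node("wn01", 1, 0, True), Node("wn02", 1, 3, True),
            Node("wn03", 1, 0), Node("wn04", 1, 5), Node("wn05", 2, 2)]
    print(plan_rolling_reboot(farm, target_version=2, max_offline=3))
```

Run repeatedly (say from cron), each pass reboots the nodes that have
emptied and closes a few more, until every node reports the target
version.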
Cheers,
Simon
On Tue, 21 Jun 2005, Paul Kyberd wrote:
> I was looking for advice on what system other sites use to
> distribute updates to the machines at their site.
>
> I am not very happy with the operation of the system based on
> ghosting a machine which has been implemented at Brunel and was interested
> in finding a better solution.
>
> Paul
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> + Paul Kyberd Brunel University +
> + E-mail: [log in to unmask] Department of Electronic and +
> + Phone: +44-(0)1895-266801 Computer Engineering +
> + Fax: Uxbridge, Middlesex UB8 3PH +
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>