On Wed, Jun 22, 2005 at 11:38:02AM +0100 or thereabouts, Ian Stokes-Rees wrote:
> Peter Gronbech wrote:
> >We use a combination of yum (to load updated rpms), yumit to advise us
> >on the patch status, and copy of an ssh key in
> [snip]
> >I can then spot that systems need patching with yumit, check what is
> >required (i.e. which patches are missing, again with yumit) and then
> >type on t2nodes yum -y update
> >I'm sure there are many other solutions to this problem but this works
> >for me.
>
> Wow, that just made me realise "yet another complication of grid
> computing". Pete's system sounds excellent. Very simple to manage. I
> would imagine there could be big implications for running jobs, though,
> if the software they are using is changing under their feet.
>
> Are there risks that this might throw off software execution? I
> certainly imagine it might. We (LHCb) do a bunch of software version
> checks at the start of execution (and not just for LHCb/Physics
> software). Weird failures are one thing, but failures are probably
> better than silently producing bad results *without* any errors or
> inconsistencies in the output, caused by changes in software between
> two steps of a job.
You won't get bad results for a running job. The executable will still
be accessing the original inode, with the file held in memory; deleting
the file does not remove it from memory. I think that is correct but
maybe someone will correct me now.
Of course, if you run two serial executables in your job you could get
different libs for the two runs. You have to rely to some extent on the
changes put in by Red Hat being backwards compatible. That is
what you probably have to live with.
I'm told that HP-UX used to mv the old files to new names when patches
were applied, so the contents of any open files were preserved.
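A quick sketch of the deleted-but-open behaviour, using only a throwaway
mktemp file (nothing cluster-specific):

```shell
# Hold a file open, delete and replace it, and show that the old
# contents are still readable through the original inode.
tmp=$(mktemp)
echo "old" > "$tmp"
exec 3< "$tmp"        # fd 3 now pins the original inode
rm "$tmp"             # directory entry gone; inode lives on
echo "new" > "$tmp"   # same name, but a brand-new inode
read -r line <&3      # still reads the original data
echo "$line"          # prints: old
exec 3<&-             # close fd 3; old inode is finally freed
rm -f "$tmp"
```

This is exactly what happens when rpm replaces a file a running job has
open: the job keeps the old inode until it exits.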
>
> What happens if a library is updated? I don't know enough about how
> link-resolution and inodes work to understand whether dynamic libraries
> are all "referenced" at the start of execution, so the OS holds inode
> references to the old library, even if the physical file changes later
> (but still during execution of some process which is referencing it).
Exactly: it is the inode that counts once the file is open.
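You can see this from /proc on Linux: the shared objects a process
mapped at exec time stay listed in its maps file, and if a library on
disk is later replaced, the old copy shows up flagged "(deleted)". A
rough illustration (Linux-specific, using sleep as a stand-in for a
long-running job):

```shell
# List the shared objects a running process mapped at startup;
# these mappings keep the old inodes alive even if rpm later
# replaces the files on disk (they would then show as "(deleted)").
sleep 30 &
pid=$!
awk '$6 ~ /\.so/ {print $6}' "/proc/$pid/maps" | sort -u
kill "$pid"
```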
>
> I suppose ideally it would be good to "inject" update jobs into the
> queue, but then the three problems exist:
>
> 1. This means syncing on both (or all 4) processors, which will almost
> certainly mean significant wasted CPU (50% of average job length on
> duals, and more on quads, I guess).
>
> 2. Not very nice to have different nodes running different software, and
> perhaps even impossible if it is an update which relates to the
> grid/cluster infrastructure. This would imply certain types of update
> probably require a full site "sync".
>
> 3. How to make sure those update/admin jobs get run exactly once on
> every node. Oh, I suppose cluster software must have a way of doing
> this, as it would be a common problem.
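For (3), one common trick is a marker-file guard, so the same update job
can be submitted any number of times but only applies the update once
per node. A sketch only; the marker path and the yum invocation are
illustrative, not anything we actually run:

```shell
# Update job that is safe to resubmit: it applies the update at
# most once per node, recording success in a marker file.
marker=/var/tmp/patched-2005-06-22
if [ ! -e "$marker" ]; then
    yum -y update && touch "$marker"
fi
```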
>
> What are people's thoughts on that?
The bottom line is that you have to trust the OS provider not to mess
things up for you.
But we do normally hold back on glibc if possible until we need a kernel
update, which is usually more frequent.
Steve
>
> Cheers,
>
> Ian
>
>
>
> --
> Ian Stokes-Rees [log in to unmask]
> Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes
>
--
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/