I've posted this to WP3-Testbed, but I think I'm having trouble with pine
and email to JISC, so I thought I'd cross post it here. If anyone can
point me in the right direction regarding the problems outlined below
(some weird LCFGng problems on reboot), I'd really appreciate it.
Cheers,
Ian
--
Ian Stokes-Rees [log in to unmask]
Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes
---------- Forwarded message ----------
Date: Wed, 30 Jul 2003 17:35:29 +0100 (BST)
From: [log in to unmask]
To: WP3 testbed <[log in to unmask]>
Subject: Re: New tag: t20030725_1800
Interestingly, this is what happened with our other nodes when I rebooted
them:
SE -- rebooted without problem.
WN01 -- Reboots, says a bit about "Grub loading" and "Press any key to
continue" (four times) and then just sits there with a flashing _ in the
top left corner
WN02 -- *almost* identical to the CE (see below) except kernel line
includes console=tty0 console=ttyS0,9600
GEN01 -- Identical to WN01.
I have now tried re-starting our CE from scratch using an LCFGng boot
disk. This does not work, and reports the following errors:
LCFG object updaterpms: Can't see the RPM dir
/export/local/linux/7.3/RPMS/WP8/Alice
LCFG object updaterpms: Can't see the RPM dir
/export/local/linux/7.3/RPMS/WP8/dzero
-> both these dirs have been removed (and are not included in site-cfg.h)
[INFO] updaterpms: No CD helper available
modprobe: modprobe: Can't open dependencies file
/lib/modules/2.4.18-18.7.x.cern/modules.dep (No such file or directory)
/etc/obj/fstab: /root/etc/fstab.new: No such file or directory
LCFG object fstab: Missing fstab template for hda
(/var/obj/conf/fstab/fstab/hda)
LCFG object install: install method failed
It then bombs out to a bash shell.
SE: It seemed too good to be true that it worked OK the first time on
rebooting, so I tried rebooting again. This time it did not work, but
rebooted itself and on the third time coming up it now seems OK (i.e. the
login window comes up). There are still problems reported during boot up:
useradd: unknown group nagios
LCFG auth: Failed to add user nagios.
LCFG object infoproviders: resource missing (SE_MOUNT) [ FAILED ]
Stopping Globus MDS/etc/rc.d/init.d/globus-mds: kill: (9274) - No such
process [ FAILED ]
Stopping globus-gridftp: [FAILED]
LCFG globuscfg: init.d globus-gridftp stop failed: 256 [ WARN ]
ln: '/opt/globus/sbin/globus-mds': File exists
configure: error: Cannot locate qstat
Error locating pbs commands, aborting!
(Then these last two lines are repeated for condor_q/condor, lsload/LSF,
mpirun, qdel, qsubmit, condor_submit, bsub)
It doesn't sound to me like everything is installing properly.
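One way to confirm which of the batch-system commands named in the configure errors above are actually missing is to look each one up on the search path. The following is only a minimal sketch: the command list is copied from the errors quoted in this message, while the helper name `find_missing` is made up for illustration and is not part of LCFGng or Globus.

```python
import shutil

# Commands the configure step complained about, as listed in the message above.
BATCH_COMMANDS = ["qstat", "condor_q", "lsload", "mpirun",
                  "qdel", "qsubmit", "condor_submit", "bsub"]


def find_missing(commands, path=None):
    """Return the commands that cannot be found on the search path.

    `path` defaults to the PATH of the current environment.
    """
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    # Hypothetical usage: run on the node in question and read off the gaps.
    for cmd in find_missing(BATCH_COMMANDS):
        print("missing:", cmd)
```

On a node that only runs PBS, the Condor and LSF commands would be expected to show up as missing; a missing qstat or qdel would be the more telling result.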
On Wed, 30 Jul 2003, Ian Stokes-Rees wrote:
> > Now to try and reboot the nodes and see if they come up with the new
> > software!
>
> Life is never easy, is it? I rebooted our CE which has been happily
> operating for a few months now as the head node to a PBS farm (OK, with
> only 3 machines behind it...), and has been responding to globus commands.
> I hoped it would pick up the new WP3 testbed apps and maybe throw up some
> warnings and error messages. What do I get instead?
>
> -------------------------------------------
> Booting 'RedHat 7.3 Linux'
>
> kernel (hd0,0)/vmlinuz-2.4.18-27.7.xsmp root=/dev/hda2
>
> Error 15: File not found
>
> Press any key to continue...
> -------------------------------------------
>
> Needless to say, pressing any key to continue doesn't do anything useful.
> I'm going to try a reinstall from scratch to see what will happen...
> Still, this is a very strange thing to have happened. Has LCFGng changed
> the grub bootloader without actually making the kernel available? How
> could this have happened?
>
> Cheers,
>
> Ian.
>
>
> --
> Ian Stokes-Rees [log in to unmask]
> Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes
>
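GRUB's "Error 15: File not found" in the quoted boot screen means the file named on the kernel line could not be found on the boot partition. A rough way to check for that mismatch from a rescue shell is to compare the kernel lines in the GRUB configuration with the files actually present. This is a sketch under stated assumptions: the configuration is taken to be GRUB legacy at /boot/grub/grub.conf, the (hd0,0) partition is taken to be mounted at /boot, and `missing_kernels` is a hypothetical helper, not an LCFGng tool.

```python
import os
import re

# Matches GRUB legacy lines such as
#   kernel (hd0,0)/vmlinuz-2.4.18-27.7.xsmp root=/dev/hda2
# and captures the path that follows the optional (hdX,Y) device prefix.
KERNEL_LINE = re.compile(r"^\s*kernel\s+(?:\([^)]*\))?(/\S+)", re.MULTILINE)


def missing_kernels(grub_conf_text, boot_dir):
    """Return kernel paths named in the config that do not exist under boot_dir."""
    missing = []
    for rel_path in KERNEL_LINE.findall(grub_conf_text):
        candidate = os.path.join(boot_dir, rel_path.lstrip("/"))
        if not os.path.isfile(candidate):
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    # Assumed locations; adjust if the boot partition is mounted elsewhere.
    with open("/boot/grub/grub.conf") as conf:
        for path in missing_kernels(conf.read(), "/boot"):
            print("kernel not found:", path)
```

Any path it reports is a kernel that the boot menu refers to but that is not on disk, which would match the symptom above.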
--
Ian Stokes-Rees [log in to unmask]
Particle Physics, Oxford http://www-pnp.physics.ox.ac.uk/~stokes