I've been trying to get our RH7.3 CE upgraded to LCG 2.3.0.
We still have some issues remaining that will probably cause tomorrow
morning's tests some trouble:
1) SSH from the nodes to the CE isn't working (having followed the 2.3.0
install guide). On the WN we now have in /etc/ssh/ssh_config:
---
Host *
Protocol 1,2
ForwardX11 yes
ForwardAgent yes
StrictHostKeyChecking no
HostbasedAuthentication yes
EnableSSHKeysign yes
# AFSPassTokenBeforeAuth yes
---
Is that right, or do I have to do anything with EnableSSHKeysign on the CE
too?
I've managed to kill the SSH daemon on the CE so I can't check the logs
there until tomorrow!
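
For reference, in stock OpenSSH EnableSSHKeysign is a client-side
(ssh_config) option, so it matters on whichever machine starts the
connection; the machine being logged into needs host-based
authentication switched on in its sshd_config instead. A minimal sketch
of what the CE side might look like, assuming a standard OpenSSH layout
(the hostname wn01.example.org is hypothetical, and the LCG install
scripts may well manage these files themselves):

```
# /etc/ssh/sshd_config on the CE (the server side for WN -> CE logins)
HostbasedAuthentication yes
# Only consult the system-wide known hosts file for host keys
IgnoreUserKnownHosts yes

# /etc/ssh/shosts.equiv on the CE: one trusted client host per line, e.g.
#   wn01.example.org          (hypothetical WN name)
#
# /etc/ssh/ssh_known_hosts on the CE must hold each WN's public host key.
```

On the WN side, ssh-keysign also has to be installed setuid root for the
client settings above to take effect.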
2) The CE insists on publishing our infinite queue as in production, even
though it was closed ~6 hours ago:
dgc-grid-37 root]# qstat -q
server: dgc-grid-35.brunel.ac.uk

Queue            Memory CPU Time Walltime Node Run Que Lm  State
---------------- ------ -------- -------- ---- --- --- --  -----
short              --   00:30:00 00:45:00  --    0   0 --   E R
long               --   12:00:00 24:00:00  --    0   0 --   E R
infinite           --   80:00:00 100:00:0  --    0   0 --   D S
                                               --- ---
                                                 0   0
As it happens, since closing it I've not only restarted the PBS daemons
but also the globus-mds on the CE... what else needs reminding?
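
To compare what PBS reports with what the CE publishes, the closed
queues can be pulled out of the qstat listing mechanically. This is only
a sketch: closed_queues is a name made up here, and it assumes the
`qstat -q` column layout shown above, with the two state flags as the
last two fields of each queue row.

```shell
#!/bin/sh
# Print every queue that PBS does not report as enabled and running.
# Reads `qstat -q` style output on stdin (column layout is an assumption).
closed_queues() {
    awk '
        # Queue rows have at least 10 fields and end in two state flags,
        # e.g. "E R" (enabled, running) or "D S" (disabled, stopped).
        NF >= 10 && $(NF-1) ~ /^[ED]$/ && $NF ~ /^[RS]$/ {
            if ($(NF-1) != "E" || $NF != "R")
                print $1, $(NF-1), $NF
        }
    '
}
```

Run as `qstat -q | closed_queues`; for the listing above it should name
only the infinite queue. The result can then be checked by hand against
the GlueCEStateStatus values the CE's information system returns.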
3) On startup there are error messages from the APEL configuration - is
there an updated APEL I should use? Or should I just hope that it will fix
itself when I upgrade to 2.3.1, now that I'm all fired up?
(or have I fixed the permissions on the config file too early?)
Thanks
Henry
--
Dr. Henry Nebrensky
http://people.brunel.ac.uk/~eesrjjn
"The opossum is a very sophisticated animal.
It doesn't even get up until 5 or 6 p.m."