On Wed, Feb 11, 2009 at 8:44 PM, Torsten Harenberg wrote:
> Hi all,
>
> I am fighting with a strange problem.
>
> Suffering from strange firewall problems, we had to move our CE quickly into
> a new subnet. So it has a new IP address now which is correct in the name
> service.
>
> [root@wn046 etc]# host grid-ce.physik.uni-wuppertal.de
> grid-ce.physik.uni-wuppertal.de has address 132.195.125.5
>
> Some (only some!) of the WNs however show a strange behaviour: the pbs_mom
> processes show log entries like
>
> 02/11/2009 20:37:20;0080; pbs_mom;Req;scan_for_exiting;no contact with server at hostaddr 84c36863, port 15001, jobid 62934.grid-ce.physik.uni-wuppertal.de errno 115
>
> and also netstat shows an entry like:
>
> tcp 0 1 wn046.pleiades.uni-wupp:936 132.195.104.99:pbs SYN_SENT
>
> using the old IP address of the pbs server.
>
> But nowhere in /etc or /var/spool/pbs (except in logs of course) are there
> any entries pointing to the old IP.
>
> [root@wn003 etc]# grep -r 132.195.104.99 *
> [root@wn003 etc]#
>
> I also tried to put the new IP number instead of the FQDN in
> /var/spool/pbs/server_name and in $pbsserver in /var/spool/pbs/mom_priv/config.
>
> Also putting the new IP in /etc/hosts doesn't help (it wasn't in there
> before)
>
> And nsswitch & Co. look okay to me:
>
> [root@wn003 etc]# grep hosts nsswitch.conf
> hosts: files dns
> [root@wn003 etc]# cat host.conf
> order hosts,bind
> [root@wn003 etc]#
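Check with
# getent hosts grid-ce.physik.uni-wuppertal.de
This resolves the name through nsswitch, including any cache such as nscd,
which is the path a daemon like pbs_mom takes; host only asks DNS directly.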
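
To run that check on many WNs it may help to wrap it in a small function. This is only a sketch: check_resolution is a made-up helper name, and the expected address is the new one from the host output quoted above.

```shell
# Compare the address returned by the NSS lookup with the expected one.
# Usage: check_resolution NAME EXPECTED_IP   (hypothetical helper)
check_resolution() {
    name=$1
    expected=$2
    # getent prints "ADDRESS NAME [ALIASES...]"; take the first field
    # of the first line. An empty result means the lookup failed.
    actual=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $name resolves to $actual"
    else
        echo "MISMATCH: $name resolves to '$actual', expected $expected"
        return 1
    fi
}

# Example call, to be run on a worker node:
# check_resolution grid-ce.physik.uni-wuppertal.de 132.195.125.5
```

A MISMATCH on only some nodes would point at a stale cache or hosts entry on exactly those nodes.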
Also check the job files under /var/spool/pbs/mom_priv/jobs; they may still
carry the old server address.
And of course a mom restart won't do any harm.
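
A side note on the pbs_mom log line quoted above: the hostaddr value 84c36863 looks like the IPv4 address written as eight hex digits, and decoding it gives the old server address that also shows up in netstat. A minimal sketch of the conversion follows; hex_to_ip is a made-up helper name, and reading the field as a plain hex IPv4 address is an assumption based on that match.

```shell
# Convert an 8-digit hex IPv4 address, as printed in the pbs_mom log
# ("hostaddr 84c36863"), to dotted-quad notation.
# hex_to_ip is a hypothetical helper, not part of PBS.
hex_to_ip() {
    hex=$1
    # Split into four 2-digit groups and let printf read each as hex.
    printf '%d.%d.%d.%d\n' \
        "0x$(echo "$hex" | cut -c1-2)" \
        "0x$(echo "$hex" | cut -c3-4)" \
        "0x$(echo "$hex" | cut -c5-6)" \
        "0x$(echo "$hex" | cut -c7-8)"
}

hex_to_ip 84c36863   # prints 132.195.104.99
```

So the mom is still using the old address for that job, whatever the name service says now.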
Steve
>
> I'm running completely out of ideas (besides waiting for all jobs to finish
> and rebooting the whole cluster... :-( )
>
> Any thoughts?
>
> Best regards,
>
> Torsten
>
> --
> <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
> <> <>
> <> Dr. Torsten Harenberg [log in to unmask] <>
> <> Bergische Universitaet <>
> <> FB C - Physik Tel.: +49 (0)202 439-3521 <>
> <> Gaussstr. 20 Fax : +49 (0)202 439-2811 <>
> <> 42097 Wuppertal <>
> <> <>
> <><><><><><><>< Of course it runs NetBSD http://www.netbsd.org ><>
>
--
Steve Traylen