All WNs are on the private network. It seems server_name is irrelevant
when determining which host to stagein/out from. How does the pbs_server
tell the pbs_mom which host to stage from? Can this be configured?
Peter
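
One possible workaround, sketched under assumptions rather than tested on this setup: qsub's -o and -e options accept a destination of the form [hostname:]path_name, and when no host is given qsub fills in the name of the submission host, which is presumably where the public hostname comes from. Naming the CE's private hostname explicitly may therefore steer the copy-back over the private interface. The helper name build_qsub_cmd and the output file names are hypothetical; the hostname and home directory are taken from this thread. The script only prints the command (a dry run) instead of submitting anything.

```shell
#!/bin/sh
# Values taken from this thread; adjust for your own site (assumption).
PRIVATE_CE=ce.lancs.pygrid
OUTDIR=/home/dteam004

# build_qsub_cmd JOBSCRIPT (hypothetical helper)
# Prints a qsub command line whose stdout/stderr destinations name the
# CE's private hostname explicitly, using qsub's [hostname:]path_name form.
build_qsub_cmd() {
    job=$1
    printf 'qsub -o %s:%s/%s.out -e %s:%s/%s.err %s\n' \
        "$PRIVATE_CE" "$OUTDIR" "$job" \
        "$PRIVATE_CE" "$OUTDIR" "$job" \
        "$job"
}

# Dry run: show the command without submitting it.
build_qsub_cmd test.sh
```

After a real submission, `qstat -f <jobid>` lists the job's Output_Path and Error_Path attributes, so it is easy to check which host the WN will try to copy to.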
Luca Vaccarossa wrote:
> I have a mixed cluster (some WNs have public IPs, others have private IPs).
> The Torque server is the CE under its public name, but for the private-IP
> WNs I have to configure it like this:
>
>
> $clienthost privateCEhostname
> $clienthost publicCEhostname
> $clienthost localhost
> $restricted *.<domain>
> $logevent 255
>
>
> What does your configuration look like?
>
> Peter Love wrote:
> > Unfortunately this doesn't help. I already have this in
> > /var/spool/pbs/mom_priv/config:
> >
> > $clienthost ce.lancs.pygrid
> > $clienthost localhost
> > $restricted ce.lancs.pygrid
> > $logevent 255
> > $ideal_load 1.6
> > $max_load 2.1
> >
> > Stagein/out still uses the public hostname (lunegw.lancs.ac.uk).
> >
> > Ignore the 'No route to host' error, we have firewalled port 22 on the
> > CE public interface. The WN shouldn't use the CE's public interface for
> > staging.
> >
> >
> > PBS Job Id: 134.lunegw.lancs.ac.uk
> > Job Name: test.sh
> > File stage in failed, see below.
> > Job will be retried later, please investigate and correct problem.
> > Post job file processing error; job 134.lunegw.lancs.ac.uk on host
> > test01.lancs.pygrid/0
> >
> > Unable to copy file 134.lunegw..OU to lunegw.lancs.ac.uk:/home/dteam004/test.sh.o134
> >
> > >>> error from copy
> >
> > lunegw.lancs.ac.uk: No route to host
> > uk port 22: No route to host
> > lost connection
> >
> > >>> end error output
> >
> > Output retained on that host in: /var/spool/pbs/undelivered/134.lunegw..OU
> >
> >
> > Why would the WN /var/spool/pbs/server_name contain public_CE_HOSTNAME?
> > I confirmed this doesn't affect things.
> >
> > Peter
> >
> >
> > Luca Vaccarossa wrote:
> >
> >>Peter Love wrote:
> >>
> >>>Hi,
> >>>
> >>>We're setting up a new farm with WNs on a private network, without
> >>>shared /home. My question is how to configure torque to specify the CE's
> >>>private hostname (ce.lancs.pygrid) when submitting jobs to the WNs. At
> >>>the moment the WNs attempt to copy output back to the torque server via
> >>>the public hostname of the CE, which I assume is found using 'hostname
> >>>-f' at the time qsub is run.
> >>>
> >>>All the public/private keys are in order, copying from WNs to
> >>>ce.lancs.pygrid works fine.
> >>>
> >>>The WN /var/spool/pbs/server_name file contains 'ce.lancs.pygrid'.
> >>>
> >>>Is this a jobmanager issue? Should the qsub specify the server as
> >>>'ce.lancs.pygrid'?
> >>>
> >>>Besides the brief gocwiki docs, are there any docs around for private
> >>>network config problems? Are all sites with a private network using an
> >>>NFS-shared /home?
> >>>
> >>>Peter
> >>
> >>You have to put the CE's private hostname first in the file
> >>/var/spool/pbs/mom_priv/config
> >>
> >>
> >>$clienthost ce.lancs.pygrid
> >>$clienthost public_CE_HOSTNAME
> >>
> >>
> >>
> >>on your WNs on a private network.
> >>In file /var/spool/pbs/server_name I have the
> >>public_CE_HOSTNAME
> >>
> >>
> >>I hope this helps.
> >>
> >>Luca