I think that the file /var/spool/pbs/server_name has to contain the same hostname.
I think that you have to change your pbs server name, i.e. act as if you had the pbs server on a different host.

Luca

Peter Love wrote:
> All WNs are on the private network. It seems server_name is irrelevant
> when determining which host to stagein/out from. How does the pbs_server
> tell the pbs_mom which host to stage from? Can this be configured?
>
> Peter
>
>
> Luca Vaccarossa wrote:
>
>>I have a mixed cluster (some WNs have a public IP, others have a private IP).
>>The Torque server is the CE with its public name, but for the private-IP WNs I
>>have to configure it like this:
>>
>>
>>$clienthost privateCEhostname
>>$clienthost publicCEhostname
>>$clienthost localhost
>>$restricted *.<domain>
>>$logevent 255
>>
>>
>>What is your configuration?
>>
>>Peter Love wrote:
>>
>>>Unfortunately this doesn't help. I already have this in /var/spool/pbs/mom_priv/config
>>>
>>>$clienthost ce.lancs.pygrid
>>>$clienthost localhost
>>>$restricted ce.lancs.pygrid
>>>$logevent 255
>>>$ideal_load 1.6
>>>$max_load 2.1
>>>
>>>Stagein/out uses the public hostname (lunegw.lancs.ac.uk).
>>>
>>>Ignore the 'No route to host' error; we have firewalled port 22 on the
>>>CE public interface. The WN shouldn't use the CE's public interface for
>>>staging.
>>>
>>>
>>>PBS Job Id: 134.lunegw.lancs.ac.uk
>>>Job Name: test.sh
>>>File stage in failed, see below.
>>>Job will be retried later, please investigate and correct problem.
>>>Post job file processing error; job 134.lunegw.lancs.ac.uk on host
>>>test01.lancs.pygrid/0
>>>
>>>Unable to copy file 134.lunegw..OU to lunegw.lancs.ac.uk:/home/dteam004/test.sh.o134
>>>
>>>
>>>>>>error from copy
>>>
>>>lunegw.lancs.ac.uk: No route to host
>>>uk port 22: No route to host
>>>lost connection
>>>
>>>
>>>>>>end error output
>>>
>>>Output retained on that host in: /var/spool/pbs/undelivered/134.lunegw..OU
>>>
>>>
>>>Why would the WN /var/spool/pbs/server_name contain public_CE_HOSTNAME?
>>>I confirmed this doesn't affect things.
>>>
>>>Peter
>>>
>>>
>>>Luca Vaccarossa wrote:
>>>
>>>
>>>>Peter Love wrote:
>>>>
>>>>
>>>>>Hi,
>>>>>
>>>>>We're setting up a new farm with WNs on a private network, without
>>>>>shared /home. My question is how to configure torque to specify the CE's
>>>>>private hostname (ce.lancs.pygrid) when submitting jobs to the WNs. At
>>>>>the moment the WNs attempt to copy output back to the torque server via
>>>>>the public hostname of the CE, which I assume is found using 'hostname
>>>>>-f' at the time qsub is run.
>>>>>
>>>>>All the public/private keys are in order; copying from WNs to
>>>>>ce.lancs.pygrid works fine.
>>>>>
>>>>>The WN /var/spool/pbs/server_name file contains 'ce.lancs.pygrid'.
>>>>>
>>>>>Is this a jobmanager issue? Should the qsub specify the server as
>>>>>'ce.lancs.pygrid'?
>>>>>
>>>>>Besides the brief gocwiki docs, are there any docs around for private
>>>>>network config problems? Are all sites with a private network using NFS
>>>>>shared /home?
>>>>>
>>>>>Peter
>>>>
>>>>You have to put the CE's private hostname first in the file
>>>>/var/spool/pbs/mom_priv/config
>>>>
>>>>
>>>>$clienthost ce.lancs.pygrid
>>>>$clienthost public_CE_HOSTNAME
>>>>
>>>>
>>>>
>>>>on your WNs on a private network.
>>>>In the file /var/spool/pbs/server_name I have the
>>>>public_CE_HOSTNAME
>>>>
>>>>
>>>>I hope this helps.
>>>>
>>>>Luca
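
For anyone comparing their own WN setup against the suggestion in this thread, the ordering of the $clienthost lines in mom_priv/config can be checked with a few lines of Python. This is only a minimal sketch: the function names clienthosts and private_ce_listed_first are hypothetical helpers and not part of Torque, and the sample text is the config Peter posted. It only inspects the file contents and says nothing about which host pbs_mom actually uses for staging.

```python
def clienthosts(config_text):
    """Return the $clienthost values in the order they appear.

    Hypothetical helper: assumes one directive per line, in the
    '$clienthost <hostname>' form shown in the thread.
    """
    hosts = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "$clienthost":
            hosts.append(parts[1])
    return hosts


def private_ce_listed_first(config_text, private_ce):
    """True if private_ce is the first $clienthost entry (hypothetical check)."""
    hosts = clienthosts(config_text)
    return bool(hosts) and hosts[0] == private_ce


# Config text as posted by Peter in this thread.
PETER_CONFIG = """\
$clienthost ce.lancs.pygrid
$clienthost localhost
$restricted ce.lancs.pygrid
$logevent 255
$ideal_load 1.6
$max_load 2.1
"""

if __name__ == "__main__":
    print(clienthosts(PETER_CONFIG))
    print(private_ce_listed_first(PETER_CONFIG, "ce.lancs.pygrid"))
```

Peter's posted config already lists the private CE hostname first, which is consistent with his report that the $clienthost ordering alone did not change the host used for stagein/out.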