I have a mixed cluster (some WNs have public IPs, others have private IPs).
The Torque server is the CE under its public name, but for the private-IP WNs I
have to configure them like this:
$clienthost privateCEhostname
$clienthost publicCEhostname
$clienthost localhost
$restricted *.<domain>
$logevent 255
What does your configuration look like?
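
For comparison, here is the same fragment with comments. This is only a rough sketch of what I understand each directive to do, going by the pbs_mom config documentation. The hostnames are the placeholders used above, and the exact semantics may differ between Torque versions:

```
# Hosts allowed to connect to pbs_mom from a privileged port.
# The server has to be listed; here both CE names are given,
# with the private one first.
$clienthost privateCEhostname
$clienthost publicCEhostname
$clienthost localhost

# Hosts allowed to connect without a privileged port
# (wildcards are accepted).
$restricted *.<domain>

# Bitmask of event classes to log; 255 turns on the low eight bits.
$logevent 255
```
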
Peter Love wrote:
> Unfortunately this doesn't help. I already have this in /var/spool/pbs/mom_priv/config:
>
> $clienthost ce.lancs.pygrid
> $clienthost localhost
> $restricted ce.lancs.pygrid
> $logevent 255
> $ideal_load 1.6
> $max_load 2.1
>
> Stage-in/out still uses the public hostname (lunegw.lancs.ac.uk).
>
> Ignore the 'No route to host' error, we have firewalled port 22 on the
> CE public interface. The WN shouldn't use the CE's public interface for
> staging.
>
>
> PBS Job Id: 134.lunegw.lancs.ac.uk
> Job Name: test.sh
> File stage in failed, see below.
> Job will be retried later, please investigate and correct problem.
> Post job file processing error; job 134.lunegw.lancs.ac.uk on host
> test01.lancs.pygrid/0
>
> Unable to copy file 134.lunegw..OU to lunegw.lancs.ac.uk:/home/dteam004/test.sh.o134
>
>>>>error from copy
>
> lunegw.lancs.ac.uk: No route to host
> uk port 22: No route to host
> lost connection
>
>>>>end error output
>
> Output retained on that host in: /var/spool/pbs/undelivered/134.lunegw..OU
>
>
> Why would the WN /var/spool/pbs/server_name contain public_CE_HOSTNAME?
> I confirmed this doesn't affect things.
>
> Peter
>
>
> Luca Vaccarossa wrote:
>
>>Peter Love wrote:
>>
>>>Hi,
>>>
>>>We're setting up a new farm with WNs on a private network, without
>>>shared /home. My question is how to configure torque to specify the CE's
>>>private hostname (ce.lancs.pygrid) when submitting jobs to the WNs. At
>>>the moment the WNs attempt to copy output back to the torque server via
>>>the public hostname of the CE, which I assume is found using 'hostname
>>>-f' at the time qsub is run.
>>>
>>>All the public/private keys are in order, copying from WNs to
>>>ce.lancs.pygrid works fine.
>>>
>>>The WN /var/spool/pbs/server_name file contains 'ce.lancs.pygrid'.
>>>
>>>Is this a jobmanager issue? Should the qsub specify the server as
>>>'ce.lancs.pygrid'?
>>>
>>>Besides the brief gocwiki docs, are there any docs around for private
>>>network config problems? Are all sites with private networks using an
>>>NFS-shared /home?
>>>
>>>Peter
>>
>>You have to put the CE's private hostname first in the file
>>/var/spool/pbs/mom_priv/config
>>
>>
>>$clienthost ce.lancs.pygrid
>>$clienthost public_CE_HOSTNAME
>>
>>
>>
>>on your WNs on a private network.
>>In file /var/spool/pbs/server_name I have the
>>public_CE_HOSTNAME
>>
>>
>>I hope this helps.
>>
>>Luca