Hello Carlos,
Thanks for replying.
The output shows the torque version (torque-1.0.1p6-11.SL30X.st). Querying that package with rpm -ql shows that it should create the /var/spool/pbs directory, but the directory was never actually created. (I guess that was the problem.)
Further, on another WN, that command wrote the wrong CE_HOST value: it writes "localhost", even though CE_HOST is set to ce.prd.hp.com.
This might be a problem too, right?
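Both symptoms (the missing directory and the "localhost" value) could be caught before the write happens. Below is a minimal sketch of such a guard, assuming a hypothetical helper named write_server_name; it is not part of YAIM, and the return codes are arbitrary choices for illustration:

```shell
# Hypothetical helper (not part of YAIM): write the torque server name
# only if the spool directory exists and the host value looks usable.
write_server_name() {
    spool_dir="$1"   # e.g. /var/spool/pbs
    ce_host="$2"     # e.g. the CE_HOST value from site-info.def

    # The redirect fails with "No such file or directory" if this is missing.
    if [ ! -d "$spool_dir" ]; then
        echo "missing directory: $spool_dir" >&2
        return 1
    fi

    # An empty or "localhost" value would leave pbs_mom without a real server.
    if [ -z "$ce_host" ] || [ "$ce_host" = "localhost" ]; then
        echo "suspicious CE_HOST value: '$ce_host'" >&2
        return 2
    fi

    echo "$ce_host" > "$spool_dir/server_name"
}
```

It could be called as: write_server_name /var/spool/pbs "$CE_HOST"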
Regards,
./MS
-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Carlos Borrego Iglesias
Sent: Tue 2/15/2005 11:23 AM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] YAIM: WN-torque config failure on SLC3
Hello Maniel,
It seems the function /opt/lcg/yaim/functions/config_torque_client is
unable to write to the file /var/spool/pbs/server_name. The shell command
is pretty simple:
echo "$CE_HOST" > /var/spool/pbs/server_name
It's not a matter of an empty CE_HOST, because otherwise
it would have complained earlier. A disk problem? Could you send me the
output of this command (just in case the torque rpm hasn't been installed):
rpm -qf /var/spool/pbs/
cheers
Carlos
==========================================================================
Carlos Borrego Iglesias PIC (Port d'Informació Científica)
tel: +34 93 581 3308 Campus UAB - Edifici D
e-mail: [log in to unmask] E-08193 Bellaterra
==========================================================================
On Tue, 15 Feb 2005, Sotomayor, Maniel wrote:
> Hello,
>
> I'm trying to configure my WN farm with torque support as described in the documentation, without success. I'm also getting inconsistent results, with different errors while running the configure scripts.
> Here is what is logged after creating the job manager config file.
>
> VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
>
> Setting up condor gram reporter in MDS
> ----------------------------------------
> configure: error: Cannot locate condor_q
> loading cache /dev/null
> checking for condor_q... no
> Error locating condor commands, aborting!
> Setting up lsf gram reporter in MDS
> ----------------------------------------
> configure: error: Cannot locate lsload
> loading cache /dev/null
> checking for lsload... no
> Error locating LSF commands, aborting!
> configure: warning: Cannot locate mpirun
> loading cache ./config.cache
> checking for mpirun... (cached) no
> creating ./config.status
> creating fork.pm
> configure: warning: Cannot locate mpirun
> loading cache /dev/null
> checking for mpirun... no
> checking for qdel... /usr/bin/qdel
> checking for qstat... /usr/bin/qstat
> checking for qsub... /usr/bin/qsub
> checking for ssh... /usr/bin/ssh
> updating cache /dev/null
> creating ./config.status
> creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
> No default server name.
> qstat: cannot connect to server (null) (errno=15034)
> configure: error: Cannot locate condor_submit
> loading cache /dev/null
> checking for condor_submit... no
> Error locating condor commands, aborting!
> configure: warning: Using default of /etc for LSF_ENVDIR
> configure: error: LSF configuration /etc/lsf.conf not found.
> loading cache /dev/null
> Error locating LSF commands, aborting!
> loading cache ./config.cache
> creating ./config.status
> creating grid-cert-request-config
> creating grid-security-config
> Configuring config_crl ...
> Configuring config_replica_manager ...
> Configuring config_edgusers ...
> Configuring config_users ...
> Configuring config_rgma ...
> Configuring config_workload_manager_env ...
> Configuration Complete
> Configuring config_torque_client ...
> /opt/lcg/yaim/scripts/configure_WN_torque: line 13: /var/spool/pbs/server_name:No such file or directory
> No default server name.
> /usr/bin/pbsnodes: cannot connect to server , error=15034
> /opt/lcg/yaim/scripts/configure_WN_torque: line 60: /var/spool/pbs/mom_priv/config: No such file or directory
> Stopping pbs_mom: [FAILED]
> Starting pbs_mom: pbs_mom unable to go home: No such file or directory
> [FAILED]
> Configuration Complete
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> I think that this error gives a hint:
> >>> No default server name.
> >>> qstat: cannot connect to server (null) (errno=15034)
>
> Any ideas ?
> I've been stuck on this for 2-3 days. The thing is that this error appears on some WNs but not on all. On other WNs, different errors appear, and all of them use the same site-info.def config file.
>
> Any help please ?
>
> ./MS
>