If the script is run a second time on the same node, output like this is obtained:

VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV

Setting up lsf gram reporter in MDS
----------------------------------------
configure: error: Cannot locate lsload
loading cache /dev/null
checking for lsload... no
Error locating LSF commands, aborting!
configure: warning: Cannot locate mpirun
loading cache ./config.cache
checking for mpirun... (cached) no
creating ./config.status
creating fork.pm
configure: warning: Cannot locate mpirun
loading cache /dev/null
checking for mpirun... no
checking for qdel... /usr/bin/qdel
checking for qstat... /usr/bin/qstat
checking for qsub... /usr/bin/qsub
checking for ssh... /usr/bin/ssh
updating cache /dev/null
creating ./config.status
creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
configure: error: Cannot locate condor_submit
loading cache /dev/null
checking for condor_submit... no
Error locating condor commands, aborting!
configure: warning: Using default of /etc for LSF_ENVDIR
configure: error: LSF configuration /etc/lsf.conf not found.
loading cache /dev/null
Error locating LSF commands, aborting!
loading cache ./config.cache
creating ./config.status
creating grid-cert-request-config
creating grid-security-config
Configuring config_crl ...
Configuring config_replica_manager ...
Configuring config_edgusers ...
Configuring config_users ...
Configuring config_rgma ...
Configuring config_workload_manager_env ...
Configuration Complete
Configuring config_torque_client ...
Stopping pbs_mom:                                          [  OK  ]
Starting pbs_mom:                                          [  OK  ]
Configuration Complete

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you look for the same error (the qstat connection failure), you won't find it. See below...

>>> creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
>>> configure: error: Cannot locate condor_submit

./MS

-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Sotomayor, Maniel
Sent: Tue 2/15/2005 11:10 AM
To: [log in to unmask]
Subject: [LCG-ROLLOUT] FW: [LCG-ROLLOUT] YAIM: WN-torque config failure on SLC3
 
Here's the log for another WN with the same config file:

VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
Setting up condor gram reporter in MDS
----------------------------------------
configure: error: Cannot locate condor_q
loading cache /dev/null
checking for condor_q... no
Error locating condor commands, aborting!
Setting up lsf gram reporter in MDS
----------------------------------------
configure: error: Cannot locate lsload
loading cache /dev/null
checking for lsload... no
Error locating LSF commands, aborting!
configure: warning: Cannot locate mpirun
loading cache ./config.cache
checking for mpirun... no
updating cache ./config.cache
creating ./config.status
creating fork.pm
configure: warning: Cannot locate mpirun
loading cache /dev/null
checking for mpirun... no
checking for qdel... /usr/bin/qdel
checking for qstat... /usr/bin/qstat
checking for qsub... /usr/bin/qsub
checking for ssh... /usr/bin/ssh
updating cache /dev/null
creating ./config.status
creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
Connection refused
qstat: cannot connect to server localhost (errno=111)
configure: error: Cannot locate condor_submit
loading cache /dev/null
checking for condor_submit... no
Error locating condor commands, aborting!
configure: warning: Using default of /etc for LSF_ENVDIR
configure: error: LSF configuration /etc/lsf.conf not found.
loading cache /dev/null
Error locating LSF commands, aborting!
loading cache ./config.cache
creating ./config.status
creating grid-cert-request-config
creating grid-security-config
Configuring config_crl ...
Configuring config_replica_manager ...
Configuring config_edgusers ...
Configuring config_users ...
Configuring config_rgma ...
Configuring config_workload_manager_env ...
Configuration Complete
Configuring config_torque_client ...
Stopping pbs_mom:                                          [  OK  ]
Starting pbs_mom:                                          [  OK  ]
Configuration Complete

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you look at the output, an error like this appears (different from the one reported in the previous email).
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

>>> creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
>>> Connection refused
>>> qstat: cannot connect to server localhost (errno=111)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
./MS


-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Sotomayor, Maniel
Sent: Tue 2/15/2005 11:03 AM
To: [log in to unmask]
Subject: [LCG-ROLLOUT] YAIM: WN-torque config failure on SLC3
 
Hello,

I'm trying to configure my WN farm with torque support as described in the documentation, without success. I'm also getting inconsistent results, with different errors while running the configure scripts.
Here is what is logged after creating the job manager config file.

VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV

Setting up condor gram reporter in MDS
----------------------------------------
configure: error: Cannot locate condor_q
loading cache /dev/null
checking for condor_q... no
Error locating condor commands, aborting!
Setting up lsf gram reporter in MDS
----------------------------------------
configure: error: Cannot locate lsload
loading cache /dev/null
checking for lsload... no
Error locating LSF commands, aborting!
configure: warning: Cannot locate mpirun
loading cache ./config.cache
checking for mpirun... (cached) no
creating ./config.status
creating fork.pm
configure: warning: Cannot locate mpirun
loading cache /dev/null
checking for mpirun... no
checking for qdel... /usr/bin/qdel
checking for qstat... /usr/bin/qstat
checking for qsub... /usr/bin/qsub
checking for ssh... /usr/bin/ssh
updating cache /dev/null
creating ./config.status
creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
No default server name.
qstat: cannot connect to server (null) (errno=15034)
configure: error: Cannot locate condor_submit
loading cache /dev/null
checking for condor_submit... no
Error locating condor commands, aborting!
configure: warning: Using default of /etc for LSF_ENVDIR
configure: error: LSF configuration /etc/lsf.conf not found.
loading cache /dev/null
Error locating LSF commands, aborting!
loading cache ./config.cache
creating ./config.status
creating grid-cert-request-config
creating grid-security-config
Configuring config_crl ...
Configuring config_replica_manager ...
Configuring config_edgusers ...
Configuring config_users ...
Configuring config_rgma ...
Configuring config_workload_manager_env ...
Configuration Complete
Configuring config_torque_client ...
/opt/lcg/yaim/scripts/configure_WN_torque: line 13: /var/spool/pbs/server_name:No such file or directory
No default server name.
/usr/bin/pbsnodes: cannot connect to server , error=15034
/opt/lcg/yaim/scripts/configure_WN_torque: line 60: /var/spool/pbs/mom_priv/config: No such file or directory
Stopping pbs_mom:                                          [FAILED]
Starting pbs_mom: pbs_mom unable to go home: No such file or directory
                                                           [FAILED]
Configuration Complete
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I think that this error gives a hint:
>>> No default server name.
>>> qstat: cannot connect to server (null) (errno=15034)

Any ideas ?
I've been stuck on this for 2-3 days. The thing is that this error appears on some WNs but not on all of them. On other WNs, different errors appear, and all of them use the same site-info.def config file.

Any help please ?

./MS
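
Since the three logs in this thread differ only in a few lines, it may help to compare them mechanically rather than by eye. What follows is a minimal sketch in Python, not part of YAIM. The function names (problem_lines, compare) and the list of patterns are assumptions made for illustration, drawn only from the messages quoted above. The two sample excerpts are copied from the logs in this thread.

```python
import re

# Patterns that mark a problem line in the configure output quoted above.
# This list is an assumption based on the three logs in this thread,
# not an exhaustive set of YAIM or torque error messages.
PROBLEM = re.compile(
    r"error|cannot connect|Connection refused|No default server name"
    r"|No such file or directory|FAILED",
    re.IGNORECASE,
)


def problem_lines(log_text):
    """Return the set of distinct problem lines found in one log."""
    return {line.strip() for line in log_text.splitlines() if PROBLEM.search(line)}


def compare(log_a, log_b):
    """Return (lines only in A, lines only in B), each sorted."""
    a, b = problem_lines(log_a), problem_lines(log_b)
    return sorted(a - b), sorted(b - a)


# Short excerpts copied from the logs in this thread.
FIRST_WN = """\
creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
No default server name.
qstat: cannot connect to server (null) (errno=15034)
configure: error: Cannot locate condor_submit
"""
SECOND_WN = """\
creating /opt/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm
Connection refused
qstat: cannot connect to server localhost (errno=111)
configure: error: Cannot locate condor_submit
"""

if __name__ == "__main__":
    only_first, only_second = compare(FIRST_WN, SECOND_WN)
    print("only on first WN:", *only_first, sep="\n  ")
    print("only on second WN:", *only_second, sep="\n  ")
```

Errors common to every node (the condor and LSF ones) drop out of the comparison, leaving only the lines that differ between WNs. In the excerpts above those are the two different qstat failures.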