On Thu, 17 Mar 2005, Nicolescu, Edward L wrote:
> Folks,
>
> I have installed SL3 on all LCG2 nodes at BNL-LCG2 and next I proceeded to
> install LCG-2.3.1 as indicated in the
> "LCG Generic Installation and Configuration" manual, using yaim.
>
> Following the configuration of the CE node with configure_CE_torque, I have
> untarred the "relocatable distribution" on
> one of the worker nodes and followed the steps recommended under "To
> install a UI or WN as root" (page 15). First,
> I ran the two commands below,
>
> /opt/lcg/yaim/scripts/install_node /opt/lcg/yaim/examples/site-info.def lcg-TAR
> /opt/lcg/yaim/scripts/configure_TAR /opt/lcg/yaim/examples/site-info.def WN
>
> Running the configure_TAR script has already resulted in errors regarding
> the rgma configuration (maybe a good soul
> on this list will explain to me why r-gma is needed on a worker node?!) as
R-GMA is there to allow user jobs to publish information in user-defined
tables, e.g. so that a job's progress can be monitored (which event it is
processing, or which step of the generation/reconstruction it has reached).
It will also be used for standard job monitoring (resource consumption etc.).
The YAIM developers should comment on the errors you got.
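For what it is worth, the traceback shows the script opening tomcat4.conf
under catalina_home to read JAVA_OPTS with a default of -Xmx64m, and failing
because the file does not exist. The sketch below only illustrates that
lookup made tolerant of a missing file. read_property is a hypothetical
helper, not part of edg-rgma-config, and the KEY=value file format is an
assumption.

```python
import os


def read_property(path, key, default):
    """Return the value of KEY=value from a properties-style file.

    Hypothetical helper: falls back to `default` when the file is
    missing or the key is absent, instead of raising IOError.
    """
    if not os.path.isfile(path):
        return default
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comments (assumed '#' comment syntax).
            if not line or line.startswith("#"):
                continue
            name, sep, value = line.partition("=")
            if sep and name.strip() == key:
                return value.strip().strip('"')
    return default


if __name__ == "__main__":
    # Path and default copied from the traceback in this thread.
    conf = os.path.join("/var/tomcat4", "conf/tomcat4.conf")
    print(read_property(conf, "JAVA_OPTS", "-Xmx64m"))
```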
> shown below:
>
> configuring config_rgma
> Traceback (most recent call last):
> File "/opt/edg/sbin/edg-rgma-config", line 1004, in ?
> main()
> File "/opt/edg/sbin/edg-rgma-config", line 43, in main
> xmlFileCreator = XmlFileCreator(rgmaRoot, env.developer, var).create()
> File "/opt/edg/sbin/edg-rgma-config", line 240, in create
> self.__CanonicalProducerServletXml()
> File "/opt/edg/sbin/edg-rgma-config", line 270, in
> __CanonicalProducerServletXml
> self.__insertableSuperclassParam()
> File "/opt/edg/sbin/edg-rgma-config", line 398, in
> __insertableSuperclassParam
> jvmMemory = PropertyFileRdr(os.path.join(self.var.catalina_home,
> 'conf/tomcat4.conf'))('JAVA_OPTS', '-Xmx64m')
> File "/opt/edg/sbin/edg-rgma-config", line 948, in __init__
> handle = file(filename, 'r')
> IOError: [Errno 2] No such file or directory:
> '/var/tomcat4/conf/tomcat4.conf'
>
> Anyway, I have disregarded the above error and proceeded to configure my
> worker node with torque by running
> the script below,
>
> /opt/lcg/yaim/scripts/configure_WN_torque /opt/lcg/yaim/examples/site-info.def
>
> Following a first set of errors regarding the inability to locate LSF,
> Condor and MPI commands (why did the
> script try to configure LSF and Condor when all I wanted was Torque?) the
> same r-gma related output was
> issued again. Then, when it got to configuring torque, a new set of errors
> was issued as shown below:
>
> Configuring config_torque_client ...
> Can't exec "/usr/bin/pbsnodes": No such file or directory at
> /opt/edg/sbin/edg-pbs-knownhosts line 32.
> Could note open /usr/bin/pbsnodes -a pipe: No such file or directory
> /opt/lcg/yaim/scripts/configure_WN_torque: line 60:
> /var/spool/pbs/mom_priv/config: No such file or directory
> error reading information on service pbs_mom: No such file or directory
> /opt/lcg/yaim/scripts/configure_WN_torque: line 70:
> /etc/rc.d/init.d/pbs_mom: No such file or directory
> /opt/lcg/yaim/scripts/configure_WN_torque: line 72:
> /etc/rc.d/init.d/pbs_mom: No such file or directory
> Configuration Complete
>
> Can anybody please explain to me why configure_WN_torque needed pbs_mom? As
> I understand it, pbs_mom
> is used to start a pbs batch execution mini-server... But why on a worker
> node?!?
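How do you think your job is started on the WN? pbs_mom is the daemon that
launches and supervises the jobs on each execution host, so it has to run on
every worker node.
Also, Torque is a fork of OpenPBS, which is why its commands and daemons
still carry the pbs names.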
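
To see at a glance which Torque pieces are absent on the WN, something along
these lines may help. It is only a sketch: the path list is copied from the
error messages quoted above, and missing_paths is a hypothetical helper, not
a YAIM or Torque command.

```python
import os

# Paths taken from the error messages in this thread.
TORQUE_PATHS = [
    "/usr/bin/pbsnodes",
    "/var/spool/pbs/mom_priv/config",
    "/etc/rc.d/init.d/pbs_mom",
]


def missing_paths(paths):
    """Return the subset of `paths` that does not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    for path in missing_paths(TORQUE_PATHS):
        print("missing:", path)
```

If all three turn up as missing, that would suggest the Torque client
software is not present on the node at all, rather than being misconfigured.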
> Your input regarding the error messages above and whether they should be
> ignored or not will be much appreciated !
>
> Cheers,
>
> Edward Nicolescu
>
> PS All the output from configure_WN_torque is displayed below.
>
>
> Script started on Wed 16 Mar 2005 04:53:35 PM EST
> [root@lcg-wn01 yaim]# /opt/lcg/yaim/scripts/configure_WN_torque /opt/lcg/yaim/examples/site-info.def
> Configuring config_ldconf ...
> Configuring config_sysconfig_edg ...
> Configuring config_sysconfig_globus ...
> Configuring config_sysconfig_lcg ...
> Configuring config_lcgenv ...
> Configuring config_globus ...
> creating globus-sh-tools-vars.sh
> creating globus-script-initializer
> creating Globus::Core::Paths
> checking globus-hostname
> Done
>
> Creating...
> /opt/globus/etc/grid-info.conf
> Done
>
> Creating...
> /opt/globus/sbin/SXXgris
> ln: `/opt/globus/sbin/globus-mds': File exists
> /opt/globus/libexec/grid-info-script-initializer
> /opt/globus/libexec/grid-info-mds-core
> /opt/globus/libexec/grid-info-common
> /opt/globus/libexec/grid-info-cpu*
> /opt/globus/libexec/grid-info-fs*
> /opt/globus/libexec/grid-info-mem*
> /opt/globus/libexec/grid-info-net*
> /opt/globus/libexec/grid-info-platform*
> /opt/globus/libexec/grid-info-os*
> /opt/globus/etc/grid-info-resource-ldif.conf
> /opt/globus/etc/grid-info-resource-register.conf
> /opt/globus/etc/grid-info-resource.schema
> /opt/globus/etc/grid.gridftpperf.schema
> /opt/globus/etc/gridftp-resource.conf
> /opt/globus/etc/gridftp-perf-info
> /opt/globus/etc/grid-info-slapd.conf
> /opt/globus/etc/grid-info-site-giis.conf
> /opt/globus/etc/grid-info-site-policy.conf
> /opt/globus/etc/grid-info-server-env.conf
> /opt/globus/etc/grid-info-deployment-comments.conf
> Done
> Creating gatekeeper configuration file...
> Done
> Creating state file directory.
> Done.
> Reading gatekeeper configuration file...
> Warning: Host cert file: /etc/grid-security/hostcert.pem not found. Re-run
> setup-globus-gram-job-manager after installing host cert file.
> Determining system information...
> Creating job manager configuration file...
> Done
> Setting up fork gram reporter in MDS
> -----------------------------------------
> Done
>
> Setting up pbs gram reporter in MDS
> ----------------------------------------
> configure: error: Cannot locate qstat
> loading cache /dev/null
> checking for qstat... no
> Error locating pbs commands, aborting!
> Setting up condor gram reporter in MDS
> ----------------------------------------
> configure: error: Cannot locate condor_q
> loading cache /dev/null
> checking for condor_q... no
> Error locating condor commands, aborting!
> Setting up lsf gram reporter in MDS
> ----------------------------------------
> configure: error: Cannot locate lsload
> loading cache /dev/null
> checking for lsload... no
> Error locating LSF commands, aborting!
> configure: warning: Cannot locate mpirun
> loading cache ./config.cache
> checking for mpirun... (cached) no
> creating ./config.status
> creating fork.pm
> configure: warning: Cannot locate mpirun
> configure: error: Cannot locate qdel
> loading cache /dev/null
> checking for mpirun... no
> checking for qdel... no
> Error locating PBS commands, aborting!
> configure: error: Cannot locate condor_submit
> loading cache /dev/null
> checking for condor_submit... no
> Error locating condor commands, aborting!
> configure: warning: Using default of /etc for LSF_ENVDIR
> configure: error: LSF configuration /etc/lsf.conf not found.
> loading cache /dev/null
> Error locating LSF commands, aborting!
> loading cache ./config.cache
> creating ./config.status
> creating grid-cert-request-config
> creating grid-security-config
> Configuring config_crl ...
> Configuring config_replica_manager ...
> Configuring config_edgusers ...
> Configuring config_users ...
> Configuring config_rgma ...
> Traceback (most recent call last):
> File "/opt/edg/sbin/edg-rgma-config", line 1004, in ?
> main()
> File "/opt/edg/sbin/edg-rgma-config", line 43, in main
> xmlFileCreator = XmlFileCreator(rgmaRoot, env.developer, var).create()
> File "/opt/edg/sbin/edg-rgma-config", line 240, in create
> self.__CanonicalProducerServletXml()
> File "/opt/edg/sbin/edg-rgma-config", line 270, in
> __CanonicalProducerServletXml
> self.__insertableSuperclassParam()
> File "/opt/edg/sbin/edg-rgma-config", line 398, in
> __insertableSuperclassParam
> jvmMemory = PropertyFileRdr(os.path.join(self.var.catalina_home,
> 'conf/tomcat4.conf'))('JAVA_OPTS', '-Xmx64m')
> File "/opt/edg/sbin/edg-rgma-config", line 948, in __init__
> handle = file(filename, 'r')
> IOError: [Errno 2] No such file or directory:
> '/var/tomcat4/conf/tomcat4.conf'
> Configuring config_workload_manager_env ...
> Configuration Complete
> Configuring config_torque_client ...
> Can't exec "/usr/bin/pbsnodes": No such file or directory at
> /opt/edg/sbin/edg-pbs-knownhosts line 32.
> Could note open /usr/bin/pbsnodes -a pipe: No such file or directory
> /opt/lcg/yaim/scripts/configure_WN_torque: line 60:
> /var/spool/pbs/mom_priv/config: No such file or directory
> error reading information on service pbs_mom: No such file or directory
> /opt/lcg/yaim/scripts/configure_WN_torque: line 70:
> /etc/rc.d/init.d/pbs_mom: No such file or directory
> /opt/lcg/yaim/scripts/configure_WN_torque: line 72:
> /etc/rc.d/init.d/pbs_mom: No such file or directory
> Configuration Complete
>
>