Where is the site BDII?
If the site BDII runs on a separate node from the CE, you need to install the lcg-info-templates and lcg-info-generic packages by hand on the site BDII node and remove the ${INSTALL_ROOT}/lcg/var/gip/ldif/static-file-Site.ldif file on all CE nodes.
You can also query the site BDII for the information you want, or consult bdii.log to see what is happening.
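As a rough sketch of such a query, something like the following could be used. The host name and site name are placeholders rather than values from this thread, and the port (2170) and base DN (mds-vo-name=<site>,o=grid) are the usual site BDII defaults, so adjust them if your setup differs. The bdii_query helper and the DRY_RUN variable are hypothetical names made up for this example.

```shell
#!/bin/sh
# Sketch only: list the CE identifiers published by a site BDII.
# Assumptions: the BDII listens on port 2170 and publishes under
# mds-vo-name=<site>,o=grid; adjust both if your site differs.

bdii_query() {
    # $1 = site BDII host, $2 = site name as published in the BDII
    set -- ldapsearch -x -LLL -H "ldap://$1:2170" \
        -b "mds-vo-name=$2,o=grid" '(objectClass=GlueCE)' GlueCEUniqueID
    if [ -n "$DRY_RUN" ]; then
        # Only show the command that would be run.
        echo "$*"
    else
        "$@"
    fi
}

# Example call (placeholder host and site name):
#   bdii_query sitebdii.example.org MY-SITE
```

If the CE queues you expect do not appear in the output, the information provider on the CE or the site BDII configuration is the next place to look.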
Cheers,
Nuno Silva
-----Original Message-----
From: LHC Computer Grid - Rollout on behalf of Vrijaldenhoven, Serge
Sent: Tue 6/5/2007 9:57 AM
Subject: [LCG-ROLLOUT] troubles after update to glite-CE with Torque: missing jobmanager-pbs
Hi all,
After updating our small grid to glite-yaim-3.0.1-15, we can't get any CE to match a very simple JDL:
Executable = "/bin/sleep";
Arguments = "1";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out","std.err"};
Our guess is that jobmanager-pbs is somehow missing (see below).
What should we do to add it?
Greetings,
Serge
More information:
=============
lcg-infosites --vo phicos ce:
#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------
 394     0           2        2        0  tbn20.nikhef.nl:2119/jobmanager-pbs-qshort
 394     0         205       92      113  tbn20.nikhef.nl:2119/jobmanager-pbs-qlong
 142   142           0        0        0  deimos.ehv.campus.philips.com:2119/blah-pbs-phicos
  28     1           3        3        0  mu6.matrix.sara.nl:2119/jobmanager-pbs-short
  28     1           2        2        0  mu6.matrix.sara.nl:2119/jobmanager-pbs-medium
glite-job-list-match -v simplestJob.jdl:
Selected Virtual Organisation name (from proxy certificate extension): phicos
***************************************************************************
JOB CLASSAD
[
RetryCount = 3;
Arguments = "1";
MyProxyServer = "px.matrix.sara.nl";
Executable = "/bin/sleep";
StdOutput = "std.out";
VOMS_FQAN = "/phicos/Role=NULL/Capability=NULL";
OutputSandbox = { "std.out","std.err" };
VirtualOrganisation = "phicos";
StdError = "std.err"
]
***************************************************************************
Connecting to host moon.ehv.campus.philips.com, port 7772
===================== glite-job-list-match failure ======================
No Computing Element matching your job requirements has been found!
======================================================================
/var/log/globus-gatekeeper.log:
LCMAPS 7: 2007-06-04.14:52:54.879786.0000002261.0000000046 : Termination LCMAPS
LCMAPS 0: 2007-06-04.14:52:54.879786.0000002261.0000000046 : lcmaps.mod-lcmaps_term(): terminating
Notice: 5: Requested service: jobmanager-pbs
Failure: Failed to find requested service: jobmanager-pbs: -2
Failure: Failed to find requested service: jobmanager-pbs: -2
ls -alF /opt/globus/etc/grid-services/:
lrwxrwxrwx 1 root root 15 Jun 4 17:34 jobmanager -> jobmanager-fork
-rw-r--r-- 1 root root 196 Jun 4 17:34 jobmanager-fork
-rw-r--r-- 1 root root 200 Jun 4 17:34 jobmanager-lcgpbs
site-info.def:
JOB_MANAGER=pbs
CE_BATCH_SYS=pbs
(ps: our nodes do not have a shared home)
CE install (glite-CE-2.4.30-0):
#!/bin/sh
LOC=/opt/glite/yaim
$LOC/bin/yaim -i -s $LOC/agrid/site-info.def -m glite-CE -m glite-torque-server-config
CE config:
#!/bin/sh
LOC=/opt/glite/yaim
$LOC/bin/yaim -c -s $LOC/agrid/site-info.def -n gliteCE -n TORQUE_server