Hi Nuno Silva,
The site BDII is on the CE.
lcg-info-templates-1.0.15-1_sl3 and
lcg-info-generic-1.0.22-1_sl3
are installed on both the site BDII and the top-level BDII (which runs on our WMSLB).
No errors are found in bdii.log:
---
Updating DB on port 2171
Waiting 115 s for query results.
Time for searches: 0 s
current port: 52122 - OK
Time to update DB: 0 s
current port: 52123 - OK
slapd responsive after attempt 0
Time to load DB: 0 s
Grabbing port 2170 for 2171
Tue Jun 5 11:31:58 CEST 2007
Sleeping for 30
---
Grtz,
Serge
> -----Original Message-----
> From: LHC Computer Grid - Rollout
> [mailto:[log in to unmask]] On Behalf Of Nuno Orestes Vaz Da Silva
> Sent: Tuesday 5 June 2007 10:34
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] troubles after update to glite-CE with Torque: missing jobmanager-pbs
>
> Where is the site BDII?
>
> If the site BDII is separate from the CE, the lcg-info-templates
> and lcg-info-generic packages must be installed by hand on the
> site BDII node, and the
> ${INSTALL_ROOT}/lcg/var/gip/ldif/static-file-Site.ldif file
> must be removed from all CE nodes.
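A minimal sketch of the CE-side cleanup step, assuming INSTALL_ROOT falls back to /opt when it is not set in the environment (an assumption; use the value from your own site-info.def). The function name `remove_static_site_ldif` is hypothetical, and installing lcg-info-templates and lcg-info-generic on the BDII node is left to the package manager.

```sh
#!/bin/sh
# Hypothetical helper, to be run on every CE node once the site BDII
# lives on a separate host. Defaulting INSTALL_ROOT to /opt is an assumption.
remove_static_site_ldif() {
    root=${INSTALL_ROOT:-/opt}
    ldif="$root/lcg/var/gip/ldif/static-file-Site.ldif"
    if [ -f "$ldif" ]; then
        # The static site LDIF should only be published by the site BDII node.
        rm -f "$ldif" && echo "removed $ldif"
    else
        echo "not present: $ldif"
    fi
}
```

Running it a second time is harmless: it only reports that the file is no longer there.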
>
> You can also query the site BDII for the information you want,
> or consult bdii.log to see what is happening.
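A sketch of such a query with the OpenLDAP ldapsearch client. It assumes the site BDII answers on port 2170 (the port bdii.log reports grabbing) and that the site's GLUE tree sits under `mds-vo-name=<site name>,o=grid`; the helper name `query_site_bdii` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: list the CEs a site BDII publishes and the job
# manager each one advertises (GLUE 1.x attributes).
query_site_bdii() {
    host=$1   # site BDII host, assumed to listen on port 2170
    site=$2   # site name used in the mds-vo-name branch (assumption)
    ldapsearch -x -LLL -H "ldap://$host:2170" \
        -b "mds-vo-name=$site,o=grid" \
        '(objectClass=GlueCE)' GlueCEUniqueID GlueCEInfoJobManager
}
```

For example, `query_site_bdii ce.example.org MY-SITE`, where both arguments are placeholders. The GlueCEUniqueID values have the same host:2119/jobmanager-... form as the ComputingElement column of the lcg-infosites output, so they can be compared directly with the entries under /opt/globus/etc/grid-services on the CE.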
>
> Cheers,
> Nuno Silva
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout on behalf of Vrijaldenhoven, Serge
> Sent: Tue 6/5/2007 9:57 AM
> To: [log in to unmask]
> Subject: [LCG-ROLLOUT] troubles after update to glite-CE with Torque: missing jobmanager-pbs
>
> Hi all,
>
> After updating our small grid with glite-yaim-3.0.1-15, we
> can't get any CE to match a very simple JDL:
> Executable = "/bin/sleep";
> Arguments = "1";
> StdOutput = "std.out";
> StdError = "std.err";
> OutputSandbox = {"std.out","std.err"};
>
> Our guess is that jobmanager-pbs is somehow missing (see below).
> What should we do to add it?
>
> Greetings,
> Serge
>
> More information:
> =============
> lcg-infosites --vo phicos ce:
> #CPU  Free  Total Jobs  Running  Waiting  ComputingElement
> ----------------------------------------------------------
>  394     0           2        2        0  tbn20.nikhef.nl:2119/jobmanager-pbs-qshort
>  394     0         205       92      113  tbn20.nikhef.nl:2119/jobmanager-pbs-qlong
>  142   142           0        0        0  deimos.ehv.campus.philips.com:2119/blah-pbs-phicos
>   28     1           3        3        0  mu6.matrix.sara.nl:2119/jobmanager-pbs-short
>   28     1           2        2        0  mu6.matrix.sara.nl:2119/jobmanager-pbs-medium
> glite-job-list-match -v simplestJob.jdl:
> Selected Virtual Organisation name (from proxy certificate extension): phicos
> ***************************************************************************
> JOB CLASSAD
> [
> RetryCount = 3;
> Arguments = "1";
> MyProxyServer = "px.matrix.sara.nl";
> Executable = "/bin/sleep";
> StdOutput = "std.out";
> VOMS_FQAN = "/phicos/Role=NULL/Capability=NULL";
> OutputSandbox = { "std.out","std.err" };
> VirtualOrganisation = "phicos";
> StdError = "std.err"
> ]
> ***************************************************************************
> Connecting to host moon.ehv.campus.philips.com, port 7772
> ===================== glite-job-list-match failure ======================
> No Computing Element matching your job requirements has been found!
> ======================================================================
> /var/log/globus-gatekeeper.log:
> LCMAPS 7: 2007-06-04.14:52:54.879786.0000002261.0000000046 : Termination LCMAPS
> LCMAPS 0: 2007-06-04.14:52:54.879786.0000002261.0000000046 : lcmaps.mod-lcmaps_term(): terminating
> Notice: 5: Requested service: jobmanager-pbs
> Failure: Failed to find requested service: jobmanager-pbs: -2
> Failure: Failed to find requested service: jobmanager-pbs: -2
> ls -alF /opt/globus/etc/grid-services/:
> lrwxrwxrwx 1 root root  15 Jun 4 17:34 jobmanager -> jobmanager-fork
> -rw-r--r-- 1 root root 196 Jun 4 17:34 jobmanager-fork
> -rw-r--r-- 1 root root 200 Jun 4 17:34 jobmanager-lcgpbs
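The gatekeeper log and the directory listing suggest that a requested service name such as jobmanager-pbs must exist as an entry in /opt/globus/etc/grid-services, and only jobmanager-fork and jobmanager-lcgpbs are there. A small sketch that reports which of the job managers in question are present; the helper name `check_grid_services` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report which gatekeeper service entries exist.
# Usage: check_grid_services DIR NAME...
# Returns 0 if every NAME exists in DIR (symlinks are followed), 1 otherwise.
check_grid_services() {
    dir=$1
    shift
    missing=0
    for svc in "$@"; do
        if [ -e "$dir/$svc" ]; then
            echo "present: $svc"
        else
            echo "missing: $svc"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_grid_services /opt/globus/etc/grid-services jobmanager-pbs jobmanager-lcgpbs` would, with the listing shown here, report jobmanager-pbs as missing and jobmanager-lcgpbs as present, and return a non-zero status.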
>
> site-info.def:
> JOB_MANAGER=pbs
> CE_BATCH_SYS=pbs
> (ps: our nodes do not have a shared home)
>
> CE install (glite-CE-2.4.30-0):
> #!/bin/sh
> LOC=/opt/glite/yaim
> $LOC/bin/yaim -i -s $LOC/agrid/site-info.def -m glite-CE -m glite-torque-server-config
> CE config:
> #!/bin/sh
> LOC=/opt/glite/yaim
> $LOC/bin/yaim -c -s $LOC/agrid/site-info.def -n gliteCE -n TORQUE_server
>