No, a comma!
Yaim 3.1 (used for the SL4 worker nodes) wants GLOBUS_TCP_PORT_RANGE comma
separated, not space separated. So when I used my new site-info.def file to
install an lcg-CE (yaim 3.0.1), it failed in a very subtle way.
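
For illustration, the two forms might look something like this in a
site-info.def. This is only a sketch: the port numbers are made-up example
values, and the helper function names are hypothetical, not part of yaim.

```shell
# Sketch only: the port numbers below are example values, not from the thread.
# Comma-separated form, as yaim 3.1 wants it (per the note above):
GLOBUS_TCP_PORT_RANGE="20000,25000"

# Hypothetical helpers to convert a range between the two separators.
to_space_range() { printf '%s\n' "$1" | tr ',' ' '; }
to_comma_range() { printf '%s\n' "$1" | tr ' ' ','; }

# Space-separated form, for a site-info.def fed to the yaim 3.0.1 lcg-CE:
to_space_range "$GLOBUS_TCP_PORT_RANGE"
```

Keeping one site-info.def per yaim version, or converting the value before
running the older yaim, would presumably avoid the mismatch.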
Chris.
> -----Original Message-----
> From: Testbed Support for GridPP member institutes
> [mailto:[log in to unmask]] On Behalf Of Bly, MJ (Martin)
> Sent: 11 July 2007 23:19
> To: [log in to unmask]
> Subject: Re: New CE problems
>
> /etc/ssh/shosts.equiv ?
>
>
> Martin.
> --
> -----------------------------------
> Martin Bly +44|0 1235 446981
> RAL Tier1 Fabric Team Manager
> -----------------------------------
>
> > -----Original Message-----
> > From: Testbed Support for GridPP member institutes
> > [mailto:[log in to unmask]] On Behalf Of Brew, CAJ (Chris)
> > Sent: 11 July 2007 16:38
> > To: [log in to unmask]
> > Subject: New CE problems
> >
> > Hi,
> >
> > I've installed a second CE to route jobs to my new SL4 worker nodes
> > but haven't put it into my site BDII yet.
> >
> > In principle the new CE should be identical to my old CE, I haven't
> > tried any mods yet to send jobs only to the SL4 nodes.
> >
> > I can su to dteam001 and submit jobs to the dteam queue and have them
> > run OK, I can ssh from the workernodes to the new CE without needing a
> > password and I can globus-job-run heplnx207.pp.rl.ac.uk
> > /usr/bin/whoami successfully.
> >
> > But, when I submit a job via edg-job-submit -r heplnx207.pp.rl.ac.uk
> > it eventually gets aborted.
> >
> > edg-job-get-useless-information gives the following reasons:
> >
> > Event: Done
> > - exit_code = 1
> > - host = lcgrb01.gridpp.rl.ac.uk
> > - reason = Got a job held event, reason: Unspecified gridmanager error
> > - source = LogMonitor
> > - src_instance = unique
> > - status_code = FAILED
> > - timestamp = Wed Jul 11 15:22:04 2007
> > - user = /C=UK/O=eScience/OU=CLRC/L=RAL/CN=chris dteam brew
> > ---
> > Event: Done
> > - exit_code = 1
> > - host = lcgrb01.gridpp.rl.ac.uk
> > - reason = Job got an error while in the CondorG queue.
> > - source = LogMonitor
> > - src_instance = unique
> > - status_code = FAILED
> > - timestamp = Wed Jul 11 15:22:16 2007
> > - user = /C=UK/O=eScience/OU=CLRC/L=RAL/CN=chris dteam brew
> >
> > As far as I can tell it never even gets into pbs though I do see
> > various processes being run by dteam001 on the CE and see
> > AuthenticateUser and StatusJob requests in the pbs_server logs.
> >
> > Can anyone give me any ideas of where to look for any logging info on
> > the job submission or any other help?
> >
> > Thanks,
> > Chris.
> >
>