Dear Matt,
Sorry for not answering earlier; I was busy with other things yesterday afternoon.
Can you give me some more information, e.g. which LSF version you are running? How many worker nodes and cores do you have in the system? Can you send me the output of the "lshosts", "lshosts -w", and "bhosts -w" commands?
(If you send them directly to me, that's fine.)
Cheers,
Ulrich
On Thursday, December 02, 2010 05:55:27 pm Matt Doidge wrote:
> > ps: back to Matt: your mail seems to indicate that somehow the
> > config got messed up, as indeed the line you added to lrms_backend_cmd
> > needs to be present.
>
> > The other place to look, take a good critical look at the vomap section
> > of the scheduler conf file ... YAIM gets this wrong once in awhile (are
> > you using YAIM?)
>
> Heyup,
> I've squinted at the vomap section of the
> lcg-info-dynamic-scheduler.conf (it was created using yaim, along with
> most of our setup). Unless hidden whitespace matters, it looks fine.
> I think the problem lies in the lrms_backend_cmd giving up zeros for
> nactive and nfree:
>
> #/opt/glite/libexec/lrmsinfo-lsf
> nactive 0
> nfree 0
> now 1291308084
> schedCycle 120
> {'group': 'sgmops','jobid': '22793','user': 'sgmops019','qtime':
> 1291307905.0,'queue': 'normal','state':'running','maxwalltime':
> 9999999999.0}
>
> I managed to catch an ops job running here, so nactive should be at
> least 1... right? (Not counting all the other users running jobs on
> this shared cluster.)
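
[Editor's illustration: the mismatch Matt describes can be checked by re-counting the job records in the lrmsinfo-lsf output, since each job line is a Python dict literal. This is a hypothetical sanity-check sketch, not the actual glite-info-dynamic-lsf plugin code; the function name and sample text are illustrative only.]

```python
# Sketch: parse lrmsinfo-lsf-style output, compare the reported nactive
# against a recount of jobs whose state is 'running'.
import ast

sample = """\
nactive 0
nfree 0
now 1291308084
schedCycle 120
{'group': 'sgmops','jobid': '22793','user': 'sgmops019','qtime': 1291307905.0,'queue': 'normal','state': 'running','maxwalltime': 9999999999.0}
"""

def recount_running(text):
    reported, running = None, 0
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("nactive"):
            reported = int(line.split()[1])   # value the backend reported
        elif line.startswith("{"):
            job = ast.literal_eval(line)      # job records are dict literals
            if job.get("state") == "running":
                running += 1
    return reported, running

reported, running = recount_running(sample)
print(reported, running)  # with the sample above: reported 0, but 1 job running
```

With the quoted output, this prints a reported nactive of 0 against 1 running job, which is exactly the inconsistency being reported.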
>
> I'll try to take this up with the glite-info-dynamic-lsf developer.
>
> Thanks a lot,
> Matt
--
--------------------------------------
Dr. Ulrich Schwickerath
CERN IT/PES-PS
1211 Geneva 23
e-mail: [log in to unmask]
phone: +41 22 767 9576
mobile: +41 76 487 5602