Dear Felix,
I am not sure I understood your problem correctly. You have several
8-core worker nodes, but PBS runs only 2-6 jobs on each of them and
then marks the node as busy, right?
This suggests a pbs_mom misconfiguration. Did you configure it by hand
or using YAIM? pbs_mom decides whether a machine is busy or free by
looking at the kernel load average. That value describes roughly how
many processes are actively demanding CPU time (you can find a more
detailed explanation at http://www.linuxjournal.com/article/9001),
without any normalization to the number of CPUs. This means that if you
want your nodes to be fully loaded, the load thresholds should be no
lower than the number of cores you have. You can find the thresholds
for your pbs_mom instance in the file /var/spool/pbs/mom_priv/config
($ideal_load and $max_load). If the load on a node exceeds $max_load,
the node is marked as busy and stays busy until the load average falls
below $ideal_load.
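To make the busy/free rule concrete, here is a minimal Python sketch of
the behaviour described above. It is only a model of the logic, not
TORQUE's actual code; the function names and the sample load values are
made up for illustration.

```python
def update_busy(busy, load_avg, ideal_load, max_load):
    """Return the new busy flag after one load-average sample."""
    if load_avg > max_load:
        return True       # load too high: node becomes busy
    if load_avg < ideal_load:
        return False      # load dropped far enough: node is free again
    return busy           # between the thresholds: keep the previous state


def trace(loads, ideal_load, max_load):
    """Apply update_busy to a sequence of samples, starting from 'free'."""
    busy = False
    states = []
    for load in loads:
        busy = update_busy(busy, load, ideal_load, max_load)
        states.append(busy)
    return states


# Hypothetical load averages on an 8-core node running a few jobs.
samples = [1.0, 2.5, 4.0, 3.0, 1.5, 6.0]

# Thresholds around 2: the node turns busy after the first couple of jobs.
print(trace(samples, ideal_load=1.5, max_load=2.0))

# Thresholds matched to 8 cores: the same load never marks the node busy.
print(trace(samples, ideal_load=8.0, max_load=10.0))
```

With thresholds near 2, the node flips to busy as soon as the load goes
above 2 and stays there, which matches what you see in pbsnodes.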
As far as I remember, the YAIM function that configures this file writes
default values of around 2, regardless of the number of cores you have,
because when it was written most nodes had only two single-core CPUs.
If you have more, you need to change these values manually (for
example to 8 and 10, respectively). This should solve your problem.
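For reference, the relevant lines in /var/spool/pbs/mom_priv/config on
an 8-core node could then look roughly like this. The numbers are just
the example values from above, not tuned recommendations, and any other
lines already in your file should stay as they are:

```
$ideal_load 8
$max_load 10
```

pbs_mom reads this file at startup, so it needs to be restarted on each
worker node before the new thresholds take effect.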
Best regards,
Adam
felix farcas wrote:
> Hello to all of you
> I wish you all a wonderful week :)
>
> My big problem is that I have a lot of free processors, but the status
> in pbsnodes is busy all the time.
>
> I have
> maui-3.2.6p20-snap and torque-2.3.0-snap
>
> My maui configuration file is:
>
> The problem as I see is that my jobs are waiting for
> # MAUI configuration example
> SERVERHOST cn-ce.itim-cj.ro
> ADMIN1 root
> ADMIN3 edginfo rgma edguser
> ADMINHOSTS cn-ce.itim-cj.ro
> RMCFG[base] TYPE=PBS
> SERVERPORT 40559
> SERVERMODE NORMAL
> # Set PBS server polling interval. If you have short queues or/and
> # jobs it is worth to set a short interval. (10 seconds)
> RMPOLLINTERVAL 00:00:10
> # a max. 10 MByte log file in a logical location
> LOGFILE /var/log/maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 1
> # Set the delay to 1 minute before Maui tries to run a job again,
> # in case it failed to run the first time.
> # The default value is 1 hour.
> DEFERTIME 00:01:00
> # Necessary for MPI grid jobs
> ENABLEMULTIREQJOBS TRUE
> GROUPWEIGHT 1
> GROUPCFG[sgmops] PRIORITY=100000
> GROUPCFG[ops] PRIORITY=100000
> #SRCFG[ops] FLAGS=SPACEFLEX
> SRCFG[ops] TASKCOUNT=1 RESOURCES=PROCS:1
> SRCFG[ops] PERIOD=INFINITY
> SRCFG[ops] CLASSLIST=ops,seeops
> SRCFG[ops] HOSTLIST=cn-wn8
>
> The "ops" jobs are waiting each morning to be processed, and every
> time in another queue, I mean on another worker node.
> Each time I check where they tried to run, the result
> is "node busy".
> When running pbsnodes -a the result is:
> cn-wn3,4,5,6,7,8 = busy, but only 2 or at most 6 jobs are running
>
> My nodes file is:
> cn-wn3 np=8 lcgpro
> cn-wn4 np=8 lcgpro
> cn-wn5 np=8 lcgpro
> cn-wn6 np=8 lcgpro
> cn-wn7 np=8 lcgpro
> cn-wn8 np=8 lcgpro
>
> I tried stopping pbs_mom on the worker node for a while, tried
> restarting it, but could not find the solution.
>
> The only way is rebooting the machine which has the job in queue, but
> for Linux this is not the right solution.
>
> Is there any indication of how to find a solution to this problem?
> What could be the best solution in such a case?
>
> Thank you
> Felix
>