On Thu, Feb 17, 2005 at 06:12:13PM +0200 or thereabouts, Wei Xing wrote:
> Hi,
>
> My CE has too many jobs in the lhcb queue. Does anyone know how to set
> the max_queueable with Torque?
If you have Torque then you have LCG2_3_0-SL3, which means you also have Maui.
I suggest you introduce some fairshare within maui.cfg.
The following document, referenced from the YAIM guide, should
give you a starting point.
http://grid-deployment.web.cern.ch/grid-deployment/documentation/Maui-Cookbook/
I've attached our latest maui.cfg, but the one in the document also
does something useful.
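For the immediate question of capping queued jobs, Torque itself supports
per-queue limits via qmgr. A hedged sketch follows; the queue name "lhcb"
matches your output, but the limit values are illustrative assumptions you
should size for your farm. Note Torque spells the attribute "max_queuable"
(one "e").

```shell
# Run on the pbs_server host as root.
# Cap how many jobs the queue will accept in total (queued + running).
qmgr -c "set queue lhcb max_queuable = 100"

# Optionally also cap how many of those may run at once.
qmgr -c "set queue lhcb max_running = 50"

# Verify the settings.
qmgr -c "list queue lhcb"
```

With max_queuable set, further submissions to the queue are rejected at
submit time rather than piling up, which is complementary to the Maui
fairshare approach below.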
Steve
>
> Thanks,
>
> Wei
>
>
> ==============================================
>
> 4991.ce101 STDIN lhcb001 00:15:56 R lhcb
> 4992.ce101 STDIN lhcb001 00:15:46 R lhcb
> 4993.ce101 STDIN lhcb001 00:13:58 R lhcb
> 4994.ce101 STDIN lhcb001 00:14:02 R lhcb
> 4999.ce101 STDIN lhcb001 00:12:51 R lhcb
> 5000.ce101 STDIN lhcb001 00:12:55 R lhcb
> 5001.ce101 STDIN lhcb001 00:14:10 R lhcb
> 5025.ce101 STDIN lhcb001 0 Q lhcb
> 5028.ce101 STDIN lhcb001 0 Q lhcb
> 5030.ce101 STDIN lhcb001 0 Q lhcb
> 5032.ce101 STDIN lhcb001 0 Q lhcb
> 5033.ce101 STDIN lhcb001 0 Q lhcb
> 5034.ce101 STDIN lhcb001 0 Q lhcb
> 5035.ce101 STDIN lhcb001 0 Q lhcb
> 5036.ce101 STDIN lhcb001 0 Q lhcb
> 5037.ce101 STDIN lhcb001 0 Q lhcb
> 5038.ce101 STDIN lhcb001 0 Q lhcb
> 5039.ce101 STDIN lhcb001 0 Q lhcb
> 5040.ce101 STDIN lhcb001 0 Q lhcb
> 5041.ce101 STDIN lhcb001 0 Q lhcb
> 5042.ce101 STDIN lhcb001 0 Q lhcb
> 5043.ce101 STDIN lhcb001 0 Q lhcb
> 5044.ce101 STDIN lhcb001 0 Q lhcb
> 5045.ce101 STDIN lhcb001 0 Q lhcb
> 5046.ce101 STDIN lhcb001 0 Q lhcb
> 5047.ce101 STDIN lhcb001 0 Q lhcb
> 5048.ce101 STDIN lhcb001 0 Q lhcb
> 5049.ce101 STDIN lhcb001 0 Q lhcb
> 5050.ce101 STDIN lhcb001 0 Q lhcb
> 5052.ce101 STDIN lhcb001 0 Q lhcb
> 5051.ce101 STDIN lhcb001 0 Q lhcb
> 5053.ce101 STDIN lhcb001 0 Q lhcb
> 5054.ce101 STDIN lhcb001 0 Q lhcb
> 5055.ce101 STDIN lhcb001 0 Q lhcb
> 5056.ce101 STDIN lhcb001 0 Q lhcb
> 5057.ce101 STDIN lhcb001 0 Q lhcb
> 5058.ce101 STDIN lhcb001 0 Q lhcb
> 5059.ce101 STDIN lhcb001 0 Q lhcb
> =============================================
>
> --
> ============================================================
> Wei Xing, M.Sc.
> Research Associate Tel: 00357-22892663
> Dept. of Computer Science Fax: 00357-22892701
> University of Cyprus email: [log in to unmask]
> PO Box 20537
> CY1678, Nicosia, CYPRUS
--
Steve Traylen
[log in to unmask]
http://www.gridpp.ac.uk/
#
# MAUI configuration example
# @(#)maui.cfg David Groep 20031015.1
# for MAUI version 3.2.5
#
SERVERHOST csflnx353.rl.ac.uk
ADMIN1 root
ADMIN2 root
ADMIN3 traylens bly ras jfw1 adye brew olaiya
ADMINHOST csflnx353.rl.ac.uk
RMCFG[base] TYPE=PBS
#RMHOST[0] localhost
#RMSERVER[0] localhost
SERVERPORT 40559
SERVERMODE NORMAL
# Set PBS server polling interval. Since we have many short jobs
# and want fast turn-around, set this to 60 seconds (default: 2 minutes)
RMPOLLINTERVAL 00:00:60
# a max. 10 MByte log file in a logical location
LOGFILE /var/log/maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 1
LOGFILEROLLDEPTH 9
# Set up a weighting among groups.
CREDWEIGHT 1
USERWEIGHT 1
GROUPWEIGHT 2
# User default targets
#USERCFG[DEFAULT] MAXPS=190000000 FSTARGET=5+
#USERCFG[DEFAULT] FSTARGET=30+ MAXPS=60000000,577395504
#USERCFG[DEFAULT] FSTARGET=20+
USERCFG[DEFAULT] FSTARGET=30+
# 2004:08:02: 160 CPUs / Job slots (80 nodes).
# Maximum job length is 168 hours = 168 * 3600 = 604800 seconds
# Maximum PS for all nodes = 160 * 168 * 3600 = 96768000 seconds
# Soft limit for jobs with full term left is 1/8 of all running jobs,
# ie: 160 * 0.125* 604800 = 12096000 seconds
# Hard limit is double this, ie 2 * 12096000 = 24192000 seconds
# Updated for Feb 2005.
# Remember: never set any target to 0.0
GROUPCFG[alice] FSTARGET=0.01
GROUPCFG[atlas] FSTARGET=28.80
GROUPCFG[bfactory] FSTARGET=32.13
GROUPCFG[cms] FSTARGET=13.52
GROUPCFG[d0] FSTARGET=11.27
GROUPCFG[h1] FSTARGET=2.25
GROUPCFG[harp] FSTARGET=0.01
GROUPCFG[lhcb] FSTARGET=11.27
GROUPCFG[minos] FSTARGET=2.25
GROUPCFG[ukqcd] FSTARGET=0.01
GROUPCFG[zeus] FSTARGET=1.12
GROUPCFG[dteam] FSTARGET=0.01
GROUPCFG[sno] FSTARGET=0.11
GROUPCFG[lcfi] FSTARGET=0.01
GROUPCFG[ppdgrid] FSTARGET=0.01
GROUPCFG[theory] FSTARGET=0.01
#GROUPCFG[bfactory] FSTARGET=100 MAXPS=174182400,193536000
#GROUPCFG[bfactory] FSTARGET=100 MAXPS=577000000,577395504
#GROUPCFG[sno] FSTARGET=10 MAXPS=a,b
#GROUPCFG[h1] FSTARGET=10 MAXPS=377000000,377395504
#GROUPCFG[cdf] FSTARGET=10 MAXPS=a,b
#GROUPCFG[d0] FSTARGET=10 MAXPS=a,b
#GROUPCFG[dark] FSTARGET=10 MAXPS=a,b
#GROUPCFG[minos] FSTARGET=10 MAXPS=a,b
#GROUPCFG[theory] FSTARGET=10 MAXPS=a,b
#GROUPCFG[ukqcd] FSTARGET=10 MAXPS=a,b
#GROUPCFG[zeus] FSTARGET=10 MAXPS=a,b
# Fairshare policies
FSPOLICY DEDICATEDPS
FSDEPTH 9
FSINTERVAL 24:00:00
FSDECAY 0.9
FSWEIGHT 1
FSUSERWEIGHT 1
FSGROUPWEIGHT 50
ENABLENEGJOBPRIORITY true
REJECTNEGPRIOJOBS false
# Create a 1 cpu reservation for S jobs on the lcgpro (rh73) and sl3lcg (sl3) nodes.
# It must be on a "slow" node.
SRCFG[monitor] STARTTIME=0:00:00 ENDTIME=24:00:00
SRCFG[monitor] PERIOD=INFINITY
SRCFG[monitor] TASKCOUNT=2 RESOURCES=PROCS:1,MEM:400
SRCFG[monitor] CLASSLIST=S
SRCFG[monitor] NODEFEATURES=slow
SRCFG[sl3] STARTTIME=0:00:00 ENDTIME=24:00:00
SRCFG[sl3] PERIOD=INFINITY
SRCFG[sl3] TASKCOUNT=2 RESOURCES=PROCS:1,MEM:400
SRCFG[sl3] CLASSLIST=S
SRCFG[sl3] NODEFEATURES=sl3lcg
#USERCFG[DEFAULT] FSTARGET=10+
# Don't add weight for queue time, so longer-queued jobs are not more likely to run.
XFACTORWEIGHT 0
QUEUETIMEWEIGHT 0
USERWEIGHT 0
GROUPWEIGHT 0
# Make sure we don't allow queue stuffing when people exceed
# the MAXJOB values.
JOBPRIOACCRUALPOLICY FULLPOLICY
# Don't defer jobs.
DEFERTIME 0
# Allocate fastest nodes first.
NODEALLOCATIONPOLICY FASTEST
# A fudge that fixes things.
#NODEAVAILABILITYPOLICY DEDICATED:SWAP
# Temporary limits for various issues
USERCFG[babarmc] MAXJOB=100
#USERCFG[mohanty] MAXJOB=1