Hi Douglas,

Are you saying that the job is submitted correctly to Torque but doesn't
run (for whatever reason), while CREAM reports that the job is running?

This looks like bug https://savannah.cern.ch/bugs/index.php?45717

Can you check whether the PBS log files contain something like
"unable to run job" for that job?

Cheers, Massimo

On Fri, 20 Feb 2009, Douglas McNab wrote:

> Hi,
>
> I am testing a cream ce that I have set up on a scotgrid dev machine.
> The current setup is cream ce and torque/maui on different hosts with the
> logs mounted via NFS on the CE.
>
> The job gets submitted successfully and is traceable in torque. However, it
> moves from Q to W and waits forever. On deeper investigation it looks to me
> like torque or cream thinks the job is actually running on a job slot where
> another totally different job is running. When running pbsnodes for the
> node id and grepping for the cream job id - nothing is returned. Then
> tailing the torque logs there is another dteam job with the same exec_host
> as the cream job. This leads me to thinking that I may not have set up the
> log parsing correctly on the ce or something is getting thoroughly confused.
>
> The outputs from various commands to come to this conclusion are listed
> below. Any thoughts on this would be greatly appreciated.
>
> svr016:/var/spool/pbs/server_logs# *tracejob 2409813*
> /var/spool/pbs/mom_logs/20090220: No such file or directory
> /var/spool/pbs/sched_logs/20090220: No such file or directory
>
> Job: 2409813.svr016.gla.scotgrid.ac.uk
>
> 02/20/2009 14:36:34  S    enqueuing into q30m, state 1 hop 1
> 02/20/2009 14:36:34  S    Job Queued at request of
>                           [log in to unmask], owner =
>                           [log in to unmask], job name =
>                           cream_520507277, queue = q30m
> 02/20/2009 14:36:34  A    queue=q30m
> 02/20/2009 15:19:37  S    Job Modified at request of
>                           [log in to unmask]
> 02/20/2009 15:19:37  S    Job Run at request of
>                           [log in to unmask]
>
>
> svr016:/var/spool/pbs/server_logs# *qstat -f 2409813*
> Job Id: *2409813.svr016.gla.scotgrid.ac.uk*
>     Job_Name = cream_520507277
>     Job_Owner = [log in to unmask]
>     job_state = W
>     queue = q30m
>     server = svr016.gla.scotgrid.ac.uk
>     Checkpoint = u
>     ctime = Fri Feb 20 14:36:34 2009
>     Error_Path = dev011.gla.scotgrid.ac.uk:/dev/null
>     *exec_host = node182/2*
>     Execution_Time = Fri Feb 20 15:49:41 2009
>     ......
>
> svr016:~# *pbsnodes node182 | grep 2409813*
>
> svr016:~# *pbsnodes node182*
> node182
>     state = job-exclusive
>     np = 8
>     properties = lcgpro
>     ntype = cluster
>     jobs = 0/2406819.svr016.gla.scotgrid.ac.uk,
>            1/2409176.svr016.gla.scotgrid.ac.uk,
>            2/2409262.svr016.gla.scotgrid.ac.uk,
>            3/2340154.svr016.gla.scotgrid.ac.uk,
>            4/2354251.svr016.gla.scotgrid.ac.uk,
>            5/2407443.svr016.gla.scotgrid.ac.uk,
>            6/2408238.svr016.gla.scotgrid.ac.uk,
>            7/2407591.svr016.gla.scotgrid.ac.uk
>     status = opsys=linux,uname=Linux node182.beowulf.cluster
>       2.6.9-78.0.1.ELsmp #1 SMP Tue Aug 5 13:53:03 CDT 2008 x86_64,
>       sessions=30091 1175 1856 9228 11111 20888 22885 30853,
>       nsessions=8,nusers=4,idletime=3709813,totmem=5952308kb,
>       availmem=2123760kb,physmem=16438780kb,ncpus=8,loadave=8.04,
>       netload=4294967294,state=free,
>       jobs=2340154.svr016.gla.scotgrid.ac.uk
>         2354251.svr016.gla.scotgrid.ac.uk
>         2406819.svr016.gla.scotgrid.ac.uk
>         2407443.svr016.gla.scotgrid.ac.uk
>         2407591.svr016.gla.scotgrid.ac.uk
>         2408238.svr016.gla.scotgrid.ac.uk
>         2409176.svr016.gla.scotgrid.ac.uk
>         *2409262.svr016.gla.scotgrid.ac.uk*,
>       rectime=1235143422
>
> 02/20/2009 15:20:23;S;*2409262.svr016.gla.scotgrid.ac.uk*;user=dteam166
> group=dteam jobname=STDIN queue=q3d ctime=1235130339 qtime=1235130339
> etime=1235130339 start=1235143223
> [log in to unmask] *exec_host=node182/2*
> Resource_List.cput=72:00:00 Resource_List.neednodes=1
> Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=72:00:00
>
>
> Thanks,
>
> Dug
>
> --

          \\\|///
        \\  ~ ~  //
         (/ @ @ /)
-------oOOo-(_)-oOOo----------------------------------
Massimo Sgaravatto
INFN Sezione di Padova
Via Marzolo, 8 35131 Padova - Italy
Tel: ++39 0498277047
Fax: ++39 0498277102                             oooO
E-mail: massimo.sgaravatto [at] pd.infn.it       (   )   Oooo
Home page: http://www.pd.infn.it/~sgaravat
--------\ (----(   )----------------------------------
         \_)    ) /
               (_/
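
P.S. The log check asked for at the top of this message could be scripted roughly as below. This is only a minimal sketch: the function name find_run_failures is hypothetical, the phrase "unable to run job" is taken from the question above and the exact wording in a given Torque version may differ, and the code makes no assumption about the log line layout beyond plain substring matching.

```python
from typing import Iterable, List


def find_run_failures(
    log_lines: Iterable[str],
    job_id: str,
    phrase: str = "unable to run job",  # assumed wording, adjust if needed
) -> List[str]:
    """Return the log lines that mention job_id and contain phrase.

    The phrase is matched case-insensitively; the job id is matched as a
    plain substring, so "2409813" also matches the full
    "2409813.svr016.gla.scotgrid.ac.uk" form.
    """
    needle = phrase.lower()
    hits = []
    for line in log_lines:
        if job_id in line and needle in line.lower():
            hits.append(line.rstrip("\n"))
    return hits
```

It can be fed an open log file directly, for example `find_run_failures(open(path), "2409813")`, where `path` would be the server log for the day in question (presumably something like /var/spool/pbs/server_logs/20090220 on this setup, which is a guess based on the prompt and file names shown in the quoted output).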