Hi,

I am testing a CREAM CE that I have set up on a ScotGrid dev machine. The current setup is the CREAM CE and Torque/Maui on different hosts, with the logs mounted via NFS on the CE. The job gets submitted successfully and is traceable in Torque; however, it moves from Q to W and waits forever. On deeper investigation it looks to me as if Torque or CREAM thinks the job is running in a job slot where a totally different job is actually running: running pbsnodes for the node and grepping for the CREAM job id returns nothing, and tailing the Torque logs shows another dteam job with the same exec_host as the CREAM job. This leads me to think that I may not have set up the log parsing correctly on the CE, or that something is getting thoroughly confused. The outputs from the commands that led me to this conclusion are listed below. Any thoughts on this would be greatly appreciated.

svr016:/var/spool/pbs/server_logs# tracejob 2409813
/var/spool/pbs/mom_logs/20090220: No such file or directory
/var/spool/pbs/sched_logs/20090220: No such file or directory

Job: 2409813.svr016.gla.scotgrid.ac.uk

02/20/2009 14:36:34  S  enqueuing into q30m, state 1 hop 1
02/20/2009 14:36:34  S  Job Queued at request of [log in to unmask], owner = [log in to unmask], job name = cream_520507277, queue = q30m
02/20/2009 14:36:34  A  queue=q30m
02/20/2009 15:19:37  S  Job Modified at request of [log in to unmask]
02/20/2009 15:19:37  S  Job Run at request of [log in to unmask]

svr016:/var/spool/pbs/server_logs# qstat -f 2409813
Job Id: 2409813.svr016.gla.scotgrid.ac.uk
    Job_Name = cream_520507277
    Job_Owner = [log in to unmask]
    job_state = W
    queue = q30m
    server = svr016.gla.scotgrid.ac.uk
    Checkpoint = u
    ctime = Fri Feb 20 14:36:34 2009
    Error_Path = dev011.gla.scotgrid.ac.uk:/dev/null
    exec_host = node182/2
    Execution_Time = Fri Feb 20 15:49:41 2009
    ......
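In case it helps anyone reproduce the check, the fields I am looking at can be pulled out of `qstat -f` output with a rough Python sketch like this (purely illustrative, not any Torque tooling; real output wraps long attribute values over continuation lines, which this ignores):

```python
# Sketch: parse the "key = value" attribute block of `qstat -f`
# output into a dict so job_state and exec_host can be checked
# programmatically. Illustrative only; continuation-line wrapping
# in real qstat output is not handled here.

def parse_qstat_full(text):
    attrs = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs

sample = """Job Id: 2409813.svr016.gla.scotgrid.ac.uk
    Job_Name = cream_520507277
    job_state = W
    exec_host = node182/2
"""

info = parse_qstat_full(sample)
print(info["job_state"], info["exec_host"])  # W node182/2
```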
svr016:~# pbsnodes node182 | grep 2409813
svr016:~# pbsnodes node182
node182
    state = job-exclusive
    np = 8
    properties = lcgpro
    ntype = cluster
    jobs = 0/2406819.svr016.gla.scotgrid.ac.uk, 1/2409176.svr016.gla.scotgrid.ac.uk, 2/2409262.svr016.gla.scotgrid.ac.uk, 3/2340154.svr016.gla.scotgrid.ac.uk, 4/2354251.svr016.gla.scotgrid.ac.uk, 5/2407443.svr016.gla.scotgrid.ac.uk, 6/2408238.svr016.gla.scotgrid.ac.uk, 7/2407591.svr016.gla.scotgrid.ac.uk
    status = opsys=linux,uname=Linux node182.beowulf.cluster 2.6.9-78.0.1.ELsmp #1 SMP Tue Aug 5 13:53:03 CDT 2008 x86_64,sessions=30091 1175 1856 9228 11111 20888 22885 30853,nsessions=8,nusers=4,idletime=3709813,totmem=5952308kb,availmem=2123760kb,physmem=16438780kb,ncpus=8,loadave=8.04,netload=4294967294,state=free,jobs=2340154.svr016.gla.scotgrid.ac.uk 2354251.svr016.gla.scotgrid.ac.uk 2406819.svr016.gla.scotgrid.ac.uk 2407443.svr016.gla.scotgrid.ac.uk 2407591.svr016.gla.scotgrid.ac.uk 2408238.svr016.gla.scotgrid.ac.uk 2409176.svr016.gla.scotgrid.ac.uk 2409262.svr016.gla.scotgrid.ac.uk,rectime=1235143422

And from the Torque logs, the dteam job holding the same exec_host:

02/20/2009 15:20:23;S;2409262.svr016.gla.scotgrid.ac.uk;user=dteam166 group=dteam jobname=STDIN queue=q3d ctime=1235130339 qtime=1235130339 etime=1235130339 start=1235143223 [log in to unmask] exec_host=node182/2 Resource_List.cput=72:00:00 Resource_List.neednodes=1 Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=72:00:00

Thanks,
Dug

--
ScotGrid, Room 481, Kelvin Building, University of Glasgow
tel: +44 (0)141 330 6439
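P.S. For what it's worth, the by-eye check of which job actually holds a given slot can be scripted. A rough Python sketch (illustrative helpers, not part of any Torque API), using the jobs field from the pbsnodes output above:

```python
# Sketch: given the "jobs = ..." field from pbsnodes and a job's
# exec_host from qstat -f, report which job actually occupies that
# slot. Helper names are illustrative only.

def parse_slots(jobs_field):
    """Turn '0/jobA, 1/jobB, ...' into {slot: jobid}."""
    slots = {}
    for entry in jobs_field.split(","):
        slot, _, jobid = entry.strip().partition("/")
        slots[int(slot)] = jobid.strip()
    return slots

def occupant(slots, exec_host):
    """exec_host looks like 'node182/2'; return the job in that slot."""
    _, _, slot = exec_host.partition("/")
    return slots.get(int(slot))

# First three slots from the pbsnodes output above:
jobs_field = ("0/2406819.svr016.gla.scotgrid.ac.uk, "
              "1/2409176.svr016.gla.scotgrid.ac.uk, "
              "2/2409262.svr016.gla.scotgrid.ac.uk")
slots = parse_slots(jobs_field)
print(occupant(slots, "node182/2"))  # 2409262.svr016.gla.scotgrid.ac.uk
```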