Hello Charles,

Yes, that's exactly what I did. I repeated the steps and followed the gocwiki and your guidelines, but I still have problems submitting jobs. :-(

Here is what I have:

**************************************************
[root@ce root]# cat /var/spool/pbs/torque.cfg
SUBMITFILTER /var/spool/pbs/submit_filter.pl
[root@ce root]# ll /var/spool/pbs/torque.cfg
-rw-r--r-- 1 root root 45 Mar 9 10:17 /var/spool/pbs/torque.cfg
[root@ce root]# ll /var/spool/pbs/submit_filter.pl
-rwxr-xr-x 1 root root 4072 Mar 8 12:46 /var/spool/pbs/submit_filter.pl
[root@ce root]# su - dteam001
[ce] /home/dteam001 > cat testjob.sh
#!/bin/bash
printf "`hostname`: `pwd`: `date`\n"
[ce] /home/dteam001 > qsub testjob.sh
1326.ce.prd.hp.com
[ce] /home/dteam001 > qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
1326.ce          testjob.sh       dteam001                0 E short
[ce] /home/dteam001 > cat testjob.sh.e1326
No value for $TERM and no -T specified
No value for $TERM and no -T specified
[ce] /home/dteam001 > cat testjob.sh.o1326
bh-wn0.prd.hp.com: /home/dteam001: Thu Mar 24 11:09:38 AST 2005
[ce] /home/dteam001 >
***************************************************

It seems to be working, at least locally. But if I try submitting a job from our UI (ui.prd.hp.com), it fails.
The logging info says something like:

***************************************************
Event: Transfer
- dest_host = ce.prd.hp.com:2119/jobmanager-pbs
- dest_instance = /var/edgwl/logmonitor/CondorG.log/CondorG.1108581538.log
- dest_jobid = unavailable
- destination = LRMS
- host = rb.prd.hp.com
- reason = Job successfully submitted to Globus
- result = OK
- source = LogMonitor
- src_instance = unique
- timestamp = Thu Mar 24 15:13:57 2005
- user = /C=PR/O=HP-PR/OU=HPTC/CN=Maniel [log in to unmask]
---
Event: Running
- host = rb.prd.hp.com
- node = ce.prd.hp.com
- source = LogMonitor
- src_instance = unique
- timestamp = Thu Mar 24 15:16:21 2005
- user = /C=PR/O=HP-PR/OU=HPTC/CN=Maniel [log in to unmask]
---
Event: Done
- exit_code = 1
- host = rb.prd.hp.com
- reason = Cannot read JobWrapper output, both from Condor and from Maradona.
- source = LogMonitor
- src_instance = unique
- status_code = FAILED
- timestamp = Thu Mar 24 15:16:43 2005
- user = /C=PR/O=HP-PR/OU=HPTC/CN=Maniel [log in to unmask]
---
Event: Resubmission
- host = rb.prd.hp.com
- reason = unavailable
- result = WILLRESUB
- source = LogMonitor
- src_instance = unique
- tag = unavailable
- timestamp = Thu Mar 24 15:16:43 2005
- user = /C=PR/O=HP-PR/OU=HPTC/CN=Maniel [log in to unmask]
---
******************************************************

I checked the gocwiki and found the entry about the JobWrapper output, but following it did not seem to fix the problem. Can somebody help us?

Thank you!
./MS

-----Original Message-----
From: Charles Loomis [mailto:[log in to unmask]]
Sent: Thu 3/24/2005 1:53 AM
To: Sotomayor, Maniel; LHC Computer Grid - Rollout
Subject: Re: aborted jobs

Hi Maniel,

As Maarten pointed out, it looks like the submit filter is not correctly configured.
If you use a submit filter (and you need it for MPI support), then you must put the full path and name of the script into the /var/spool/pbs/torque.cfg file, in a line like:

SUBMITFILTER /full/path/to/script

From Maarten's debugging it looks like you have this configuration, but either the script doesn't exist or it has the wrong permissions. First check that it exists (an example can be found on the MPI Wiki page). This script *must* be executable for all users. There is nothing sensitive in the script, so permissions like 0755 are best.

If both of those are OK, then check that the script actually runs correctly. You can do this with a simple torque job submission. If the qsub produces no errors, it should be OK. This filter is actually run by the qsub command, with the user's privileges, on all job submissions.

Cheers.

Cal

Sotomayor, Maniel wrote:
> Hello,
>
> I'm having problems after submitting jobs to my cluster. The jobs
> successfully execute through qsub after installing MPICH. I'm having
> errors when reading jobwrapper output. I checked the gocwiki that talks
> about it, but have not solved it yet with them. I'm attaching the
> logging info output. Can you help me solve this ?
>
> Sincerely,
> ./MS
>
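
The first two checks Cal describes (the SUBMITFILTER line names a full path, and the script exists with permissions along the lines of 0755) could be automated roughly as below. This is only a sketch in Python under stated assumptions: the function names `parse_submitfilter` and `check_submit_filter` are made up for illustration, the config is assumed to hold a whitespace-separated `SUBMITFILTER <path>` line as shown in this thread, and requiring the read bit in addition to the execute bit is an assumption on my part (an interpreted script normally has to be readable to run). It does not replace the last step of actually running qsub as an ordinary user.

```python
import os
import stat


def parse_submitfilter(cfg_text):
    """Return the path given on the SUBMITFILTER line, or None if there is none."""
    for line in cfg_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "SUBMITFILTER":
            return parts[1]
    return None


def check_submit_filter(cfg_path="/var/spool/pbs/torque.cfg"):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        with open(cfg_path) as fh:
            script = parse_submitfilter(fh.read())
    except OSError as exc:
        return [f"cannot read {cfg_path}: {exc}"]
    if script is None:
        return [f"no SUBMITFILTER line in {cfg_path}"]

    problems = []
    if not os.path.isabs(script):
        problems.append(f"{script} is not a full path")
    if not os.path.isfile(script):
        problems.append(f"{script} does not exist")
        return problems

    # Assumption: read + execute for owner, group and others (as with 0755).
    needed = 0o555
    mode = stat.S_IMODE(os.stat(script).st_mode)
    if mode & needed != needed:
        problems.append(f"{script} has mode {mode:04o}; expected something like 0755")
    return problems


if __name__ == "__main__":
    found = check_submit_filter()
    for problem in found:
        print("PROBLEM:", problem)
    if not found:
        print("submit filter configuration looks OK")
```

Run as root or as a pool account on the CE; the script only inspects mode bits, so it reports the same result either way. A clean result here only rules out the path and permission problems, not a filter that fails when qsub executes it.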