Hi

I am not too familiar with Torque, so I can't help much.
I can only say that these problems usually occur when there are problems
with file staging.
Is the ssh key setting OK?
Does checkjob JOBID say something useful?

Cheers, Massimo

On Fri, 6 Feb 2009, Douglas McNab wrote:

> Hi Massimo,
>
> Yes I agree, I did configure it with yaim; however, cfengine was coming along
> and updating this file in the background.
> I have now fixed the missing glexec group and added the tomcat users to the
> VO users by hand.
>
> Job submission is now working up to a point. I can submit and the job
> reaches torque/maui.
> However, it appears to be stuck at idle from the cream side and waiting on
> our qstat side.
>
> Very odd. I shall keep investigating. Thanks for your help.
>
> Regards,
>
> Dug
>
> 2009/2/5 Massimo Sgaravatto - INFN Padova <[log in to unmask]>
>
> > Hi
> >
> > It looks to me like the glexec user and group don't exist.
> >
> > Did you configure that machine via yaim (yaim should create that
> > user ...)?
> > Didn't it report any errors/warnings?
> >
> > Cheers, Massimo
> >
> > On Thu, 5 Feb 2009, Douglas McNab wrote:
> >
> > > Hi,
> > >
> > > I have run the fry tool:
> > >
> > > dev011:/# fry -v -s /opt/glite/yaim/etc/site-info.def
> > >
> > > ===========================================================================
> > > F R Y version 3.1.0-0
> > > (for INFN-GRID 3.1.0)
> > > ===========================================================================
> > > General information:
> > > - Hostname : dev011.gla.scotgrid.ac.uk
> > > - Test started : 2009.02.05 16:19
> > > ---------------------------------------------------------------------------
> > > Installed node type metapackages:
> > > - CREAM
> > > ===========================================================================
> > > Functions list to be executed:
> > > - check_cream_bad_jars
> > > - check_cream_glexec
> > > - check_cream_rpm_versions
> > > - check_cream_voms_cert_pem_ext
> > > - check_cream_wars
> > > - check_part
> > > - check_yum_autoupdate
> > > ---------------------------------------------------------------------------
> > > 1. Executing check_cream_bad_jars...
> > > Checking if there are some old jar files around
> > > -> RESULT : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > 2. Executing check_cream_glexec...
> > > Checking permissions of glexec executable
> > > -rwsr-x--- 1 root 11001 45468 Jun 24 2008 /opt/glite/sbin/glexec
> > > -> RESULT : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > 3. Executing check_cream_rpm_versions...
> > > Checking versions of glite-ce and yaim-cream-ce rpms...
> > > glite-ce-cream-1.9.6-0
> > > glite-ce-job-plugin-1.9.0-6
> > > glite-ce-monitor-1.9.3-1
> > > glite-ce-blahp-1.10.6-0.slc4
> > > glite-ce-ce-plugin-1.9.0-3
> > > glite-yaim-cream-ce-4.0.6-0
> > > -> RESULT : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > 4. Executing check_cream_voms_cert_pem_ext...
> > > Checking if there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > -> RESULT : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > 5. Executing check_cream_wars...
> > > Checking CREAM war
> > > md5sum: /webapps/ce-cream.war: No such file or directory
> > > /usr/share/fry/functions/check_cream_wars: line 31: [: 1c0e74cd74f706f2ee8b6fbc7f363064: unary operator expected
> > > Checking CEMon war
> > > md5sum: /webapps/ce-monitor.war: No such file or directory
> > > /usr/share/fry/functions/check_cream_wars: line 41: [: 800e9fb94638b80a9d7c4fff0027d969: unary operator expected
> > > -> RESULT : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > 6. Executing check_part...
> > > Starting partition free space check...
> > > Checking available space for /... OK
> > > Checking available space for /boot... OK
> > > Checking available space for /dev/shm... OK
> > > Checking available space for /home... OK
> > > Checking available space for /opt... OK
> > > Checking available space for /tmp... OK
> > > Checking available space for /var... OK
> > > Checking available space for /var/spool/pbs/server_priv/accounting... OK
> > > Checking available space for /opt/edg/var/info... OK
> > > -> RESULT : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > 7. Executing check_yum_autoupdate...
> > > Checking if yum autoupdate is enabled
> > > Yum autoupdate is enabled
> > > Please note that an automatic update of the cream and/or glexec rpms not
> > > followed by a reconfiguration is known to cause problems
> > > -> RESULT : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > ===========================================================================
> > > Final report:
> > > - Test completed : 2009.02.05 16:19
> > > - Total time : 3 seconds
> > > - Global result : OK
> > > ===========================================================================
> > >
> > > I think it must be something else related to the YAIM configuration -
> > > possibly the users tomcat and glexec.
> > > I will keep looking thanks.
> > >
> > > Dug
> > >
> > > 2009/2/5 Raquel Muñoz <[log in to unmask]>
> > >
> > > > Hi,
> > > >
> > > > There is an automatic tool that can be used to check if *some* parts of
> > > > the CREAM CE have been properly configured or if there are some issues to
> > > > fix.
> > > >
> > > > This tool is based on the ig-fry framework.
> > > >
> > > > To use this tool:
> > > >
> > > > * Install the ig-fry rpm from http://www.pd.infn.it/~sgaravat/ig-fry/
> > > >
> > > > The latest version of ig-fry for CREAM is 3.1.2-0
> > > >
> > > > * Install the rpm
> > > > * Run it:
> > > >
> > > > fry -v -s <siteinfo.def>
> > > >
> > > > <siteinfo.def> is the one used to configure the CREAM CE via yaim
> > > >
> > > > Please then post the produced output
> > > >
> > > > (Please note that if fry reports that everything is ok, this only means
> > > > that the tests done by fry were successfully passed. This doesn't exclude
> > > > that there are still issues in some other parts of the CE configuration
> > > > not checked by fry)
> > > >
> > > > Regards
> > > >
> > > > 2009/2/5 Douglas McNab <[log in to unmask]>:
> > > > > Hi,
> > > > >
> > > > > I have a few questions about setting up a cream ce and wondered if anyone
> > > > > had any ideas about the issues I am seeing.
> > > > > I believe that I have set up the grid service correctly but when I submit a
> > > > > job using the simple CLI tools I receive an error message.
> > > > > Unfortunately, I have been through the logs and have not found anything that
> > > > > is helping me diagnose the issue.
> > > > > I am sure it is probably something very silly I have forgotten to set up, but
> > > > > any help would be appreciated.
> > > > >
> > > > > So far I have been following the instructions at:
> > > > >
> > > > > http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream31-devel
> > > > >
> > > > > The test job was submitted using: glite-ce-job-submit -a -d -r
> > > > > dev011.gla.scotgrid.ac.uk:8443/cream-pbs-q30m hellocream.jdl
> > > > > and results in this fatal error:
> > > > >
> > > > > 2009-02-05 14:08:28,217 FATAL - MethodName=[jobRegister] Timestamp=[Thu 05
> > > > > Feb 2009 14:08:28] ErrorCode=[0] Description=[system error]
> > > > > FaultCause=[cannot write the job wrapper (jobId = CREAM304767308)! The
> > > > > problem seems to be related to glexec which reported: Broken pipe]
> > > > >
> > > > > Investigating the logs, I found the stacktrace in the catalina.out:
> > > > >
> > > > > org.glite.ce.creamapi.cmdmanagement.CommandException: cannot write the job
> > > > > wrapper (jobId = CREAM810541183)! The problem seems to be related to glexec
> > > > > which reported: Broken pipe at
> > > > > org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor.createJobSandboxDir(AbstractJobExecutor.java:1038)
> > > > >
> > > > > I have seen this page:
> > > > >
> > > > > http://grid.pd.infn.it/cream/field.php?n=Main.ErrorMessagesReportedByCREAMToClient
> > > > >
> > > > > which details the error message reported, so this led me to the glexec
> > > > > logs. I increased the verbosity and debug levels in the glexec file in
> > > > > order to help debug this issue.
> > > > > However, I failed to find anything of any
> > > > > interest in /opt/glite/var/log/glexec_lcas_lcmaps.log
> > > > >
> > > > > it appears that it has mapped my credential correctly and there does not
> > > > > appear to be any errors reported:
> > > > >
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): 1 *******************************************
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): SIGLEN: 256
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): USER: /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): UCA: /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): SERVER: /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=[log in to unmask]
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): SCA: /C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): VO: vo.scotgrid.ac.uk
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): URI: svr029.gla.scotgrid.ac.uk:15000
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): DATE1: 20090205130408Z
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): DATE2: 20090206010408Z
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): fqan: /vo.scotgrid.ac.uk/Role=NULL/Capability=NULL
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): GROUP: /vo.scotgrid.ac.uk
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): ROLE: NULL
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): CAP: NULL
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas_plugin_voms-print_vomsdata(): 1 *******************************************
> > > > >
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcas.mod-lcas_run_va(): succeeded
> > > > >
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > Credential Print:
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > dn : /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > uid : 220001 [1/1]
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > pgid : 220000 [1/1]
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : VO credential mapping : [1/1]
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcmaps_printVoMapping(): address of vo mapping struct: 0x92da118
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcmaps_printVoMapping(): VO string: /vo.scotgrid.ac.uk/Role=NULL/Capability=NULL
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcmaps_printVoMapping(): mapped groupname: scotg
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > lcmaps_printVoMapping(): mapped GID: 220000
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts :
> > > > > pool_index :
> > > > > %2fc%3duk%2fo%3descience%2fou%3dglasgow%2fl%3dcompserv%2fcn%3ddouglas%20mcnab:scotg
> > > > >
> > > > > Does anyone know what sort of common errors would occur with glexec?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Dug

--
       \\\|///
     \\  ~ ~  //
      (/ @ @ /)
-------oOOo-(_)-oOOo----------------------------------
Massimo Sgaravatto
INFN Sezione di Padova
Via Marzolo, 8
35131 Padova - Italy
Tel: ++39 0498277047  Fax: ++39 0498277102
 oooO    E-mail: massimo.sgaravatto [at] pd.infn.it
(   )   Oooo  Home page: http://www.pd.infn.it/~sgaravat
--------\ (----(   )----------------------------------
         \_)    ) /
               (_/