Hi

I am not very familiar with Torque, so I can't help much.

I can only say that these symptoms usually occur when there are problems
with file staging.

Is the SSH key setup OK?

Does
checkjob JOBID
say anything useful?
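
The two checks above can be sketched roughly as follows. This is only a minimal sketch: it assumes Maui's `checkjob` and Torque's `qstat` and `tracejob` are installed on the batch server, and that the worker nodes copy job files back to the CE over scp, so the SSH test should be run on a worker node as the mapped pool account. `JOBID`, `ce.example.org` and the helper names are placeholders made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper names, JOBID and ce.example.org are hypothetical placeholders.

# Run a diagnostic command if it is installed; otherwise report that and carry on.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found on this host"
    fi
}

# Collect the scheduler's and the batch server's view of one job.
diagnose_job() {
    jobid="$1"
    run_if_present checkjob "$jobid"   # Maui: why the job is not being started
    run_if_present qstat -f "$jobid"   # Torque: full job attributes
    run_if_present tracejob "$jobid"   # Torque: log entries mentioning the job
}

# Passwordless SSH test; BatchMode makes ssh fail instead of prompting.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage (on the batch server, then on a worker node as the pool account):
#   diagnose_job JOBID
#   check_ssh ce.example.org && echo "ssh key OK"
```

If `check_ssh` fails for the pool account, the file staging explanation becomes more likely.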

			Cheers, Massimo

On Fri, 6 Feb 2009, Douglas McNab wrote:

> Hi Massimo,
> 
> Yes, I agree. I did configure it with yaim; however, cfengine was coming
> along and updating this file in the background.
> I have now fixed the missing glexec group and added the tomcat users to the
> VO users by hand.
> 
> Job submission is now working up to a point.  I can submit and the job
> reaches torque/maui.
> However, it then appears to be stuck: CREAM reports it as idle, and qstat
> on our side shows it as waiting.
> 
> Very odd.  I shall keep investigating.  Thanks for your help.
> 
> Regards,
> 
> Dug
> 
> 2009/2/5 Massimo Sgaravatto - INFN Padova
> 
> > Hi
> >
> > It looks to me like the glexec user and group don't exist.
> >
> > Did you configure that machine via yaim (yaim should create that
> > user ...)?
> > Did it report any errors or warnings?
> >
> >
> >                                Cheers, Massimo
> >
> >
> >
> > On Thu, 5 Feb 2009, Douglas McNab wrote:
> >
> > > Hi,
> > >
> > > I have run the fry tool:
> > >
> > > dev011:/# fry -v -s /opt/glite/yaim/etc/site-info.def
> > >
> > > ===========================================================================
> > > > F R Y version 3.1.0-0
> > >   (for INFN-GRID 3.1.0)
> > > ===========================================================================
> > > > General information:
> > >   - Hostname     : dev011.gla.scotgrid.ac.uk
> > >   - Test started : 2009.02.05 16:19
> > > ---------------------------------------------------------------------------
> > > > Installed node type metapackages:
> > >   - CREAM
> > > ===========================================================================
> > > > Functions list to be executed:
> > >   - check_cream_bad_jars
> > >   - check_cream_glexec
> > >   - check_cream_rpm_versions
> > >   - check_cream_voms_cert_pem_ext
> > >   - check_cream_wars
> > >   - check_part
> > >   - check_yum_autoupdate
> > > ---------------------------------------------------------------------------
> > > > 1. Executing check_cream_bad_jars...
> > > Checking if there are some old jar files around
> > > -> RESULT  : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > > 2. Executing check_cream_glexec...
> > > Checking permissions of glexec executable
> > > -rwsr-x---  1 root 11001 45468 Jun 24  2008 /opt/glite/sbin/glexec
> > > -> RESULT  : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > > 3. Executing check_cream_rpm_versions...
> > > Checking versions of glite-ce and yaim-cream-ce rpms...
> > > glite-ce-cream-1.9.6-0
> > > glite-ce-job-plugin-1.9.0-6
> > > glite-ce-monitor-1.9.3-1
> > > glite-ce-blahp-1.10.6-0.slc4
> > > glite-ce-ce-plugin-1.9.0-3
> > > glite-yaim-cream-ce-4.0.6-0
> > > -> RESULT  : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > > 4. Executing check_cream_voms_cert_pem_ext...
> > > Checking if there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > It looks like there are VOMS cert files without .pem as extension
> > > -> RESULT  : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > > 5. Executing check_cream_wars...
> > > Checking CREAM war
> > > md5sum: /webapps/ce-cream.war: No such file or directory
> > > /usr/share/fry/functions/check_cream_wars: line 31: [:
> > > 1c0e74cd74f706f2ee8b6fbc7f363064: unary operator expected
> > > Checking CEMon war
> > > md5sum: /webapps/ce-monitor.war: No such file or directory
> > > /usr/share/fry/functions/check_cream_wars: line 41: [:
> > > 800e9fb94638b80a9d7c4fff0027d969: unary operator expected
> > > -> RESULT  : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > > 6. Executing check_part...
> > > Starting partition free space check...
> > > Checking available space for /... OK
> > > Checking available space for /boot... OK
> > > Checking available space for /dev/shm... OK
> > > Checking available space for /home... OK
> > > Checking available space for /opt... OK
> > > Checking available space for /tmp... OK
> > > Checking available space for /var... OK
> > > Checking available space for /var/spool/pbs/server_priv/accounting... OK
> > > Checking available space for /opt/edg/var/info... OK
> > > -> RESULT  : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > > 7. Executing check_yum_autoupdate...
> > > Checking if yum autoupdate is enabled
> > > Yum autoupdate is enabled
> > > Please note that an automatic update of the cream and/or glexec rpms not
> > > followed by a reconfiguration is known to cause problems
> > > -> RESULT  : OK
> > > -> DETAILS : Everything OK!
> > > ---------------------------------------------------------------------------
> > > ===========================================================================
> > > > Final report:
> > >  - Test completed : 2009.02.05 16:19
> > >  - Total time     : 3 seconds
> > >  - Global result  : OK
> > > ===========================================================================
> > >
> > > I think it must be something else related to the YAIM configuration -
> > > possibly the users tomcat and glexec.
> > > I will keep looking, thanks.
> > >
> > > Dug
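
A note on check 5 in the fry output quoted above: it ends with RESULT OK even though `md5sum` found no war file and the shell printed "unary operator expected". That message is consistent with an unquoted, empty operand inside a `[ ... ]` test, and the path `/webapps/ce-cream.war` suggests that a directory prefix variable was possibly empty when the check ran. The sketch below reproduces the pattern under those assumptions; the variable and function names are hypothetical and are not taken from the fry sources.

```shell
#!/bin/sh
# Sketch only: names are hypothetical, not taken from the fry sources.

expected="1c0e74cd74f706f2ee8b6fbc7f363064"   # reference checksum seen in the log
actual=""                                      # empty because md5sum found no file

# Unquoted, the empty operand vanishes and the test receives only two words:
#   [ $expected = $actual ]   becomes   [ 1c0e... = ]
# which bash rejects with "unary operator expected".

# Quoted operands keep the comparison well-formed, and an empty checksum
# is reported as an error instead of being silently ignored.
compare_md5() {
    if [ -z "$2" ]; then
        echo "ERROR: no checksum computed"
    elif [ "$1" = "$2" ]; then
        echo "OK"
    else
        echo "MISMATCH"
    fi
}

compare_md5 "$expected" "$actual"
```

With quoting of this kind, a missing war file would show up as a failed check rather than as an OK result.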
> > >
> > > 2009/2/5 Raquel Muñoz
> > >
> > > > Hi,
> > > >
> > > >
> > > > There is an automatic tool that can be used to check if *some* parts of
> > > > the CREAM CE have been properly configured or if there are some issues
> > > > to fix.
> > > >
> > > > This tool is based on the ig-fry framework.
> > > >
> > > > To use this tool:
> > > >
> > > >   * Download the ig-fry rpm from http://www.pd.infn.it/~sgaravat/ig-fry/
> > > >
> > > > The latest version of ig-fry for CREAM is 3.1.2-0
> > > >
> > > >   * Install the rpm
> > > >   * Run it:
> > > >
> > > > fry -v -s <siteinfo.def>
> > > > <siteinfo.def> is the one used to configure the CREAM CE via yaim
> > > >
> > > > Please then post the produced output
> > > >
> > > > (Please note that if fry reports that everything is ok, this only means
> > > > that the tests done by fry passed. It doesn't rule out issues in other
> > > > parts of the CE configuration that fry does not check.)
> > > >
> > > > Regards
> > > >
> > > >
> > > > 2009/2/5 Douglas McNab:
> > > > > Hi,
> > > > >
> > > > > I have a few questions about setting up a CREAM CE and wondered if anyone
> > > > > had any ideas about the issues I am seeing.
> > > > > I believe that I have set up the grid service correctly, but when I submit a
> > > > > job using the simple CLI tools I receive an error message.
> > > > > Unfortunately, I have been through the logs and have not found anything that
> > > > > helps me diagnose the issue.
> > > > > I am sure it is probably something very silly I have forgotten to set up, but
> > > > > any help would be appreciated.
> > > > >
> > > > > So far I have been following the instructions at:
> > > > >
> > > > > http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream31-devel
> > > > >
> > > > > The test job was submitted using:  glite-ce-job-submit -a -d -r
> > > > > dev011.gla.scotgrid.ac.uk:8443/cream-pbs-q30m hellocream.jdl
> > > > > and results in this fatal error:
> > > > >
> > > > > 2009-02-05 14:08:28,217 FATAL - MethodName=[jobRegister]
> > > > > Timestamp=[Thu 05 Feb 2009 14:08:28] ErrorCode=[0] Description=[system error]
> > > > > FaultCause=[cannot write the job wrapper (jobId = CREAM304767308)! The
> > > > > problem seems to be related to glexec which reported: Broken pipe]
> > > > >
> > > > > Investigating the logs, I found this stack trace in catalina.out:
> > > > >
> > > > > org.glite.ce.creamapi.cmdmanagement.CommandException: cannot write the job
> > > > > wrapper (jobId = CREAM810541183)! The problem seems to be related to glexec
> > > > > which reported: Broken pipe at
> > > > > org.glite.ce.creamapi.jobmanagement.cmdexecutor.AbstractJobExecutor.createJobSandboxDir(AbstractJobExecutor.java:1038)
> > > > >
> > > > > I have seen this page:
> > > > >
> > > > > http://grid.pd.infn.it/cream/field.php?n=Main.ErrorMessagesReportedByCREAMToClient
> > > > > which describes the reported error message, and this led me to the glexec
> > > > > logs.  I increased the verbosity and debug levels in the glexec file in
> > > > > order to help debug this issue.  However, I failed to find anything of
> > > > > interest in /opt/glite/var/log/glexec_lcas_lcmaps.log
> > > > >
> > > > > It appears that it has mapped my credential correctly, and there do not
> > > > > appear to be any errors reported:
> > > > >
> > > > > LCAS   0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): 1 *******************************************
> > > > > LCAS   0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): SIGLEN: 256
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): USER:   /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): UCA:    /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): SERVER: /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=[log in to unmask]
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): SCA:    /C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): VO:     vo.scotgrid.ac.uk
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): URI:    svr029.gla.scotgrid.ac.uk:15000
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): DATE1:  20090205130408Z
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): DATE2:  20090206010408Z
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): fqan: /vo.scotgrid.ac.uk/Role=NULL/Capability=NULL
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): GROUP:  /vo.scotgrid.ac.uk
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): ROLE:   NULL
> > > > > LCAS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): CAP:    NULL
> > > > > LCAS   0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas_plugin_voms-print_vomsdata(): 1 *******************************************
> > > > >
> > > > > LCAS   0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcas.mod-lcas_run_va(): succeeded
> > > > >
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : Credential Print:
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : dn                    : /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : uid                   : 220001  [1/1]
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : pgid                  : 220000  [1/1]
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : VO credential mapping :     [1/1]
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcmaps_printVoMapping(): address of vo mapping struct: 0x92da118
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcmaps_printVoMapping():                    VO string: /vo.scotgrid.ac.uk/Role=NULL/Capability=NULL
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcmaps_printVoMapping():             mapped groupname: scotg
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : lcmaps_printVoMapping():                   mapped GID: 220000
> > > > > LCMAPS 0: 2009-01-05.14:08:27-25329-glexec_get_accounts : pool_index            : %2fc%3duk%2fo%3descience%2fou%3dglasgow%2fl%3dcompserv%2fcn%3ddouglas%20mcnab:scotg
> > > > >
> > > > > Does anyone know what sort of common errors would occur with glexec?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Dug
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
> > --
> >               \\\|///
> >            \\ ~ ~ //
> >            (/ @ @ /)
> >   -------oOOo-(_)-oOOo----------------------------------
> >                         Massimo Sgaravatto
> >                         INFN Sezione di Padova
> >                         Via Marzolo, 8
> >                         35131 Padova - Italy
> >                         Tel: ++39 0498277047   Fax: ++39 0498277102
> >          oooO           E-mail: massimo.sgaravatto [at] pd.infn.it
> >          (   )   Oooo   Home page: http://www.pd.infn.it/~sgaravat
> >   --------\ (----(   )----------------------------------
> >            \_)    ) /
> >                  (_/
> >
> 
> 
> 
> 

-- 
              \\\|///
            \\ ~ ~ //
            (/ @ @ /)
   -------oOOo-(_)-oOOo----------------------------------
                         Massimo Sgaravatto
                         INFN Sezione di Padova
                         Via Marzolo, 8
                         35131 Padova - Italy  
                         Tel: ++39 0498277047   Fax: ++39 0498277102
          oooO           E-mail: massimo.sgaravatto [at] pd.infn.it
          (   )   Oooo   Home page: http://www.pd.infn.it/~sgaravat
   --------\ (----(   )----------------------------------
            \_)    ) /
                  (_/