Hi Torsten...
I'm the GE_utils yaim guy. I'm just seeing this thread now.
Were you using YAIM to set up your GE client? Or were you using NFS to
share your GE installation?
Cheers
Goncalo
On 10/26/2012 12:22 PM, Torsten Harenberg wrote:
> Hi Maarten,
> hi list,
>
> I just want to conclude on this thread as I think I have found the source of the problem.
>
> I wrote a workaround that extends the PATH so that jobs succeed in any case, which gives me many more debugging possibilities. I captured wrapper scripts and job output throughout the night, and I now think the problem is completely "SGE related".
>
> Digging into the proc entry of one sge_shepherd:
>
> 23902 ? S 0:00 sge_shepherd-1346114 -bg
> 24077 ? SNs 0:00 -bash /sge-root/default/spool/wn160/job_scripts/1346114
>
> [root@wn160 23902]# cat environ
> MANPATH=/opt/edg/share/man:/opt/glite/share/man:/opt/glite/yaim/man:/opt/globus/man:/opt/lcg/man:/opt/lcg/share/man::::::LC_MONETARY=de_DE.utf-8HOSTNAME=wn160SHELL=/bin/bashTERM=xtermGRID_ENV_LOCATION=/opt/glite/etc/profile.dHISTSIZE=1000SSH_CLIENT=132.195.125.4 33861 22GLOBUS_LOCATION=/opt/globusPERL5LIB=/opt/lcg/lib64/perl:/opt/gpt/lib/perlVO_OPS_DEFAULT_SE=grid-se.physik.uni-wuppertal.deSGE_CELL=defaultGT_PROXY_MODE=oldLC_NUMERIC=de_DE.utf-8SSH_TTY=/dev/pts/0USER=rootLS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:LD_LIBRARY_PATH=/opt/d-cache/dcap/lib:/opt/d-cache/dcap/lib64:/opt/glite/lib:/opt/glite/lib64:/opt/globus/lib:/opt/lcg/lib:/opt/lcg/lib64:/opt/classads/lib64/:/opt/c-ares/lib/VO_GHEP_SW_DIR=/gridsoft/ghepVO_DTEAM_SW_DIR=/gridsoft/dteamLCG_LOCATION=/opt/lcgATLAS_LOCAL_AREA=/gridsoft/atlas-cvmfs/localVO_OPS_SW_DIR=/gridsoft/opsVO_ATLAS_DEFAULT_SE=grid-se.physik.uni-wuppertal.dePATH=/bin:/usr/bin:/sbin:/usr/sbinMAIL=/var/spool/mail/rootLC_MESSAGES=de_DE.utf-8LC_COLLATE=de_DE.utf-8VO_DTEAM_DEFAULT_SE=grid-se.physik.uni-wuppertal.deEDG_LOCATION=/opt/edgPWD=/rootINPUTRC=/etc/inputrcVO_AUGER_DEFAULT_SE=grid-se.physik.uni-wuppertal.deSITE_GIIS_URL=grid-bdii.physik.uni-wuppertal.deLANG=de_DE.utf-8VO_DECH_DEFAULT_SE=scaise-2.scai.fraunhofer.deSGE_ROOT=/sge-rootMYPROXY_SERVER=grid-px0.desy.deHOME=/rootSHLVL=2GLITE_LOCATION_VAR=/opt/glite/varVO_AUGER_SW_DIR=/gridsoft/augerGLITE_ENV_SET=TRUELOGNAME=rootPYTHONPATH=/opt/glite/lib64/python2.4/site-packages:/opt/glite/lib/python:/opt/lcg/lib64/python2.4/site-packages:/opt/lcg/lib64/pythonLCG_GFAL_INFOSYS=bdii-fzk.gridka.de:
2170LC_CTYPE=de_DE.utf-8SSH_CONNECTION=132.195.125.4 33861 132.195.125.170 22VO_GHEP_DEFAULT_SE=grid-se.physik.uni-wuppertal.deLESSOPEN=|/usr/bin/lesspipe.sh %sVO_ATLAS_SW_DIR=/cvmfs/atlas.cern.ch/repo/swVO_ICECUBE_SW_DIR=/gridsoft/icecubeGLITE_LOCATION=/opt/gliteLC_TIME=de_DE.utf-8VO_DECH_SW_DIR=/gridsoft/dechSITE_NAME=wuppertalprodG_BROKEN_FILENAMES=1SRM_PATH=/opt/d-cache/srmVO_ICECUBE_DEFAULT_SE=grid-se.physik.uni-wuppertal.de_=/sge-root/bin/lx24-amd64/sge_execd
>
> You see all the stuff from /etc/profile.d/grid-env.sh
>
> Here's a node which doesn't have the problem:
>
> 30232 ? S 0:00 sge_shepherd-1329961 -bg
>
> [root@wn158 30232]# cat environ
> SELINUX_INIT=YESCONSOLE=/dev/consoleTERM=linuxSGE_CELL=defaultINIT_VERSION=sysvinit-2.86PATH=/bin:/usr/bin:/sbin:/usr/sbinRUNLEVEL=3runlevel=3PWD=/LANG=en_US.UTF-8SGE_ROOT=/sge-rootPREVLEVEL=Nprevious=NHOME=/SHLVL=2_=/sge-root/b
>
> You see: no grid related stuff, especially no GLITE_ENV_SET.
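The entries in /proc/<pid>/environ are separated by NUL bytes, which is why the cat output above runs together. A minimal sketch for comparing such dumps more easily; the function name show_environ is only an illustration, not part of SGE or YAIM:

```shell
# Print an environ file one variable per line.
# /proc/<pid>/environ separates its entries with NUL bytes.
show_environ() {
    tr '\0' '\n' < "$1"
}

# Possible use on a worker node (the pid is taken from the ps listing above):
#   show_environ /proc/23902/environ | grep '^GLITE_ENV_SET='
```

Running this against the shepherd on both nodes should make it quick to see whether GLITE_ENV_SET was inherited.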
>
> At the top of /etc/profile.d/grid-env.sh we have a
>
> if [ "X${GLITE_ENV_SET+X}" = "X" ]; then
> . /opt/glite/etc/profile.d/grid-env-funcs.sh
>
> So if GLITE_ENV_SET is already set, the script will not define gridpath_prepend, which is used later to set the PATH correctly:
>
> gridpath_prepend "PATH" "/opt/lcg/bin"
> gridpath_prepend "PATH" "/opt/globus/bin"
> gridpath_prepend "PATH" "/opt/glite/bin"
> gridpath_prepend "PATH" "/opt/edg/bin"
> gridpath_prepend "PATH" "/opt/d-cache/srm/bin:/opt/d-cache/dcap/bin"
>
> So PATH will be left at PATH=/bin:/usr/bin:/sbin:/usr/sbin plus whatever some other script adds.
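The effect of the guard can be reproduced without a grid installation. The following is only a sketch: the guard line is taken from the mail, while the body of gridpath_prepend, the function name simulate_grid_env and the place where GLITE_ENV_SET gets set are simplified assumptions, not the real YAIM code.

```shell
# Simplified stand-in for /etc/profile.d/grid-env.sh. Only the guard is taken
# from the thread; the helper body and the layout are assumptions.
simulate_grid_env() {
    if [ "X${GLITE_ENV_SET+X}" = "X" ]; then
        # Stand-in for sourcing grid-env-funcs.sh
        gridpath_prepend() { eval "$1=\"$2:\${$1}\""; }
        gridpath_prepend "PATH" "/opt/glite/bin"
        GLITE_ENV_SET=TRUE
    fi
    printf '%s\n' "$PATH"
}

# Case 1: GLITE_ENV_SET not inherited (sge_execd started by init).
( unset GLITE_ENV_SET; PATH=/bin:/usr/bin; simulate_grid_env )

# Case 2: GLITE_ENV_SET inherited while PATH has been reset to the basics.
( GLITE_ENV_SET=TRUE; PATH=/bin:/usr/bin; simulate_grid_env )
```

The first case prints a PATH that starts with /opt/glite/bin, the second prints the bare /bin:/usr/bin, which matches the "all-but-PATH" symptom.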
>
> That means: if SGE is started by init, /etc/profile.d/grid-env.sh has not been sourced beforehand and everything is okay. If you need to restart sge_execd later for whatever reason, you will end up with an "all-but-PATH" environment.
>
> I will now add another layer on top of /etc/profile.d/grid-env.sh which prevents it from being executed when called as root.
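One way such a layer might look, as a rough sketch only: the function name source_grid_env_unless_root and the idea of passing the uid and the script path as arguments are assumptions made for illustration, not what was actually deployed.

```shell
# Hypothetical wrapper: source the grid environment script for every user
# except root, so a root shell that restarts sge_execd stays clean.
source_grid_env_unless_root() {
    # $1: numeric uid of the caller, $2: path to the grid environment script
    if [ "$1" -eq 0 ]; then
        return 0    # root: leave the environment untouched
    fi
    [ -r "$2" ] && . "$2"
}

# Possible use from a profile.d snippet (the path is a placeholder):
#   source_grid_env_unless_root "$(id -u)" /path/to/real-grid-env.sh
```

Passing the uid as an argument keeps the function easy to try out as an ordinary user.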
>
> I hope this is helpful for other SGE sites, too.
>
> Best regards and have a nice weekend
>
> Torsten
>
>
> --
> <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
> <> <>
> <> Dr. Torsten Harenberg <>
> <> Bergische Universitaet <>
> <> FB C - Physik Tel.: +49 (0)202 439-3521 <>
> <> Gaussstr. 20 Fax : +49 (0)202 439-2811 <>
> <> 42097 Wuppertal <>
> <> <>
> <><><><><><><>< Of course it runs NetBSD http://www.netbsd.org ><>