Emanuele,
Looks as if I was a tad hasty - the RB seems to be employing the port range
required (50000 - 52000 in our case).
Nevertheless we still have an RB which causes jobs to enter the state:
*************************************************************
BOOKKEEPING INFORMATION:
Printing status info for the Job :
https://lcgrb01.gridpp.rl.ac.uk:9000/Txb_74TFC3g1mLkoy2Mj-A
Current Status: Ready
Status Reason: unavailable
Destination: lcgce01.gridpp.rl.ac.uk:2119/jobmanager-lcgpbs-short
reached on: Wed Dec 17 09:56:34 2003
*************************************************************
and remain there forever.
Martin.
--
-------------------------------------------------------
Martin Bly | +44 1235 446981 | [log in to unmask]
Systems Admin, Tier 1/A Service, RAL PPD CSG
-------------------------------------------------------
> -----Original Message-----
> From: Emanuele LEONARDI [mailto:[log in to unmask]]
> Sent: Wednesday, December 17, 2003 10:38 AM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] RAL RB woes - reinstall
>
>
> Hi Martin.
>
> In principle the TCP_PORT_RANGE problem should have been fixed since
> LCG1-1_1_0.
>
> Can you please check if your /etc/sysconfig/globus file looks
> like this:
>
> [root@lxshare0380 init.d]# cat /etc/sysconfig/globus
> GLOBUS_LOCATION=/opt/globus
> GLOBUS_CONFIG=/etc/globus.conf
> GLOBUS_TCP_PORT_RANGE="20000 25000"
>
> If this is the case, please use ps to find the PID of
> /opt/condor/sbin/gahp_server and look into
> /proc/<PID>/environ settings.
> At CERN I have:
>
> [root@lxshare0380 init.d]# cat -v /proc/3523/environ
> PWD=/tmp^@GLOBUS_TCP_PORT_RANGE=20000 25000^@
> GRIDMAPDIR=/etc/grid-security/gridmapdir/^@
> SHLIB_PATH=/opt/globus/lib^@SASL_PATH=/opt/globus/lib/sasl^@
> EDG_LOCATION_VAR=/opt/edg/va[log in to unmask]@
> LD_LIBRARY_PATH=/opt/globus/lib:/opt/globus/lib:/opt/edg/lib:
> /opt/globus/lib:/opt/edg/lib^@
> EDG_WL_LOCATION_VAR=/opt/edg/var^@GLOBUS_LOCATION=/opt/globus^@
> EDG_WL_CONFIG_DIR=/etc^@CONDOR_CONFIG=/opt/condor/etc/condor.conf^@
> GPT_LOCATION=/opt/gpt^@
> LESSOPEN=|/usr/bin/lesspipe.sh [log in to unmask]@
> EDG_WL_USER=edguser^@XERCESJ_INSTALL_PATH=/usr^@EDG_WL_TMP=/tmp^@
> USER=root^@
> LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:
> bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:
> *.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:
> *.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:
> *.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:
> *.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:
> *.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:
> *.xpm=01;35:*.png=01;35:*.tif=01;35:^@
> MAIL=/var/spool/mail/root^@INPUTRC=/etc/inputrc^@
> [log in to unmask]@LIBPATH=/opt/globus/lib:/usr/lib:/lib^@
> SSH_CLIENT=137.138.32.180 34750 22^@
> EDG_TMP=/tmp^@GLOBUS_PATH=/opt/globus^@LOG4J_INSTALL_PATH=/usr^@
> LOGNAME=root^@[log in to unmask]@EDG_LOCATION=/opt/edg^@
> GRIDMAP=/etc/grid-security/grid-mapfile^@SHELL=/bin/bash^@
> USERNAME=root^@HISTSIZE=1000^@HOME=/root^@TERM=vt100^@
> CO[log in to unmask]:9002^@
> PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin:/opt/edg/bin:
> /opt/edg/sbin^@
> SSH_TTY=/dev/pts/0^@EDG_WL_LOCATION=/opt/edg^@_=/sbin/initlog^@
> M-^@[log in to unmask]@^@
> CONDOR_INHERIT=3522 <137.138.145.208:53287> 0 0^@
>
> which shows that the settings in /etc/sysconfig/globus are actually
> used by condorG, the only process using the port range.
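
The check described above (find the gahp_server PID, then look at /proc/<PID>/environ) can be sketched in a few lines of Python. This is only an illustration: the function names parse_environ, port_range and read_environ are made up for this note, and the tolerance for a comma-separated range is an assumption rather than something stated in this thread. The entries in /proc/<PID>/environ are NUL-separated, which is what shows up as ^@ in the cat -v output.

```python
from typing import Dict, Optional, Tuple


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split a NUL-separated /proc/<PID>/environ blob into a dict."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty or malformed entries
        name, _, value = entry.partition(b"=")
        env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def port_range(env: Dict[str, str]) -> Optional[Tuple[int, int]]:
    """Return (low, high) from GLOBUS_TCP_PORT_RANGE, or None if unset."""
    value = env.get("GLOBUS_TCP_PORT_RANGE")
    if value is None:
        return None
    # The thread shows "20000 25000"; accepting a comma as well is an
    # assumption made for this sketch.
    low, high = (int(part) for part in value.replace(",", " ").split())
    return low, high


def read_environ(pid: int) -> Dict[str, str]:
    """Read the environment of a running process (Linux; needs permission)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())
```

Run as root on the RB, something like port_range(read_environ(3523)) (with the PID reported by ps) would return the range the process was actually started with, or None if the variable never reached its environment.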
>
> If both these tests at RAL are consistent with what I see at CERN, can
> you please tell me what you do to verify that the GLOBUS_TCP_PORT_RANGE
> setting is not used on your RB?
>
> Thanks, ciao
>
> Emanuele
>
> Bly, MJ (Martin) wrote:
> > We completed the reinstall as per instructions, reconfigured the
> > database, rebooted and stood back to watch.
> >
> > No luck with jobs submitted to the RAL CE or the CERN CE via the
> > RAL RB.
> >
> > Discovered the RB *still* takes absolutely no notice of the
> > GLOBUS_TCP_PORT_RANGE setting in its config, so set that the hard
> > way.
> >
> > Still no luck with the RAL RB. There appears to be nothing in the
> > log (/var/log/messages) to indicate anything awry.
> >
> > What's missing? I follow the instructions and *IT DOESN'T WORK*.
> >
> > Martin.
> > --
> > -------------------------------------------------------
> > Martin Bly | +44 1235 446981 | [log in to unmask]
> > Systems Admin, Tier 1/A Service, RAL PPD CSG
> > -------------------------------------------------------
> >
> >
> >>-----Original Message-----
> >>From: Martin Bly [mailto:[log in to unmask]]
> >>Sent: Tuesday, December 16, 2003 9:49 AM
> >>To: [log in to unmask]
> >>Cc: Martin Bly
> >>Subject: RAL RB woes - reinstall
> >>
> >>
> >>The RAL RB is now so unreliable that we have decided that it is
> >>time to reboot it with the kickstart floppy.
> >>
> >>Therefore, the RAL RB will be out of commission today starting
> >>around 10am UK time, until the reinstallation is complete.
> >>Shouldn't take more than an hour, if everything goes to plan.
> >>
> >>If anyone has any last minute ideas on how to stabilise it, you've
> >>got 10 minutes to let me know...
> >>
> >>Martin.
> >>
> >
>
>
> --
> /------------------- Emanuele Leonardi -------------------\
> | eMail: [log in to unmask] - Tel.: +41-22-7674066 |
> | IT division - Bat.31 2-012 - CERN - CH-1211 Geneva 23 |
> \---------------------------------------------------------/
>