Emanuele,
The RAL firewall should now be open on 9001/2 (as well as 9000) for the RB
(only).
Martin.
--
-------------------------------------------------------
Martin Bly | +44 1235 446981 | [log in to unmask]
Systems Admin, Tier 1/A Service, RAL PPD CSG
-------------------------------------------------------
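
A quick way to confirm the change from outside the firewall is a TCP connect test, equivalent to the telnet check quoted further down. The following is only a minimal Python sketch: the helper names port_open and check_ports and the 5-second timeout are illustrative assumptions, not part of any LCG tool. A False result means the port is closed or filtered; the sketch does not tell the two apart.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host, ports, timeout=5.0):
    """Map each port in ports to True (reachable) or False (not reachable)."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, check_ports("lcgrb01.gridpp.rl.ac.uk", (9000, 9001, 9002)) run from a machine outside RAL should report all three ports as True once the ruleset change is in place.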
> -----Original Message-----
> From: Bly, MJ (Martin)
> Sent: Tuesday, December 09, 2003 3:43 PM
> To: 'LHC Computer Grid - Rollout'
> Subject: RE: [LCG-ROLLOUT] Globus error 3
>
>
> Emanuele,
>
> I've checked our firewall config and it appears ports 9001
> and 9002 are blocked - this must have happened recently,
> probably when the new firewall was installed and the rulesets
> were transferred.
>
> I have asked for an urgent change to the rulesets to open
> 9001/2 inbound. I'll let you know when I get notification
> it's been done.
>
> Martin.
> --
> -------------------------------------------------------
> Martin Bly | +44 1235 446981 | [log in to unmask]
> Systems Admin, Tier 1/A Service, RAL PPD CSG
> -------------------------------------------------------
>
> > -----Original Message-----
> > From: Emanuele LEONARDI [mailto:[log in to unmask]]
> > Sent: Tuesday, December 09, 2003 9:53 AM
> > To: [log in to unmask]
> > Subject: Re: [LCG-ROLLOUT] Globus error 3
> >
> >
> > Hi Trevor.
> >
> > From CERN I see:
> >
> > (leonardi@it-adc-pc02) ~/grid/recipes> telnet lxshare0380.cern.ch 9001
> > Trying 137.138.145.208...
> > Connected to lxshare0380.cern.ch.
> > Escape character is '^]'.
> > ^]
> > telnet> quit
> > Connection closed.
> >
> > (leonardi@it-adc-pc02) ~/grid/recipes> telnet gtbcg16.ifca.unican.es 9001
> > Trying 193.144.209.116...
> > Connected to gtbcg16.ifca.unican.es.
> > Escape character is '^]'.
> > ^]
> > telnet> quit
> > Connection closed.
> >
> > (leonardi@it-adc-pc02) ~/grid/recipes> telnet lcgrb01.gridpp.rl.ac.uk 9001
> > Trying 130.246.183.184...
> >
> > i.e. port 9001 is accessible on the CERN and IFCA RBs but not on
> > the RAL RB (I did not test the others). The same holds for port 9002.
> >
> > As the same test done inside RAL works, this looks really like a
> > firewall problem...
> >
> > Emanuele
> >
> > Daniels, T (Trevor) wrote:
> > > I checked these ports at each of the RBs:
> > >
> > >              9000    9001
> > >
> > > CERN(0380)   open    open
> > > CERN(0381)   closed  closed
> > > IFCA         open    open
> > > IFIC         open    open
> > > KFKI         closed  closed
> > > NIKHEF       open    open
> > > PIC          open    open
> > > RAL          open    open
> > > SINP         open    open
> > > SINICA       closed  closed
> > >
> > > The test of the RAL RB may not reflect the external view since
> > > the tests were made from inside the RAL firewall.
> > >
> > > Trevor
> > >
> > > Dr Trevor Daniels
> > > c/o CCLRC eSC Department             Phone: (+44)|(0) 1235 778093
> > > Rutherford Appleton Laboratory       Fax:   (+44)|(0) 1235 446626
> > > Chilton, DIDCOT, Oxon, OX11 0QX, UK  Email: [log in to unmask]
> > > The contents of this email are sent in confidence for the use of
> > > the intended recipient only. If you are not one of the intended
> > > recipients do not take action on it or show it to anyone else, but
> > > return this email to the sender and delete your copy of it.
> > >
> > >
> > >
> > >>-----Original Message-----
> > >>From: Bly, MJ (Martin) [mailto:[log in to unmask]]
> > >>Sent: Tuesday, December 09, 2003 9:31 AM
> > >>To: [log in to unmask]
> > >>Subject: Re: [LCG-ROLLOUT] Globus error 3
> > >>
> > >>
> > >>We're on to it...
> > >>
> > >>RB is currently unhappy too.
> > >>
> > >>M.
> > >>--
> > >> -------------------------------------------------------
> > >> Martin Bly | +44 1235 446981 | [log in to unmask]
> > >> Systems Admin, Tier 1/A Service, RAL PPD CSG
> > >> -------------------------------------------------------
> > >>
> > >>
> > >>>-----Original Message-----
> > >>>From: Gonzalo Merino [mailto:[log in to unmask]]
> > >>>Sent: Tuesday, December 09, 2003 9:24 AM
> > >>>To: [log in to unmask]
> > >>>Subject: Re: [LCG-ROLLOUT] Globus error 3
> > >>>
> > >>>
> > >>>Hello,
> > >>>
> > >>>I have been asking people from the EDG WP1 about this behaviour and
> > >>>apparently this is due to a memory-leaking bug in edg-wl-interlogd.
> > >>>This problem is still not fixed in the current rpms; they are
> > >>>working on it.
> > >>>
> > >>>So, there is indeed a problem in the code that needs to be solved.
> > >>>However, it seems that there is also a configuration problem in LCG-1
> > >>>that has amplified the effect of the bug. The bug would not have
> > >>>shown up that much if edg-wl-interlogd in the CEs had been able to
> > >>>contact the bookkeeping server on lcgrb01.gridpp.rl.ac.uk, port 9001
> > >>>(9000 is the bookkeeping server's default port for queries, 9001 for
> > >>>event reception). This could point to a firewall setup problem at RAL.
> > >>>
> > >>>We have observed this "inflating edg-wl-interlogd" problem in our CE
> > >>>(grid-w1.ifae.es), and it turns out that there are lots of log files
> > >>>/var/tmp/dg20logd_.* on this machine, all of them pointing to
> > >>>undelivered bookkeeping information for lcgrb01.gridpp.rl.ac.uk.
> > >>>
> > >>>Could the system administrator at RAL check the firewall settings
> > >>>for accessing port 9001 on the RB machine?
> > >>>
> > >>>cheers,
> > >>>Gonzalo
> > >>>
> > >>>
> > >>>Francisco Javier Rodriguez Calonge wrote:
> > >>>
> > >>>>Jiri Kosina wrote:
> > >>>>
> > >>>>
> > >>>>>Hello,
> > >>>>>
> > >>>>>From time to time we encounter problems submitting jobs to our
> > >>>>>farm; edg-job-status reports
> > >>>>>
> > >>>>>*************************************************************
> > >>>>>BOOKKEEPING INFORMATION:
> > >>>>>
> > >>>>>Printing status info for the Job :
> > >>>>>https://lxshare0380.cern.ch:9000/scW9jsIq8INJjBeOaPVgLA
> > >>>>>Current Status: Done (Cancelled)
> > >>>>>Exit code: 0
> > >>>>>Status Reason: Got a job held event, reason: Globus error 3: an I/O operation failed
> > >>>>>Destination: golias25.farm.particle.cz:2119/jobmanager-lcgpbs-short
> > >>>>>reached on: Thu Nov 27 15:53:27 2003
> > >>>>>*************************************************************
> > >>>>>
> > >>>>>I have tried restarting pbs, mds and gatekeeper, but the problem
> > >>>>>persists. The only solution I've found that works is rebooting
> > >>>>>the CE.
> > >>>>>
> > >>>>>Has anyone ever met this problem? Is there anything I should verify?
> > >>>
> > >>>
> > >>>>>Thanks.
> > >>>>>
> > >>>>>--
> > >>>>>Jiri Kosina
> > >>>>>Institute of physics, Academy of Sciences of the Czech Republic
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>Hi Jiri,
> > >>>>
> > >>>>We have noticed that problem here at CIEMAT, and you can find it
> > >>>>reported in the rollout archives (just search for "Globus error 3" in
> > >>>>http://www.listserv.rl.ac.uk/cgi-bin/wa.exe?S1=lcg-rollout).
> > >>>>It is related to the /opt/edg/sbin/edg-wl-interlogd process, which
> > >>>>exhausts all the memory available on the CE. With less than 2% free
> > >>>>it's not possible to submit any job. The only solution we know is to
> > >>>>restart the daemon edg-wl-locallogger (we have set up a cron task
> > >>>>that checks free memory and restarts this daemon when it falls below
> > >>>>10% or so).
> > >>>>
> > >>>>Cheers, Javier
> > >>>>
> > >>>>--
> > >>>>F.Javier Rodriguez Calonge mailto:[log in to unmask]
> > >>>>Tfno: +34 91 346 60 00 Ext: 68 02
> > >>>
> > >>>--
> > >>>Gonzalo Merino ([log in to unmask])
> > >>>Institut de Física d'Altes Energies (UAB)
> > >>>08193 Bellaterra (Barcelona) SPAIN
> > >>>Tel: +34 93 5813322 / Fax: +34 93 5814110
> > >>>
> > >>
> >
> >
> > --
> > /------------------- Emanuele Leonardi -------------------\
> > | eMail: [log in to unmask] - Tel.: +41-22-7674066 |
> > | IT division - Bat.31 2-012 - CERN - CH-1211 Geneva 23 |
> > \---------------------------------------------------------/
> >
>
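
The workaround Javier describes in the quoted thread (a cron task that restarts edg-wl-locallogger when free memory falls below 10% or so) could be sketched roughly as below. This is not the CIEMAT script, which is not shown in the thread. Reading /proc/meminfo, the init script path in RESTART_CMD, and all function names are assumptions for illustration only; the 10% threshold is the figure quoted in the thread.

```python
import subprocess

# Assumed location of the init script; adjust to the local installation.
RESTART_CMD = ["/etc/init.d/edg-wl-locallogger", "restart"]
THRESHOLD_PERCENT = 10.0  # "10% or so", as quoted in the thread


def free_memory_percent(meminfo_text):
    """Return free memory as a percentage of total, parsed from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <integer> ..." lines; skip headers and anything else.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    # MemAvailable exists only on newer kernels; fall back to MemFree otherwise.
    free = fields.get("MemAvailable", fields["MemFree"])
    return 100.0 * free / fields["MemTotal"]


def should_restart(percent, threshold=THRESHOLD_PERCENT):
    """True when the free-memory percentage is below the threshold."""
    return percent < threshold


def main():
    """One check, intended to be run periodically from cron."""
    with open("/proc/meminfo") as handle:
        percent = free_memory_percent(handle.read())
    if should_restart(percent):
        subprocess.run(RESTART_CMD, check=False)
```

Calling main() from a script scheduled every few minutes would approximate the behaviour described. It only works around the leak in edg-wl-interlogd; it does not address the blocked port that kept the events from being delivered.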