I checked these ports at each of the RBs:

              9000     9001
CERN(0380)    open     open
CERN(0381)    closed   closed
ICFA          open     open
IFIC          open     open
KFKI          closed   closed
NIKHEF        open     open
PIC           open     open
RAL           open     open
SINP          open     open
SINICA        closed   closed

The test of the RAL RB may not reflect the external view since the tests
were made from inside the RAL firewall.
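
For anyone who wants to repeat the check from their own site, here is a
minimal sketch of this kind of probe in Python. It is not the tool used for
the table above. The function names are illustrative, the 3 second timeout is
an assumption, and the only host listed is the RAL RB named in the quoted
message below. A plain TCP connect cannot tell a firewalled port from a
stopped service or an unresolvable host name, so all of those are reported
as "closed".

```python
import socket


def port_state(host, port, timeout=3.0):
    """Return "open" if a TCP connection to host:port succeeds, else "closed".

    Refused connections, timeouts (typical of a filtering firewall) and
    name-resolution failures all raise OSError and count as "closed".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError:
        return "closed"


def check_hosts(hosts, ports=(9000, 9001), timeout=3.0):
    """Probe every port on every host; returns {host: {port: state}}."""
    return {
        host: {port: port_state(host, port, timeout) for port in ports}
        for host in hosts
    }


if __name__ == "__main__":
    # Host taken from the thread; add the other RBs as needed.
    rbs = ["lcgrb01.gridpp.rl.ac.uk"]
    for host, states in check_hosts(rbs).items():
        print(host, states[9000], states[9001])
```

As with the RAL entry above, the result only describes the view from the
machine that runs the probe.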
Trevor
Dr Trevor Daniels
c/o CCLRC eSC Department Phone: (+44)|(0) 1235 778093
Rutherford Appleton Laboratory Fax: (+44)|(0) 1235 446626
Chilton, DIDCOT, Oxon, OX11 0QX, UK Email: [log in to unmask]
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended recipients do
not take action on it or show it to anyone else, but return this email to
the sender and delete your copy of it.
> -----Original Message-----
> From: Bly, MJ (Martin) [mailto:[log in to unmask]]
> Sent: Tuesday, December 09, 2003 9:31 AM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Globus error 3
>
>
> We're on to it...
>
> RB is currently unhappy too.
>
> M.
> --
> -------------------------------------------------------
> Martin Bly | +44 1235 446981 | [log in to unmask]
> Systems Admin, Tier 1/A Service, RAL PPD CSG
> -------------------------------------------------------
>
> > -----Original Message-----
> > From: Gonzalo Merino [mailto:[log in to unmask]]
> > Sent: Tuesday, December 09, 2003 9:24 AM
> > To: [log in to unmask]
> > Subject: Re: [LCG-ROLLOUT] Globus error 3
> >
> >
> > Hello,
> >
> > I have been asking people from the EDG WP1 about this behaviour and
> > apparently this is due to a memory-leaking bug in edg-wl-interlogd.
> > This problem is still not fixed in the current rpms; they are
> > working on it.
> >
> > So, there is indeed a problem in the code that needs to be solved.
> > However, it seems that there is also a configuration problem in LCG-1
> > that has amplified the effect of the bug. This would not have shown up
> > that much without edg-wl-interlogd in the CEs being unable to contact
> > the bookkeeping server in lcgrb01.gridpp.rl.ac.uk, port 9001 (9000 is
> > the default bookkeeping server's port for queries, 9001 for event
> > reception). This could point to a firewall setup problem at RAL.
> >
> > We have observed this "inflating edg-wl-interlogd" problem in our CE
> > (grid-w1.ifae.es), and it turns out that there are lots of log files
> > /var/tmp/dg20logd_.* on this machine, all of them pointing to
> > undelivered bookkeeping information back to lcgrb01.gridpp.rl.ac.uk.
> >
> > Could the system administrator at RAL check the firewall settings for
> > accessing port 9001 on the RB machine?
> >
> > cheers,
> > Gonzalo
> >
> >
> > Francisco Javier Rodriguez Calonge wrote:
> > > Jiri Kosina wrote:
> > >
> > >> Hello,
> > >>
> > >> From time to time we encounter problems with submitting jobs to
> > >> our farm; edg-job-status reports
> > >>
> > >> *************************************************************
> > >> BOOKKEEPING INFORMATION:
> > >>
> > >> Printing status info for the Job :
> > >> https://lxshare0380.cern.ch:9000/scW9jsIq8INJjBeOaPVgLA
> > >> Current Status: Done (Cancelled)
> > >> Exit code: 0
> > >> Status Reason: Got a job held event, reason: Globus error 3: an I/O operation failed
> > >> Destination: golias25.farm.particle.cz:2119/jobmanager-lcgpbs-short
> > >> reached on: Thu Nov 27 15:53:27 2003
> > >> *************************************************************
> > >>
> > >> I have tried restarting pbs, mds and gatekeeper, but the problem
> > >> persists. The only solution I've found to work is rebooting the CE.
> > >>
> > >> Has anyone ever met this problem? Is there anything I should verify?
> > >> Thanks.
> > >>
> > >> --
> > >> Jiri Kosina
> > >> Institute of physics, Academy of Sciences of the Czech Republic
> > >>
> > >>
> > >>
> > > Hi Jiri,
> > >
> > > we have noticed that problem here at CIEMAT and you can find it
> > > reported in the rollout archives (just search for "Globus error 3" in
> > > http://www.listserv.rl.ac.uk/cgi-bin/wa.exe?S1=lcg-rollout).
> > > It is related to the /opt/edg/sbin/edg-wl-interlogd process. This
> > > process exhausts all the memory available in the CE. Below 2% free
> > > memory it's not possible to submit any job. The only solution we know
> > > is to restart the daemon edg-wl-locallogger (we have put in a cron task
> > > looking at free memory and restarting this daemon when it falls under
> > > 10% or so).
> > >
> > > Cheers, Javier
> > >
> > > --
> > > F.Javier Rodriguez Calonge mailto:[log in to unmask]
> > > Tfno: +34 91 346 60 00 Ext: 68 02
> >
> > --
> > Gonzalo Merino ([log in to unmask])
> > Institut de Física d'Altes Energies (UAB)
> > 08193 Bellaterra (Barcelona) SPAIN
> > Tel: +34 93 5813322 / Fax: +34 93 5814110
> >
>