Hi Graeme,
OK, thanks.
One more question: how are the jobs downloading their input files from the SE?
Looking at our Ganglia network plots ( http://ganglia.pp.rl.ac.uk/?r=hour&s=by%2520name ),
the traffic out of the ATLAS pool nodes is much higher than the traffic going into the worker nodes (and the same structure appears at a lower level in the outbound traffic on the CMS and Misc pool nodes). That suggests to me that however you're getting the files out of the SE, the transfer is not connecting directly to the pool that holds the files, and that is causing an extra network hop.
If you look at the network plot for the CMS transfers, the inbound and outbound rates are more or less independent.
Yours,
Chris.
> -----Original Message-----
> From: Testbed Support for GridPP member institutes [mailto:TB-
> [log in to unmask]] On Behalf Of Graeme Stewart
> Sent: 21 July 2009 13:35
> To: [log in to unmask]
> Subject: Re: Next week's HammerCloud test: Birmingham
>
> Panda analysis uses a build job to compile the user code first, then
> this is staged from the SE for each of the individual sub-jobs. If the
> build job fails (upstream) then the analysis jobs (downstream) are
> failed automatically.
>
> Having a look at
>
> http://panda.cern.ch:25980/server/pandamon/query?job=1016490520
>
> the build job failed because of the network issues, so if this is now
> out of the equation then we should be ok.
>
> Cheers
>
> Graeme
>
> On Tue, Jul 21, 2009 at 14:26, Brew, CAJ (Chris)<[log in to unmask]>
> wrote:
> > Hi Graham,
> >
> > Yes, there was a firewall problem on some nodes we added recently,
> I've removed these from production so things should improve.
> >
> > And we're seeing jobs from you and Peter mapped to atlas pilot users
> correctly now.
> >
> > However, on test 520 everything seems to have failed before being
> submitted to us:
> >
> > ANALY_RALPP 299 killed by Panda server : upstream job fa...
> >
> > Any ideas why?
> >
> > Thanks,
> > Chris.
> >
> >> -----Original Message-----
> >> From: Testbed Support for GridPP member institutes [mailto:TB-
> >> [log in to unmask]] On Behalf Of Graeme Stewart
> >> Sent: 21 July 2009 12:34
> >> To: [log in to unmask]
> >> Subject: Re: Next week's HammerCloud test: Birmingham
> >>
> >> Hi Rob
> >>
> >> You are running jobs, but many failures with this error:
> >>
> >> Error details: pilot: wget command failed: 256, --12:10:00--
> >> http://www.usatlas.bnl.gov/svn/panda/pathena/trf/runAthena-00-00-11
> >
> >> `runAthena-00-00-11' Resolving www.usatlas.bnl.gov... 130.199.54.3
> >> Connecting to www.usatlas.bnl.gov|130.199.54.3|:80... failed
> >>
> >> See http://panda.cern.ch:25980/server/pandamon/query?dash=analysis,
> >> open the UK tab and then click through to errors.
> >>
> >> Outbound access problems on heplnc machines?
> >>
> >>
> >> http://panda.cern.ch:25980/server/pandamon/query?overview=wnlist&type=analysis&hours=24&site=ANALY_RALPP&reload=yes
> >>
> >> On the other point, you should certainly see pilots with my DN,
> >>
> >> "/C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=graeme stewart"
> >>
> >> and perhaps ones from Peter Love.
> >>
> >> Cheers
> >>
> >> Graeme
> >>
> >> On Tue, Jul 21, 2009 at 12:56, Harper, RM
> (Rob)<[log in to unmask]>
> >> wrote:
> >> > Hi Sam,
> >> >
> >> > We're not seeing any jobs at RALPP though the web pages say we
> have a
> >> load queued.
> >> >
> >> > What DNs are the jobs being submitted as? We'd like to be able to
> >> check if we are getting any connections coming in...
> >> >
> >> > Cheers,
> >> > Rob
> >> >
> >> >> -----Original Message-----
> >> >> From: Testbed Support for GridPP member institutes
> >> >> [mailto:[log in to unmask]] On Behalf Of Sam Skipsey
> >> >> Sent: Tuesday, July 21, 2009 11:35 AM
> >> >> To: [log in to unmask]
> >> >> Subject: Re: Next week's HammerCloud test: Birmingham
> >> >>
> >> >> It means that 350 jobs have been submitted. Don't worry about
> >> >> running out of jobs to process - Glasgow has around 1000 jobs
> >> >> queued from the pilot factory, and are having no issues with
> >> >> the stream of jobs we're getting.
> >> >> The pilot jobs issues are probably related to RHUL's issues -
> >> >> the pilot factories are being looked at, and there should be
> >> >> pilots for you soon.
> >> >>
> >> >> Sam
> >> >>
> >> >> 2009/7/21 Chris Curtis <[log in to unmask]>:
> >> >> > Hi -
> >> >> >
> >> >> > I've noticed on the hammercloud test site that Birmingham
> >> >> is down for
> >> >> > ~350 ANALY_BHAM jobs per test. Does this figure mean that 350
> >> have
> >> >> > already been submitted to Panda destined for Birmingham, or is
> >> that
> >> >> > the total number of jobs to be submitted over the two days?
> >> >> >
> >> >> > Either way I can't see any ATLAS pilot jobs here at
> >> >> Birmingham, and we
> >> >> > have a relatively quiet cluster at the moment...
> >> >> >
> >> >> > Cheers,
> >> >> >
> >> >> > Chris
> >> >> >
> >> >> > --
> >> >> > West 326
> >> >> > Physics and Astronomy
> >> >> > University of Birmingham
> >> >> > Edgbaston
> >> >> > Birmingham
> >> >> > B15 2TT
> >> >> >
> >> >> > (Office) 0121 414 4700
> >> >> > (Mobile) 0798 666 1959
> >> >> >
> >> >>
> >> > --
> >> > Scanned by iCritical.
> >> >
> >>
> >>
> >>
> >> --
> >> Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
> >> Department of Physics and Astronomy, University of Glasgow, Scotland
> >> DEATH TO MEETINGS!
> > --
> > Scanned by iCritical.
> >
>
>
>
> --
> Dr Graeme Stewart http://www.physics.gla.ac.uk/~graeme/
> Department of Physics and Astronomy, University of Glasgow, Scotland
> DEATH TO MEETINGS!
--
Scanned by iCritical.