Hi Sam,
We're still on SL6 at Lancaster (and only on DPM 1.9.0; upgrading is on my
todo list).
Cheers,
Matt
On 10/05/18 14:23, Sam Skipsey wrote:
> Sneaking suspicion: which of you guys have IPv6 turned on on your storage?
>
> I think Lancaster's also Centos 7 / DPM 1.9.x (Matt, am I remembering
> right?), but Matt did some Exciting Things to fix odd IPv6 problems, as
> I recall.
>
> On Thu, May 10, 2018 at 2:17 PM Sam Skipsey <[log in to unmask]> wrote:
>
> Okay, so everyone with an issue with a ticket is on Centos 7 and DPM
> 1.9.x... (this is a head node issue, so that's the important bit).
>
> I'll just check the sites I know aren't SL7/Centos 7 in the
> monitoring and see if they are different.
>
> Sam
>
> On Thu, May 10, 2018 at 11:46 AM John Bland <[log in to unmask]> wrote:
>
> At Liverpool all Centos7.4, DPM 1.9.2, puppet.
>
> On 10/05/2018 11:37, Govind Songara wrote:
> > Thanks Simon, headnode is configured using puppet. Pool node still uses
> > yaim.
> >
> > On Thu, 10 May 2018, 11:19 a.m. George, Simon <[log in to unmask]> wrote:
> >
> > Hi Sam,
> >
> > RHUL is running DPM 1.9.0 on Centos 7.3 on the SE head node.
> >
> > The storage nodes are DPM 1.8.10 on SL6.9.
> >
> > Simon
> >
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> > *From:* Sam Skipsey <[log in to unmask]>
> > *Sent:* 10 May 2018 11:12
> > *To:* George, Simon
> > *Cc:* [log in to unmask]
> > *Subject:* Re: [GRIDPP-STORAGE] help debugging transfer failures
> > Hello:
> >
> > So, it looks like Oxford and RHUL and the new ECDF-RDF have
> > something in common, as all of your transfer failures look similar
> > from the ATLAS logs (they look like SOAP errors on PUT DONE (error
> > code 500), on otherwise successful transfers).
> >
> > I know Oxford is running on SL7 with DPM 1.9.2 - is there anything
> > in common with the other two of you?
> >
> > Sam
> >
> > On Sun, May 6, 2018 at 12:33 PM George, Simon <[log in to unmask]> wrote:
> >
> > We got a new ticket for the same problem this weekend:
> >
> > https://ggus.eu/index.php?mode=ticket_info&ticket_id=134945
> >
> > How can we move forward on this?
> >
> > Change FTS parameters - how?
> >
> >
> > Thanks,
> >
> > Simon
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> > *From:* GRIDPP2: Deployment and support of SRM and local storage
> > management <[log in to unmask]> on behalf of John Bland
> > <[log in to unmask]>
> > *Sent:* 03 May 2018 10:52
> > *To:* [log in to unmask]
> > *Subject:* Re: help debugging transfer failures
> > Hi,
> >
> > On this page the majority of failed files have a transfer time of
> > 300-600s (plus a few over that). Not one below 300s that I've seen.
> >
> >
> http://dashb-atlas-ddm.cern.ch/ddm2/#activity=(Data+Brokering,Data+Consolidation,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default,on)&d.error_code=154&d.state=(TRANSFER_FAILED)&date.from=201805021050&date.interval=0&date.to=201805021450&dst.cloud=(%22UK%22)&dst.site=(%22UKI-NORTHGRID-LIV-HEP%22)&dst.tier=(0,1,2)&dst.token=(-IPV6TEST,-DDMTEST,-CEPH,-PPSSCRATCHDISK)&grouping.dst=(cloud,site,token)&m.content=(d_dof,d_eff,d_faf,s_eff,s_err,s_suc,t_eff,t_err,t_suc)&samples=true&src.site=(-RUCIOTEST,-MWTEST,-RDF)&src.tier=(0,1,2)&src.token=(-IPV6TEST,-DDMTEST,-CEPH,-PPSSCRATCHDISK)&tab=details
> >
> > John
> >
> > On 03/05/2018 10:45, Duncan Rand wrote:
> > > John
> > >
> > > Do you have an example of one of those transfers? Here
> > >
> > >
> https://fts106.cern.ch:8449/var/log/fts3/transfers/2018-05-03/srm.ndgf.org__se2.ppgrid1.rhul.ac.uk/2018-05-03-0856__srm.ndgf.org__se2.ppgrid1.rhul.ac.uk__761281463__e7e8646a-434c-59d0-b37f-a4d8917f1113
> >
> > >
> > >
> > > I see a 10GB file taking about 42 minutes and then failing. There are a
> > > number of FTS configurations here
> > >
> > >
> https://fts3-pilot.cern.ch:8449/fts3/ftsmon/#/config/gfal2
> > >
> > > a couple are indeed set to 300s/5mins.
> > >
> > > Duncan
> > >
> > > On 03/05/2018 09:57, George, Simon wrote:
> > >> Thanks John.
> > >>
> > >> Who is able to check if FTS itself has a timeout in place?
> > >>
> > >>
> > >>
> > >>
> > >> ------------------------------------------------------------------------
> > >> *From:* GRIDPP2: Deployment and support of SRM and local storage
> > >> management <[log in to unmask]> on behalf of John Bland
> > >> <[log in to unmask]>
> > >> *Sent:* 02 May 2018 23:10
> > >> *To:* [log in to unmask]
> > >> *Subject:* Re: help debugging transfer failures
> > >> Looking at some of the failed transfers we see at Liverpool, the SRM logs
> > >> show a 5-minute timeout of some sort. The SRM Put starts and the GridFTP
> > >> server transfers perfectly, but if the transfer takes more than 5 minutes
> > >> the SRM control connection gets terminated (but not the GridFTP one, that
> > >> I've seen). The client then appears to just delete the file in these
> > >> circumstances.
> > >>
> > >> Although it's more than possible our uni firewall is doing this, given
> > >> that at least a handful of sites are seeing similar issues and that the
> > >> FTS logs themselves show an INFO error of "Timeout stopped", I'd also be
> > >> eyeing the FTS servers suspiciously.
> > >>
> > >> It probably only shows up with big files (any I've checked are >2GB at
> > >> least) or if the WAN is saturated enough for the transfer to take over
> > >> 5 mins.
> > >>
> > >> John
> > >>
> > >> On 02/05/18 17:18, Govind Songara wrote:
> > >>> Hi All,
> > >>>
> > >>> As mentioned in today's meeting, we still see this error.
> > >>> It would be great if you could help with this problem.
> > >>>
> > >>> Thanks
> > >>> Govind
> > >>>
> > >>> On Tue, Apr 10, 2018 at 11:47 AM, George, Simon <[log in to unmask]> wrote:
> > >>>
> > >>> I found examples of the same type of error at Lancaster, if you're
> > >>> interested:
> > >>>
> > >>>
> > >>>
> > >>>
> http://dashb-atlas-ddm.cern.ch/ddm2/#activity=(Data+Brokering,Data+Consolidation,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default)&d.dst.cloud=%22UK%22&d.dst.site=%22UKI-NORTHGRID-LANCS-HEP%22&d.dst.token=%22DATADISK%22&d.error_code=229&d.src.cloud=%22CA%22&d.state=(TRANSFER_FAILED)&date.from=201804050000&date.interval=0&date.to=201804070000&dst.cloud=(%22UK%22)&dst.site=(%22UKI-NORTHGRID-LANCS-HEP%22)&dst.tier=(0,1,2)&dst.token=(-TEST,-CEPH,-PPS,-GRIDFTP)&grouping.dst=(cloud,site,token)&m.content=(d_dof,d_eff,d_faf,s_eff,s_err,s_suc,t_eff,t_err,t_suc)&p.grouping=src&samples=true&src.site=(-TEST,-RDF,-AWS,-CEPH)&src.tier=(0,1,2)&src.token=(-TEST,-CEPH,-PPS,-GRIDFTP)&tab=details
> >
> > >>>
> > >>>
> > >>>
> >
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> ------------------------------------------------------------------------
> > >>> *From:* George, Simon
> > >>> *Sent:* 06 April 2018 13:17
> > >>> *To:* [log in to unmask]
> > >>> *Subject:* help debugging transfer failures
> > >>>
> > >>> Dear storage experts, especially DPM flavoured ones,
> > >>>
> > >>> I'd be grateful if you could take a look at this ticket and give
> > >>> help and/or suggestions on how to get to the bottom of it.
> > >>>
> > >>>
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=134144
> > >>>
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Simon
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> John Bland                       [log in to unmask]
> > >> System Administrator             office: 220
> > >> High Energy Physics Division     tel (int): 42911
> > >> Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
> > >> University of Liverpool          http://www.liv.ac.uk/physics/hep/
> > >> "I canna change the laws of physics, Captain!"
> >
> >
> > --
> > John Bland                       [log in to unmask]
> > Research Fellow                  office: 220
> > High Energy Physics Division     tel (int): 42911
> > Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
> > University of Liverpool          http://www.liv.ac.uk/physics/hep/
> > "I canna change the laws of physics, Captain!"
> >
>
>
> --
> John Bland                       [log in to unmask]
> Research Fellow                  office: 220
> High Energy Physics Division     tel (int): 42911
> Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
> University of Liverpool          http://www.liv.ac.uk/physics/hep/
> "I canna change the laws of physics, Captain!"
>
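The 5-minute pattern discussed in this thread (failures clustering at 300-600s, none seen below 300s, mostly on files over 2GB) could be checked against a list of transfers with something like the sketch below. This is only an illustration: the Transfer record, the function names and the sample values are hypothetical, not taken from the FTS or dashboard data formats, and the 300s cutoff is the suspected value rather than a confirmed setting.

```python
from dataclasses import dataclass

CUTOFF_S = 300  # suspected 5-minute timeout (assumption, not a confirmed setting)


@dataclass
class Transfer:
    """Hypothetical record for one transfer attempt."""
    size_bytes: int
    duration_s: float
    failed: bool


def failures_below_cutoff(transfers, cutoff_s=CUTOFF_S):
    """Return failed transfers that ended before the cutoff.

    An empty result is consistent with the timeout hypothesis;
    a non-empty one would argue against it.
    """
    return [t for t in transfers if t.failed and t.duration_s < cutoff_s]


def size_reaching_cutoff(throughput_bytes_per_s, cutoff_s=CUTOFF_S):
    """File size (bytes) at which a transfer at this rate lasts cutoff_s."""
    return int(throughput_bytes_per_s * cutoff_s)


# Made-up sample; the last entry mirrors the 10GB / ~42 minute case in the thread.
sample = [
    Transfer(500_000_000, 120.0, failed=False),
    Transfer(3_000_000_000, 420.0, failed=True),
    Transfer(10_000_000_000, 42 * 60.0, failed=True),
]

counterexamples = failures_below_cutoff(sample)
rate = sample[-1].size_bytes / sample[-1].duration_s  # bytes per second
threshold = size_reaching_cutoff(rate)

print(f"failures under {CUTOFF_S}s: {len(counterexamples)}")
print(f"at {rate / 1e6:.1f} MB/s, files above ~{threshold / 1e9:.2f} GB exceed the cutoff")
```

At the rate of the 10GB example (roughly 4 MB/s), files of a bit over 1GB would already cross the 5-minute mark, which is in line with the failures only showing up on large files or a saturated WAN.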