JISCMail - GRIDPP-STORAGE Archives

Subject: Re: help debugging transfer failures
From: John Bland <[log in to unmask]>
Reply-To: John Bland <[log in to unmask]>
Date: Tue, 15 May 2018 08:34:30 +0100

Ditto for Liverpool.

Anyone got any word from the FTS guys yet?

John

On 14/05/2018 22:34, Govind Songara wrote:
> Hi Sam,
> 
> I have made changes this afternoon, but still seeing the same error.
> 
> Thanks
> Govind
> 
> On Mon, May 14, 2018 at 4:40 PM, Sam Skipsey <[log in to unmask]> wrote:
> 
>     Hi Chaps,
> 
>     Andrea at DPM has discovered the following problem with CentOS 7
>     versus SL6 - it ignores limits.conf (ulimit) settings for services.
>     As there are a few changes to the ulimit max open files settings for
>     srmv2.2, this could be causing the srm service to have issues
>     running out of file handles and timing out with requests.
> 
>     If you'd like to test if this is the case for your sites (everyone
>     with an SRM 500 error), you can apply an increased max open files in
>     systemd by doing:
> 
>     systemctl edit srmv2.2.service
> 
>     then adding
> 
>     [Service]
>     LimitNOFILE=65000
> 
>     in the editor,
> 
>     and restarting the service when done.
> 
>     If you do this, please let me know how it goes so I can feed back to
>     Andrea.
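
To check whether the override has actually reached the running daemon, something like the following may help. It is only a sketch: `soft_nofile` is a made-up helper name, the unit name srmv2.2.service is taken from the instructions above, and the usual layout of /proc/<pid>/limits is assumed.

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits-style file.
# Columns on that line are: Max open files <soft> <hard> files
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Possible usage on the head node (assumes the service is running):
#   pid=$(systemctl show -p MainPID srmv2.2.service | cut -d= -f2)
#   soft_nofile "/proc/$pid/limits"
# The configured value can also be read with:
#   systemctl show -p LimitNOFILE srmv2.2.service
```

If the override has been picked up, the value printed should match the LimitNOFILE set in the drop-in; the service needs restarting before the new limit applies to the process.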
> 
>     Sam
> 
> 
>     On Mon, May 14, 2018 at 3:48 PM Kashif Mohammad
>     <[log in to unmask]> wrote:
> 
>         Hi Sam,
> 
>         We are still seeing failed transfers and, as you explained to me in
>         the ops meeting, it is the last stage that fails. I looked through
>         the logs for one example and found the same file being tried
>         multiple times.
> 
>         It starts with PrepareToPut:
> 
>         [root@t2se01 srmv2.2]# grep -r "DAOD_EXOT8.13867737._000090.pool.root.1" log
> 
>         05/14 14:45:29.075 206398,7 PrepareToPut: SRM98 - PrepareToPut 0
>         srm://t2se01.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         05/14 15:29:54.943 206398,0 PrepareToPut: SRM98 - PrepareToPut 0
>         srm://t2se01.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         Looking at the pool node, the file was actually copied:
> 
>         [root@t2se45 dpm-gsiftp]# grep -r DAOD_EXOT8.13867737._000090.pool.root.1 gridftp.log
> 
>         [41663] Mon May 14 14:45:30 2018 :: dmlite :: stat ::
>         /t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
>         :: /DC=ch/DC=cern/OU=Organic
>         Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS
>         Data Management :: fts106.cern.ch
> 
>         [41663] Mon May 14 14:45:31 2018 :: dmlite :: user error :: 2 ::
>         [#00.000002] Could not open
>         t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
>         :: /DC=ch/DC=cern/OU=Organic
>         Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data
>         Management :: fts106.cern.ch
> 
>         [41663] Mon May 14 14:45:31 2018 :: Starting to transfer
>         "/t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0".
> 
>         [41663] Mon May 14 15:09:31 2018 :: Finished transferring
>         "/t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0".
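
A rough way to pull the transfer duration for one file out of gridftp.log, as a sketch only. It assumes each log entry sits on a single line in the real file (the excerpt above is wrapped by the mail client) and that GNU date is available; `log_stamp` and `transfer_seconds` are hypothetical helper names, not part of DPM.

```shell
# Print the timestamp of the first line in log file $1 that mentions both
# the file name ($2) and the marker text ($3). The timestamp is the first
# " :: "-separated field with the leading "[pid] " stripped.
log_stamp() {
    grep -F -- "$2" "$1" | grep -F -- "$3" | head -n 1 |
        awk -F' :: ' '{ sub(/^\[[0-9]+\] /, "", $1); print $1 }'
}

# Print the seconds between "Starting to transfer" and
# "Finished transferring" for file name $2 in log file $1.
transfer_seconds() {
    start=$(log_stamp "$1" "$2" 'Starting to transfer')
    end=$(log_stamp "$1" "$2" 'Finished transferring')
    # Give up if either marker is missing (e.g. transfer never finished).
    [ -n "$start" ] && [ -n "$end" ] || return 1
    echo $(( $(date -u -d "$end" +%s) - $(date -u -d "$start" +%s) ))
}

# Possible usage on the pool node:
#   transfer_seconds gridftp.log DAOD_EXOT8.13867737._000090.pool.root.1
```

For the entry above, the two timestamps (14:45:31 and 15:09:31) are 24 minutes apart, well over the 5-minute mark discussed further down the thread.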
> 
>         Then it was deleted, probably because the put_done process failed:
> 
>         05/14 14:45:28.731  4714,0 Cns_srv_delete: NS098 - delete
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         05/14 14:45:29.239  4714,0 Cns_srv_stat: NS098 - stat 0
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         05/14 14:45:29.323  4714,0 Cns_srv_creat: NS098 - creat
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
>         664 22
> 
>         05/14 14:45:29.365  4714,0 Cns_srv_addreplica: NS098 -
>         addreplica t2se45.physics.ox.ac.uk
>         t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 
>         05/14 14:45:31.157  4714,0 Cns_srv_statg: NS098 - statg
>         /dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 
>         05/14 14:45:31.199  4714,0 Cns_srv_statr: NS098 - statr
>         t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 
>         05/14 14:45:31.489  4714,0 Cns_srv_accessr: NS098 - accessr 2
>         t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 
>         05/14 15:09:31.362  4714,0 Cns_srv_statr: NS098 - statr
>         t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 
>         05/14 15:09:31.444  4714,0 Cns_srv_getreplicax: NS098 -
>         getreplicax
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         05/14 15:09:34.400  4714,0 Cns_srv_delreplica: NS098 -
>         delreplica
>         t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 
>         05/14 15:09:34.402  4714,0 Cns_srv_getreplicax: NS098 -
>         getreplicax
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         05/14 15:09:34.403  4714,0 Cns_srv_unlink: NS098 - unlink
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         05/14 15:09:34.546  4714,0 Cns_srv_delete: NS098 - delete
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         05/14 15:29:54.602  4714,1 Cns_srv_delete: NS098 - delete
>         /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 
>         Should we open a ticket with the DPM developers?
> 
>         Cheers,
> 
>         Kashif
> 
>         From: GRIDPP2: Deployment and support of SRM and local storage
>         management [mailto:[log in to unmask]] On Behalf Of Sam Skipsey
>         Sent: 10 May 2018 15:11
>         To: [log in to unmask]
>         Subject: Re: help debugging transfer failures
> 
>         Okay so:
> 
>         Working sites (with no evidence of the signature SRM PUT-DONE
>         SOAP errors):
>         Glasgow (SL6, DPM 1.9.0, IPv4)
>         Lancaster (SL6, DPM 1.9.0, IPv6)
> 
>         Issue Sites:
>         Liverpool (Centos 7, DPM 1.9.x, IPv4)
>         Oxford (SL7, DPM 1.9.x, IPv4)
>         RHUL (Centos 7, DPM 1.9.x, IPv4)
>         ECDF-RDF (Centos 7, DPM 1.9.x, IPv4 I think because 6 was
>         problematic)
> 
>         So, assuming this means anything, the only common factor seems
>         to be RHEL7-based release rather than a RHEL-6 one.
> 
>         Sam
> 
>         On Thu, May 10, 2018 at 2:37 PM George, Simon
>         <[log in to unmask]> wrote:
> 
>             No IPV6 on storage yet, still working on the perfsonar :-)
> 
>             On 10 May 2018 14:33, Matt Doidge <[log in to unmask]> wrote:
> 
>             Hi Sam,
>             We're SL6 at Lancaster still (and only on 1.9.0 - upgrading's
>             on my todo list).
> 
>             Cheers,
>             Matt
> 
>             On 10/05/18 14:23, Sam Skipsey wrote:
>             > Sneaking suspicion: which of you guys have IPv6 turned on for your storage?
>             > 
>             > I think Lancaster's also Centos 7 / DPM 1.9.x (Matt, am I remembering 
>             > right?), but Matt did some Exciting Things to fix odd IPv6 problems, as 
>             > I recall.
>             > 
>             > On Thu, May 10, 2018 at 2:17 PM Sam Skipsey <[log in to unmask]> wrote:
>             > 
>             >     Okay, so everyone with an issue with a ticket is on Centos 7 and DPM
>             >     1.9.x... (this is a head node issue, so that's the important bit).
>             > 
>             >     I'll just check the sites I know aren't SL7/Centos 7 in the
>             >     monitoring and see if they are different.
>             > 
>             >     Sam
>             > 
>             >     On Thu, May 10, 2018 at 11:46 AM John Bland <[log in to unmask]> wrote:
>             > 
>             >         At Liverpool all Centos7.4, DPM 1.9.2, puppet.
>             > 
>             >         On 10/05/2018 11:37, Govind Songara wrote:
>             >          > Thanks Simon, headnode is configured using puppet.  Pool node
>             >          > still uses yaim.
>             >          >
>             >          > On Thu, 10 May 2018, 11:19 a.m. George, Simon,
>             >          > <[log in to unmask]> wrote:
>             >          >
>             >          >     Hi Sam,
>             >          >
>             >          >     RHUL is running DPM 1.9.0 on Centos 7.3 on the SE head node.
>             >          >
>             >          >     The storage nodes are DPM 1.8.10 on SL6.9.
>             >          >
>             >          >     Simon
>             >          >
>             >          >
>             >          >
>             >          >
>             >          >     ------------------------------------------------------------------------
>             >          >     From: Sam Skipsey <[log in to unmask]>
>             >          >     Sent: 10 May 2018 11:12
>             >          >     To: George, Simon
>             >          >     Cc: [log in to unmask]
>             >          >     Subject: Re: [GRIDPP-STORAGE] help debugging transfer failures
>             >          >     Hello:
>             >          >
>             >          >     So, it looks like Oxford, RHUL and the new ECDF-RDF have
>             >          >     something in common, as all of your transfer failures look similar
>             >          >     from the ATLAS logs (they look like SOAP errors on PUT DONE (error
>             >          >     code 500), on otherwise successful transfers).
>             >          >
>             >          >     I know Oxford is running on SL7 with DPM 1.9.2 - is there anything
>             >          >     in common with the other two of you?
>             >          >
>             >          >     Sam
>             >          >
>             >          >     On Sun, May 6, 2018 at 12:33 PM George, Simon
>             >          >     <[log in to unmask]> wrote:
>             >          >
>             >          >         We got a new ticket for the same problem this weekend:
>             >          >
>             >          > https://ggus.eu/index.php?mode=ticket_info&ticket_id=134945
>             >          >
>             >          >         How can we move forward on this?
>             >          >
>             >          >         Change FTS parameters - how?
>             >          >
>             >          >
>             >          >         Thanks,
>             >          >
>             >          >         Simon
>             >          >
>             >          >
>             >          >
>             >          >         ------------------------------------------------------------------------
>             >          >         From: GRIDPP2: Deployment and support of SRM and local storage
>             >          >         management <[log in to unmask]> on behalf of John Bland
>             >          >         <[log in to unmask]>
>             >          >         Sent: 03 May 2018 10:52
>             >          >         To: [log in to unmask]
>             >          >         Subject: Re: help debugging transfer failures
>             >          >         Hi,
>             >          >
>             >          >         On this page the majority of failed files have a transfer
>             >          >         time of 300-600s (plus a few over that). Not one below 300s
>             >          >         that I've seen.
>             >          >
>             >          >
>             >         http://dashb-atlas-ddm.cern.ch/ddm2/#activity=(Data+Brokering,Data+Consolidation,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default,on)&d.error_code=154&d.state=(TRANSFER_FAILED)&date.from=201805021050&date.interval=0&date.to=201805021450&dst.cloud=(%22UK%22)&dst.site=(%22UKI-NORTHGRID-LIV-HEP%22)&dst.tier=(0,1,2)&dst.token=(-IPV6TEST,-DDMTEST,-CEPH,-PPSSCRATCHDISK)&grouping.dst=(cloud,site,token)&m.content=(d_dof,d_eff,d_faf,s_eff,s_err,s_suc,t_eff,t_err,t_suc)&samples=true&src.site=(-RUCIOTEST,-MWTEST,-RDF)&src.tier=(0,1,2)&src.token=(-IPV6TEST,-DDMTEST,-CEPH,-PPSSCRATCHDISK)&tab=details
>             >          >
>             >          >         John
>             >          >
>             >          >         On 03/05/2018 10:45, Duncan Rand wrote:
>             >          >         > John
>             >          >         >
>             >          >         > Do you have an example of one of those transfers? Here
>             >          >         >
>             >          >         >
>             >         https://fts106.cern.ch:8449/var/log/fts3/transfers/2018-05-03/srm.ndgf.org__se2.ppgrid1.rhul.ac.uk/2018-05-03-0856__srm.ndgf.org__se2.ppgrid1.rhul.ac.uk__761281463__e7e8646a-434c-59d0-b37f-a4d8917f1113
>             >          >
>             >          >         >
>             >          >         >
>             >          >         > I see a 10GB file taking about 42 minutes and then failing.
>             >          >         > There are a number of FTS configurations here
>             >          >         >
>             >          >         >
>             >         https://fts3-pilot.cern.ch:8449/fts3/ftsmon/#/config/gfal2
>             >          >         >
>             >          >         > a couple are indeed set to 300s/5mins.
>             >          >         >
>             >          >         > Duncan
>             >          >         >
>             >          >         > On 03/05/2018 09:57, George, Simon wrote:
>             >          >         >> Thanks John.
>             >          >         >>
>             >          >         >> Who is able to check if FTS itself has a timeout in place?
>             >          >         >>
>             >          >         >>
>             >          >         >>
>             >          >         >>
>             >          >         >> ------------------------------------------------------------------------
>             >          >         >> From: GRIDPP2: Deployment and support of SRM and local storage
>             >          >         >> management <[log in to unmask]> on behalf of John Bland
>             >          >         >> <[log in to unmask]>
>             >          >         >> Sent: 02 May 2018 23:10
>             >          >         >> To: [log in to unmask]
>             >          >         >> Subject: Re: help debugging transfer failures
>             >          >         >> Looking at some of the failed transfers we see at Liverpool, the SRM logs
>             >          >         >> show a 5-minute timeout of some sort. SRM Put starts, the gridftp server
>             >          >         >> transfers perfectly, but if the transfer takes more than 5 minutes the
>             >          >         >> SRM control connection gets terminated (but not the GridFTP one that
>             >          >         >> I've seen). The client then appears to just delete the file in these
>             >          >         >> circumstances.
>             >          >         >>
>             >          >         >> Although it's more than possible our uni firewall is doing this, given
>             >          >         >> that at least a handful of sites are seeing similar issues and that the
>             >          >         >> FTS logs themselves show an INFO error of "Timeout stopped", I'd also be
>             >          >         >> eyeing the FTS servers suspiciously.
>             >          >         >>
>             >          >         >> It probably only shows up with big files (any I've checked are >2GB at
>             >          >         >> least) or if the WAN is being saturated enough to take the transfer over
>             >          >         >> 5 mins.
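
As a back-of-the-envelope check on which transfers a 5-minute cut-off would catch, the minimum average rate needed to finish inside the window can be worked out as below. This is a sketch under stated assumptions: `min_rate_mbit` is a made-up helper, 1 GB is taken as 10^9 bytes, and the 300 s figure is the timeout suspected above, not a confirmed setting.

```shell
# Minimum average rate, in whole Mbit/s rounded up, needed to move
# size_bytes ($1) within timeout_s ($2) seconds.
min_rate_mbit() {
    size_bytes=$1
    timeout_s=$2
    bits=$(( size_bytes * 8 ))
    denom=$(( timeout_s * 1000000 ))
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (bits + denom - 1) / denom ))
}

# Possible usage:
#   min_rate_mbit 2000000000 300     # a 2 GB file
#   min_rate_mbit 10000000000 300    # a 10 GB file
```

So a 2 GB file needs roughly 54 Mbit/s sustained to complete in under 5 minutes, and a 10 GB file roughly 267 Mbit/s; anything slower would run past the window.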
>             >          >         >>
>             >          >         >> John
>             >          >         >>
>             >          >         >> On 02/05/18 17:18, Govind Songara wrote:
>             >          >         >>> Hi All,
>             >          >         >>>
>             >          >         >>> As mentioned in today's meeting, we still see this error.
>             >          >         >>> It would be great if you could help with this problem.
>             >          >         >>>
>             >          >         >>> Thanks
>             >          >         >>> Govind
>             >          >         >>>
>             >          >         >>> On Tue, Apr 10, 2018 at 11:47 AM, George, Simon
>             >          >         >>> <[log in to unmask]> wrote:
>             >          >         >>>
>             >          >         >>>      I found examples of the same type of error at Lancaster if you're
>             >          >         >>>      interested:
>             >          >         >>>
>             >          >         >>>
>             >          >         >>>
>             >          >         >>>
>             >         http://dashb-atlas-ddm.cern.ch/ddm2/#activity=(Data+Brokering,Data+Consolidation,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default)&d.dst.cloud=%22UK%22&d.dst.site=%22UKI-NORTHGRID-LANCS-HEP%22&d.dst.token=%22DATADISK%22&d.error_code=229&d.src.cloud=%22CA%22&d.state=(TRANSFER_FAILED)&date.from=201804050000&date.interval=0&date.to=201804070000&dst.cloud=(%22UK%22)&dst.site=(%22UKI-NORTHGRID-LANCS-HEP%22)&dst.tier=(0,1,2)&dst.token=(-TEST,-CEPH,-PPS,-GRIDFTP)&grouping.dst=(cloud,site,token)&m.content=(d_dof,d_eff,d_faf,s_eff,s_err,s_suc,t_eff,t_err,t_suc)&p.grouping=src&samples=true&src.site=(-TEST,-RDF,-AWS,-CEPH)&src.tier=(0,1,2)&src.token=(-TEST,-CEPH,-PPS,-GRIDFTP)&tab=details
>             >          >
>             >          >         >>>
>             >          >         >>>
>             >          >         >>>
>             >          >         >>>
>             >          >         >>>
>             >          >         >>>
>             >          >         >>> ------------------------------------------------------------------------
>             >          >         >>>      From: George, Simon
>             >          >         >>>      Sent: 06 April 2018 13:17
>             >          >         >>>      To: [log in to unmask]
>             >          >         >>>      Subject: help debugging transfer failures
>             >          >         >>>
>             >          >         >>>      Dear storage experts, especially DPM flavoured ones,
>             >          >         >>>
>             >          >         >>>      I'd be grateful if you could take a look at this ticket and give
>             >          >         >>>      help and/or suggestions on how to get to the bottom of it.
>             >          >         >>>
>             >          >         >>>
>             >         https://ggus.eu/index.php?mode=ticket_info&ticket_id=134144
>             >          >         >>>
>             >          >         >>>      Thanks,
>             >          >         >>>
>             >          >         >>>      Simon
>             >          >         >>>
>             >          >         >>>
>             >          >         >>
>             >          >         >>
>             >          >         >> --
>             >          >         >> John Bland                       [log in to unmask]
>             >          >         >> System Administrator             office: 220
>             >          >         >> High Energy Physics Division     tel (int): 42911
>             >          >         >> Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
>             >          >         >> University of Liverpool          http://www.liv.ac.uk/physics/hep/
>             >          >         >> "I canna change the laws of physics, Captain!"
>             >          >
>             >          >
>             >          >         --
>             >          >         John Bland                       [log in to unmask]
>             >          >         Research Fellow                  office: 220
>             >          >         High Energy Physics Division     tel (int): 42911
>             >          >         Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
>             >          >         University of Liverpool          http://www.liv.ac.uk/physics/hep/
>             >          >         "I canna change the laws of physics, Captain!"
>             >          >
>             > 
>             > 
>             >         -- 
>             >         John Bland                       [log in to unmask]
>             >         Research Fellow                  office: 220
>             >         High Energy Physics Division     tel (int): 42911
>             >         Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
>             >         University of Liverpool          http://www.liv.ac.uk/physics/hep/
>             >         "I canna change the laws of physics, Captain!"
>             >
> 
> 


-- 
John Bland                       [log in to unmask]
Research Fellow                  office: 220
High Energy Physics Division     tel (int): 42911
Oliver Lodge Laboratory          tel (ext): +44 (0)151 794 2911
University of Liverpool          http://www.liv.ac.uk/physics/hep/
"I canna change the laws of physics, Captain!"