Hi,

Since updating the library we've had no transfers fail in the way they were
failing previously, and no new issues that we've noticed. We'll give it
another day to be sure, but I think this has fixed it.

Cheers,

John

On 29/05/2018 16:55, Sam Skipsey wrote:
> Hello everyone:
>
> The DPM devs have a (beta) fix for the problem, which you're welcome to
> test if you want. It's a modified version of the CGSI-GSOAP library for
> SL7/CentOS 7, which should "do the right thing" when a server gets a
> request to use a new connection from a client with an existing (but
> timed-out) connection.
>
> The testing release is at
> http://dmc-repo.web.cern.ch/dmc-repo/rc/el7/x86_64/?C=M;O=D
> (You can roll back to the previous version safely if this doesn't work.)
>
> Do feel free to test it, or not, and feed back if it changes anything
> for you.
>
> Sam
>
> On Wed, May 16, 2018 at 10:02 AM Sam Skipsey <[log in to unmask]> wrote:
>
> Hi everyone: an update from the DPM devs - SRM has a hard-coded 5-minute
> timeout in it (no, you can't modify it in a configuration file).
> The difference between the SL6 and SL7 behaviour is that, for some
> reason, with the SL6/RHEL6 version of gsoap the connection timeout
> doesn't stop SRM responding to packets afterwards (!), whilst on
> SL7/CentOS7/RHEL7 it seems to.
> Working out exactly why this happens - and how to resolve it - is in
> progress.
>
> Sam
>
> On Tue, May 15, 2018 at 8:35 AM John Bland <[log in to unmask]> wrote:
>
> Ditto for Liverpool.
>
> Anyone got any word from the FTS guys yet?
>
> John
>
> On 14/05/2018 22:34, Govind Songara wrote:
>
> Hi Sam,
>
> I made the changes this afternoon, but I'm still seeing the same error.
>
> Thanks
> Govind
>
> On Mon, May 14, 2018 at 4:40 PM, Sam Skipsey <[log in to unmask]> wrote:
>
> Hi chaps,
>
> Andrea at DPM has discovered the following difference between CentOS 7
> and SL6 - CentOS 7 ignores limits.conf settings for services.
> As there are a few changes to the max-open-files (ulimit) settings for
> srmv2.2, this could be causing the srm service to run out of file
> handles and time out on requests.
>
> If you'd like to test whether this is the case for your site (everyone
> with an SRM 500 error), you can raise the maximum open files in
> systemd by doing:
>
>     systemctl edit srmv2.2.service
>
> then adding
>
>     [Service]
>     LimitNOFILE=65000
>
> in the editor, and restarting the service when done.
>
> If you do this, please let me know how it goes so I can feed back to
> Andrea.
>
> Sam
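The steps above, collected into one sequence with a check that the new limit
actually took effect - a sketch only, assuming the srmv2.2 unit name used
above (adjust the unit name and the pgrep pattern if your installation
differs):

    # create a systemd drop-in raising the open-file limit for the SRM daemon
    systemctl edit srmv2.2.service
    # in the editor that opens, add:
    #   [Service]
    #   LimitNOFILE=65000
    # then save, restart the service and confirm the limit was picked up
    systemctl restart srmv2.2.service
    systemctl show srmv2.2.service --property=LimitNOFILE
    # the running daemon should now report 65000 as its "Max open files"
    cat /proc/$(pgrep -f srmv2.2 | head -n1)/limits | grep 'open files'

systemctl edit writes the override to a drop-in under
/etc/systemd/system/srmv2.2.service.d/, so the change survives package
updates, unlike editing the shipped unit file directly.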
> On Mon, May 14, 2018 at 3:48 PM Kashif Mohammad <[log in to unmask]> wrote:
>
> Hi Sam,
>
> We are still failing transfers and, as you explained to me in the ops
> meeting, it is the last stage which fails. I looked through the logs for
> one example and found the same file being tried multiple times.
>
> It starts with a prepare-to-put:
>
> [root@t2se01 srmv2.2]# grep -r "DAOD_EXOT8.13867737._000090.pool.root.1" log
>
> 05/14 14:45:29.075 206398,7 PrepareToPut: SRM98 - PrepareToPut 0 srm://t2se01.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 05/14 15:29:54.943 206398,0 PrepareToPut: SRM98 - PrepareToPut 0 srm://t2se01.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
>
> Looking at the pool node, the file was actually copied:
>
> [root@t2se45 dpm-gsiftp]# grep -r DAOD_EXOT8.13867737._000090.pool.root.1 gridftp.log
>
> [41663] Mon May 14 14:45:30 2018 :: dmlite :: stat :: /t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0 :: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management :: fts106.cern.ch
> [41663] Mon May 14 14:45:31 2018 :: dmlite :: user error :: 2 :: [#00.000002] Could not open t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0 :: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management :: fts106.cern.ch
> [41663] Mon May 14 14:45:31 2018 :: Starting to transfer "/t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0".
> [41663] Mon May 14 15:09:31 2018 :: Finished transferring "/t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0".
>
> Then it was deleted - probably the put-done step failed:
>
> 05/14 14:45:28.731 4714,0 Cns_srv_delete: NS098 - delete /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 05/14 14:45:29.239 4714,0 Cns_srv_stat: NS098 - stat 0 /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 05/14 14:45:29.323 4714,0 Cns_srv_creat: NS098 - creat /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1 664 22
> 05/14 14:45:29.365 4714,0 Cns_srv_addreplica: NS098 - addreplica t2se45.physics.ox.ac.uk t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 05/14 14:45:31.157 4714,0 Cns_srv_statg: NS098 - statg /dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 05/14 14:45:31.199 4714,0 Cns_srv_statr: NS098 - statr t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 05/14 14:45:31.489 4714,0 Cns_srv_accessr: NS098 - accessr 2 t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 05/14 15:09:31.362 4714,0 Cns_srv_statr: NS098 - statr t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 05/14 15:09:31.444 4714,0 Cns_srv_getreplicax: NS098 - getreplicax /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 05/14 15:09:34.400 4714,0 Cns_srv_delreplica: NS098 - delreplica t2se45.physics.ox.ac.uk:/dpm/pool3/atlas/2018-05-14/DAOD_EXOT8.13867737._000090.pool.root.1.194527267.0
> 05/14 15:09:34.402 4714,0 Cns_srv_getreplicax: NS098 - getreplicax /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 05/14 15:09:34.403 4714,0 Cns_srv_unlink: NS098 - unlink /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 05/14 15:09:34.546 4714,0 Cns_srv_delete: NS098 - delete /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
> 05/14 15:29:54.602 4714,1 Cns_srv_delete: NS098 - delete /dpm/physics.ox.ac.uk/home/atlas/atlasdatadisk/rucio/mc16_13TeV/3f/c8/DAOD_EXOT8.13867737._000090.pool.root.1
>
> Should we open a ticket with the DPM developers?
>
> Cheers,
>
> Kashif
>
> From: GRIDPP2: Deployment and support of SRM and local storage
> management [mailto:[log in to unmask]] On Behalf Of Sam Skipsey
> Sent: 10 May 2018 15:11
> To: [log in to unmask]
> Subject: Re: help debugging transfer failures
>
> Okay, so:
>
> Working sites (with no evidence of the signature SRM PUT-DONE SOAP errors):
> Glasgow (SL6, DPM 1.9.0, IPv4)
> Lancaster (SL6, DPM 1.9.0, IPv6)
>
> Issue sites:
> Liverpool (CentOS 7, DPM 1.9.x, IPv4)
> Oxford (SL7, DPM 1.9.x, IPv4)
> RHUL (CentOS 7, DPM 1.9.x, IPv4)
> ECDF-RDF (CentOS 7, DPM 1.9.x, IPv4 - I think because IPv6 was problematic)
>
> So, assuming this means anything, the only common factor seems to be a
> RHEL7-based release rather than a RHEL6-based one.
>
> Sam
>
> On Thu, May 10, 2018 at 2:37 PM George, Simon <[log in to unmask]> wrote:
>
> No IPv6 on storage yet, still working on the perfSONAR :-)
>
> On 10 May 2018 14:33, Matt Doidge <[log in to unmask]> wrote:
>
> Hi Sam,
> We're still SL6 at Lancaster (and only on 1.9.0 - upgrading is on my
> todo list).
>
> Cheers,
> Matt
>
> On 10/05/18 14:23, Sam Skipsey wrote:
>
> Sneaking suspicion: which of you guys have IPv6 turned on for your storage?
>
> I think Lancaster is also CentOS 7 / DPM 1.9.x (Matt, am I remembering
> right?), but Matt did some Exciting Things to fix odd IPv6 problems, as
> I recall.
>
> On Thu, May 10, 2018 at 2:17 PM Sam Skipsey <[log in to unmask]> wrote:
>
> Okay, so everyone with an issue with a ticket is on CentOS 7 and DPM
> 1.9.x... (this is a head-node issue, so that's the important bit).
>
> I'll just check the sites I know aren't SL7/CentOS 7 in the monitoring
> and see if they are different.
>
> Sam
>
> On Thu, May 10, 2018 at 11:46 AM John Bland <[log in to unmask]> wrote:
>
> At Liverpool: all CentOS 7.4, DPM 1.9.2, Puppet.
>
> On 10/05/2018 11:37, Govind Songara wrote:
>
> Thanks Simon, the head node is configured using Puppet. The pool nodes
> still use YAIM.
>
> On Thu, 10 May 2018, 11:19 a.m., George, Simon <[log in to unmask]> wrote:
>
> Hi Sam,
>
> RHUL is running DPM 1.9.0 on CentOS 7.3 on the SE head node.
>
> The storage nodes are DPM 1.8.10 on SL6.9.
>
> Simon
>
> ------------------------------------------------------------------------
> From: Sam Skipsey <[log in to unmask]>
> Sent: 10 May 2018 11:12
> To: George, Simon
> Cc: [log in to unmask]
> Subject: Re: [GRIDPP-STORAGE] help debugging transfer failures
>
> Hello:
>
> So, it looks like Oxford, RHUL and the new ECDF-RDF have something in
> common, as all of your transfer failures look similar in the ATLAS logs
> (they look like SOAP errors on PUT DONE (error code 500), on otherwise
> successful transfers).
>
> I know Oxford is running on SL7 with DPM 1.9.2 - is there anything in
> common with the other two of you?
>
> Sam
>
> On Sun, May 6, 2018 at 12:33 PM George, Simon <[log in to unmask]> wrote:
>
> We got a new ticket for the same problem this weekend:
>
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=134945
>
> How can we move forward on this?
>
> Change FTS parameters - how?
>
> Thanks,
>
> Simon
>
> ------------------------------------------------------------------------
> From: GRIDPP2: Deployment and support of SRM and local storage
> management <[log in to unmask]> on behalf of John Bland <[log in to unmask]>
> Sent: 03 May 2018 10:52
> To: [log in to unmask]
> Subject: Re: help debugging transfer failures
>
> Hi,
>
> This page has the majority of the failed files, with transfer times of
> 300-600 s (plus a few over that). Not one below 300 s that I've seen:
>
> http://dashb-atlas-ddm.cern.ch/ddm2/#activity=(Data+Brokering,Data+Consolidation,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default,on)&d.error_code=154&d.state=(TRANSFER_FAILED)&date.from=201805021050&date.interval=0&date.to=201805021450&dst.cloud=(%22UK%22)&dst.site=(%22UKI-NORTHGRID-LIV-HEP%22)&dst.tier=(0,1,2)&dst.token=(-IPV6TEST,-DDMTEST,-CEPH,-PPSSCRATCHDISK)&grouping.dst=(cloud,site,token)&m.content=(d_dof,d_eff,d_faf,s_eff,s_err,s_suc,t_eff,t_err,t_suc)&samples=true&src.site=(-RUCIOTEST,-MWTEST,-RDF)&src.tier=(0,1,2)&src.token=(-IPV6TEST,-DDMTEST,-CEPH,-PPSSCRATCHDISK)&tab=details
>
> John
>
> On 03/05/2018 10:45, Duncan Rand wrote:
>
> John
>
> Do you have an example of one of those transfers? Here
>
> https://fts106.cern.ch:8449/var/log/fts3/transfers/2018-05-03/srm.ndgf.org__se2.ppgrid1.rhul.ac.uk/2018-05-03-0856__srm.ndgf.org__se2.ppgrid1.rhul.ac.uk__761281463__e7e8646a-434c-59d0-b37f-a4d8917f1113
>
> I see a 10GB file taking about 42 minutes and then failing. There are a
> number of FTS configurations here
>
> https://fts3-pilot.cern.ch:8449/fts3/ftsmon/#/config/gfal2
>
> and a couple are indeed set to 300 s / 5 min.
>
> Duncan
>
> On 03/05/2018 09:57, George, Simon wrote:
>
> Thanks John.
>
> Who is able to check whether FTS itself has a timeout in place?
>
> ------------------------------------------------------------------------
> From: GRIDPP2: Deployment and support of SRM and local storage
> management <[log in to unmask]> on behalf of John Bland <[log in to unmask]>
> Sent: 02 May 2018 23:10
> To: [log in to unmask]
> Subject: Re: help debugging transfer failures
>
> Looking at some of the failed transfers we see at Liverpool, the SRM
> logs show a 5-minute timeout of some sort. The SRM Put starts, the
> gridftp server transfers perfectly, but if the transfer takes more than
> 5 minutes the SRM control connection gets terminated (but not the
> GridFTP one, as far as I've seen). The client then appears to just
> delete the file in these circumstances.
>
> Although it's more than possible our university firewall is doing this,
> given that at least a handful of sites are seeing similar issues, and
> that the FTS logs themselves show an INFO message of "Timeout stopped",
> I'd also be eyeing the FTS servers suspiciously.
>
> It probably only shows up with big files (any I've checked are >2GB at
> least) or if the WAN is saturated enough for the transfer to take over
> 5 minutes.
>
> John
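One quick way to spot such transfers on a pool node is to time them from the
gridftp log - a rough sketch only, assuming the "Starting to transfer" /
"Finished transferring" line format shown in the gridftp.log excerpt earlier
in this thread (field positions may differ between DPM versions, and it
shells out to GNU date per line, so use it for spot checks rather than bulk
analysis):

    # list gridftp transfers that took longer than 300 s
    grep -hE 'Starting to transfer|Finished transferring' gridftp.log \
     | awk -F' :: ' '{
         split($1, a, " ");                 # a[2..6] = "Mon May 14 14:45:31 2018"
         cmd = "date -d \"" a[4] " " a[3] " " a[6] " " a[5] "\" +%s";
         cmd | getline t; close(cmd);       # epoch seconds for this log line
         match($2, /"[^"]*"/);              # the quoted replica path
         f = substr($2, RSTART + 1, RLENGTH - 2);
         if ($2 ~ /^Starting/) start[f] = t;
         else if (f in start && t - start[f] > 300)
             printf "%d s  %s\n", t - start[f], f;
       }'

Anything it prints took longer than the 5-minute window being discussed, so
those replicas are candidates for the put-done failures described above.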
> On 02/05/18 17:18, Govind Songara wrote:
>
> Hi all,
>
> As mentioned in today's meeting, we still see this error. It would be
> great if you could help with this problem.
>
> Thanks
> Govind
>
> On Tue, Apr 10, 2018 at 11:47 AM, George, Simon <[log in to unmask]> wrote:
>
> I found examples of the same type of error at Lancaster, if you're
> interested:
>
> http://dashb-atlas-ddm.cern.ch/ddm2/#activity=(Data+Brokering,Data+Consolidation,Deletion,Express,Functional+Test,Group+Subscriptions,Production,Production+Input,Production+Output,Recovery,Staging,T0+Export,T0+Tape,User+Subscriptions,default)&d.dst.cloud=%22UK%22&d.dst.site=%22UKI-NORTHGRID-LANCS-HEP%22&d.dst.token=%22DATADISK%22&d.error_code=229&d.src.cloud=%22CA%22&d.state=(TRANSFER_FAILED)&date.from=201804050000&date.interval=0&date.to=201804070000&dst.cloud=(%22UK%22)&dst.site=(%22UKI-NORTHGRID-LANCS-HEP%22)&dst.tier=(0,1,2)&dst.token=(-TEST,-CEPH,-PPS,-GRIDFTP)&grouping.dst=(cloud,site,token)&m.content=(d_dof,d_eff,d_faf,s_eff,s_err,s_suc,t_eff,t_err,t_suc)&p.grouping=src&samples=true&src.site=(-TEST,-RDF,-AWS,-CEPH)&src.tier=(0,1,2)&src.token=(-TEST,-CEPH,-PPS,-GRIDFTP)&tab=details
>
> ------------------------------------------------------------------------
> From: George, Simon
> Sent: 06 April 2018 13:17
> To: [log in to unmask]
> Subject: help debugging transfer failures
>
> Dear storage experts, especially DPM-flavoured ones,
>
> I'd be grateful if you could take a look at this ticket and give help
> and/or suggestions on how to get to the bottom of it.
>
> https://ggus.eu/index.php?mode=ticket_info&ticket_id=134144
>
> Thanks,
>
> Simon
--
John Bland                     [log in to unmask]
Research Fellow                office: 220
High Energy Physics Division   tel (int): 42911
Oliver Lodge Laboratory        tel (ext): +44 (0)151 794 2911
University of Liverpool        http://www.liv.ac.uk/physics/hep/
"I canna change the laws of physics, Captain!"