Hi Jens,

my problems were towards the end of the week (Saturday) and in the end I
canceled those jobs as they did not lead anywhere. So they will turn up
as "cancelled".

Best wishes,
Lydia

On Tue, 12 Jan 2016, Jensen, Jens (STFC,RAL,SC) wrote:

> Hi Lydia,
>
> If you see the problem again, you should be able to see it in the
> ftsmon. There were four timeouts to three disk servers on Monday the 4th
> during 15:00-16:00 but I didn't see any since then. If we see those
> again we need to investigate more closely but it seems to me they were
> likely due to some networking problem or some related type of blip.
>
> The other failures within the past seven days were either a
> cancelled-by-user or the ones that were killed at 3600 seconds. Maybe
> your client didn't exit properly after the server had terminated the
> transfer?
>
> Send us the id if you see another one.
>
> Cheers
> --jens
>
> On 12/01/2016 09:12, Lydia Heck wrote:
>>
>> Hi Jens,
>>
>> I am asking now for 36000 seconds, as I had the problem with the large
>> file.
>>
>> So that cannot be the problem in these transfers that "hang" after
>> having transferred everything and then just sitting there with 0k
>> transfer rate
>>
>> Lydia
>>
>>
>> On Mon, 11 Jan 2016, Jensen, Jens (STFC,RAL,SC) wrote:
>>
>>> Brian says it's just the timeout you ask for when you submit (with the
>>> --timeout switch). Or rather, 3600 is the timeout you get if you don't
>>> ask for one :-)
>>>
>>> It might be too low a limit but it is at least easy to fix by asking for
>>> a higher timeout.
>>>
>>> Cheers
>>> --jens
>>>
>>> On 11/01/2016 14:55, Lydia Heck wrote:
>>>>
>>>> Should this be forwarded to the support email?
>>>>
>>>> Lydia
>>>>
>>>>
>>>> On Mon, 11 Jan 2016, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>
>>>>> Brian points out I missed a 3600 second timeout on the transfer (there
>>>>> is more than one type of timeout). So it follows that the successful
>>>>> transfers at the same time would have taken less than one hour?
>>>>>
>>>>> On 11/01/2016 13:36, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>> On 11/01/2016 12:23, Lydia Heck wrote:
>>>>>>> once I had sent the previous response I realised that maybe I had
>>>>>>> not made myself clear: it is the last 3 or 4 cancelled jobs that
>>>>>>> are of note here.
>>>>>>>
>>>>>> There are some which are disk server timeouts, and they are
>>>>>> attempting to go to:
>>>>>> 2016-01-04T15:07:26 *** 130.246.179.46
>>>>>> 2016-01-04T15:21:30 *** 130.246.179.44
>>>>>> 2016-01-04T15:35:47 *** 130.246.179.47
>>>>>> 2016-01-04T15:50:48 *** 130.246.179.44
>>>>>>
>>>>>> These are the ones which seem to time out (and have ~7500 seconds
>>>>>> between submit time and start time, just more than two hours):
>>>>>>
>>>>>> https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/job/8abd37a1-e13f-45c5-9c98-7b3f2c475b8e
>>>>>>
>>>>>> https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/job/919d9423-9f71-4b72-91de-f3cf8c38d44b
>>>>>>
>>>>>> and this one which says it was canceled but is in the "FAILED" bucket
>>>>>> (maybe because it retried?)
>>>>>> https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/job/d6b5299d-f2a7-433b-a22a-31add26682b3
>>>>>>
>>>>>> Looking at the logs they seem to transfer happily and then suddenly
>>>>>> time out after precisely 60 minutes... to within a second (e.g.
>>>>>> starting at 21:33:46 and getting killed at 22:33:46). Hmm...
>>>>>>
>>>>>> And the same log says
>>>>>>
>>>>>> Resetting global timeout thread to 33600 seconds
>>>>>>
>>>>>> so that's not it. And it's not the proxy because it's a happy
>>>>>> long-lived one.
>>>>>>
>>>>>> It certainly suggests a problem at the server end - getting killed in
>>>>>> the three thousand six hundredth second of the transfer is quite
>>>>>> suspicious...
>>>>>>
>>>>>> Cheers
>>>>>> -j
>>>>>
>>>
>
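
The "precisely 60 minutes" observation above is easy to check against the 3600 second default mentioned in the thread. A minimal sketch of that arithmetic follows; the helper name elapsed_seconds is hypothetical, and it assumes both times fall on the same day, since the log excerpt quotes only times of day.

```python
from datetime import datetime

# From the thread: the timeout applied when none is requested at submit time.
DEFAULT_TIMEOUT_S = 3600


def elapsed_seconds(start: str, end: str, fmt: str = "%H:%M:%S") -> int:
    """Whole seconds between two wall-clock times on the same day."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Start and kill times quoted from the transfer log.
run_s = elapsed_seconds("21:33:46", "22:33:46")
print(run_s, run_s == DEFAULT_TIMEOUT_S)  # 3600 True

# Submit-to-start delay quoted as ~7500 seconds: how far past two hours?
queue_s = 7500
print(queue_s - 2 * 3600)  # 300
```

A run time that matches the default to the second is consistent with the transfer-level timeout Brian identified, rather than with a network blip.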