
DIRAC-USERS Archives

DIRAC-USERS@JISCMAIL.AC.UK


DIRAC-USERS  January 2016

Subject:

Re: Backup proposal -option 3

From:

Lydia Heck <[log in to unmask]>

Reply-To:

Lydia Heck <[log in to unmask]>

Date:

Tue, 12 Jan 2016 10:34:09 +0000


Hi Jens,

my problems were towards the end of the week (Saturday) and in the end I
cancelled those jobs as they did not lead anywhere. So they will turn up as
"cancelled".

Best wishes,
Lydia



On Tue, 12 Jan 2016, Jensen, Jens (STFC,RAL,SC) wrote:

> Hi Lydia,
>
> If you see the problem again, you should be able to see it in the
> ftsmon. There were four timeouts to three disk servers on Monday the 4th
> during 15:00-16:00 but I didn't see any since then. If we see those
> again we need to investigate more closely but it seems to me they were
> likely due to some networking problem or some related type of blip.
>
> The other failures within the past seven days were either a
> cancelled-by-user or the ones that were killed at 3600 seconds. Maybe
> your client didn't exit properly after the server had terminated the
> transfer?
>
> Send us the id if you see another one.
>
> Cheers
> --jens
>
> On 12/01/2016 09:12, Lydia Heck wrote:
>>
>> Hi Jens,
>>
>> I am asking now for 36000 seconds, as I had the problem with the large
>> file.
>>
>> So that cannot be the problem in these transfers that "hang" after
>> having transferred everything and then just sit there with a 0k
>> transfer rate.
>>
>> Lydia
>>
>>
>> On Mon, 11 Jan 2016, Jensen, Jens (STFC,RAL,SC) wrote:
>>
>>> Brian says it's just the timeout you ask for when you submit (with the
>>> --timeout switch). Or rather, 3600 is the timeout you get if you don't
>>> ask for one :-)
>>>
>>> It might be too low a limit but it is at least easy to fix by asking for
>>> a higher timeout.
>>>
>>> Cheers
>>> --jens
>>>
>>> On 11/01/2016 14:55, Lydia Heck wrote:
>>>>
>>>> Should this be forwarded to the support email?
>>>>
>>>> Lydia
>>>>
>>>>
>>>> On Mon, 11 Jan 2016, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>
>>>>> Brian points out I missed a 3600 second timeout on the transfer (there
>>>>> is more than one type of timeout). So it follows that the successful
>>>>> transfers at the same time would have taken less than one hour?
>>>>>
>>>>> On 11/01/2016 13:36, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>> On 11/01/2016 12:23, Lydia Heck wrote:
>>>>>>> once I had sent the previous response I realised that maybe I had not
>>>>>>> made myself clear: it is the last 3 or 4 cancelled jobs that are of note
>>>>>>> here.
>>>>>>>
>>>>>> There are some which are disk server timeouts, and they are attempting
>>>>>> to go to:
>>>>>> 2016-01-04T15:07:26    ***    130.246.179.46
>>>>>> 2016-01-04T15:21:30    ***    130.246.179.44
>>>>>> 2016-01-04T15:35:47    ***    130.246.179.47
>>>>>> 2016-01-04T15:50:48    ***    130.246.179.44
>>>>>>
>>>>>> These are the ones which seem to time out (and have ~7500 seconds
>>>>>> between submit time and start time, just more than two hours):
>>>>>>
>>>>>> https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/job/8abd37a1-e13f-45c5-9c98-7b3f2c475b8e
>>>>>>
>>>>>>
>>>>>> https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/job/919d9423-9f71-4b72-91de-f3cf8c38d44b
>>>>>>
>>>>>>
>>>>>> and this one which says it was canceled but is in the "FAILED" bucket
>>>>>> (maybe because it retried?)
>>>>>> https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/job/d6b5299d-f2a7-433b-a22a-31add26682b3
>>>>>>
>>>>>>
>>>>>>
>>>>>> Looking at the logs they seem to transfer happily and then suddenly time
>>>>>> out after precisely 60 minutes... to within a second (e.g. starting at
>>>>>> 21:33:46 and getting killed at 22:33:46). Hmm...
>>>>>>
>>>>>> And the same log says
>>>>>>
>>>>>> Resetting global timeout thread to 33600 seconds
>>>>>>
>>>>>> so that's not it. And it's not the proxy because it's a happy long-lived
>>>>>> one.
>>>>>>
>>>>>> It certainly suggests a problem at the server end - getting killed in
>>>>>> the three thousand six hundredth second of the transfer is quite
>>>>>> suspicious...
>>>>>>
>>>>>> Cheers
>>>>>> -j
>>>>>
>>>
>
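
The timing argument in the thread (a transfer starting at 21:33:46 and being killed at 22:33:46, against a 3600 second default when no --timeout is requested) can be checked with a little arithmetic. The following is only a minimal sketch: the function names, the one-second tolerance and the HH:MM:SS input format are assumptions made for illustration, and nothing here talks to FTS or reads its logs.

```python
from datetime import datetime, timedelta

# Default transfer timeout quoted in the thread when no --timeout is given.
DEFAULT_TIMEOUT_S = 3600


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS clock times, allowing one midnight rollover."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        # The end time is on the following day.
        delta += timedelta(days=1)
    return int(delta.total_seconds())


def killed_at_default_timeout(start: str, end: str, tolerance_s: int = 1) -> bool:
    """True if the transfer ran for the default timeout, within the tolerance."""
    return abs(elapsed_seconds(start, end) - DEFAULT_TIMEOUT_S) <= tolerance_s


if __name__ == "__main__":
    # Times taken from the example in the thread.
    print(elapsed_seconds("21:33:46", "22:33:46"))
    print(killed_at_default_timeout("21:33:46", "22:33:46"))
    # The ~7500 seconds between submit and start, expressed in hours.
    print(round(7500 / 3600, 2))
```

A transfer that matches the default to within a second, as in the example above, points at the submit-time timeout rather than at a network blip, which is the conclusion the thread reaches.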
