Hi Lydia,
Great stuff. So you will be moving data over the Christmas break? This
will be good...
... we also need to get Jon started; whether he wants to run your
script, too, or do something else. And we need to clear out the old
data, but I'd ask Brian to look into that once he's back in the new year.
Merry and Happy to you too.
Cheers
-j
On 23/12/2015 15:01, Lydia Heck wrote:
> Hi Jens and all,
>
> I have a script now that tars up directories per DiRAC project into
> tar files of specific size. Once one tar file is complete it is
> archived to RAL. Once the archive is complete the tar file is deleted
> and the next set of files is being archived.
>
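The tar, archive, delete cycle described above might look roughly like this in Python. This is only a sketch, not the actual script: archive_chunk and the send callback are hypothetical names, and send stands in for whatever command really ships the tar file to RAL.

```python
import os
import tarfile
from typing import Callable, Iterable


def archive_chunk(paths: Iterable[str], tar_path: str,
                  send: Callable[[str], None]) -> None:
    """Tar the given files, hand the tar file to send(), then delete it.

    send is a hypothetical transfer hook (an assumption, not part of any
    real tool); it is expected to raise an exception if the transfer fails.
    """
    # Build an uncompressed tar file containing this chunk's files.
    with tarfile.open(tar_path, "w") as tar:
        for path in paths:
            tar.add(path)

    # Ship the tar file. If send() raises, the line below is never reached,
    # so the tar file stays on disk and the transfer can be retried.
    send(tar_path)

    # Transfer succeeded: free the space before the next chunk is built.
    os.remove(tar_path)
```

Because the tar file is only removed after send returns, at most one chunk's worth of scratch space is needed at a time.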
> The chunk size at present is 256 GByte, or slightly bigger depending
> on the size of the files.
>
> The transfer of such a file takes ~15 minutes.
>
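As a back-of-the-envelope check on those figures, here is a small Python calculation. The assumptions are mine: GByte is taken as 10^9 bytes, the ~15 minutes is treated as a sustained rate, and the time to build each tar file is ignored. The 500 TB is the size quoted elsewhere in this thread for the biggest top folder. Under those assumptions the rate comes out at about 284 MB/s, and 500 TB would take roughly 20 days of pure transfer time.

```python
def throughput_mb_per_s(chunk_bytes: float, seconds: float) -> float:
    """Average transfer rate in MB/s (1 MB = 10**6 bytes)."""
    return chunk_bytes / seconds / 1e6


def days_to_transfer(total_bytes: float, chunk_bytes: float,
                     seconds_per_chunk: float) -> float:
    """Pure transfer time in days, ignoring the time to build tar files."""
    return total_bytes / chunk_bytes * seconds_per_chunk / 86400


CHUNK = 256e9          # 256 GByte chunk (assumed decimal)
PER_CHUNK = 15 * 60    # ~15 minutes per chunk, in seconds
BIGGEST = 500e12       # ~500 TB top folder (assumed decimal)

if __name__ == "__main__":
    print(f"{throughput_mb_per_s(CHUNK, PER_CHUNK):.0f} MB/s")
    print(f"{days_to_transfer(BIGGEST, CHUNK, PER_CHUNK):.1f} days")
```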
> The script needs some polishing, and once I am totally happy I can run
> it non-interactively. I currently still have an interactive element in
> the script, as the final debugging and a few other ideas are not yet
> complete.
>
> Merry Christmas and a Happy New Year.
>
> Lydia
>
>
>
>
> On Tue, 22 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>
>> Hooray for downgrading!
>>
>> On 22/12/2015 13:49, Lydia Heck wrote:
>>>
>>> Done it. I have down-graded to 3.3.3-x and now the lot works.
>>>
>>> Lydia
>>>
>>>
>>> On Tue, 22 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>
>>>> Weird! Which version are you using?
>>>>
>>>> We seem to have fts-rest-3.3.3-2 and fts-rest-cli-3.3.3 and
>>>> fts-rest-cloud-storage-3.3.3 and python-fts-3.3.3 but every other fts
>>>> package on the server is 3.3.2. (There is both a python-fts and an
>>>> fts-python - weird).
>>>>
>>>> Cheers
>>>> --jens
>>>>
>>>> On 22/12/2015 12:22, Lydia Heck wrote:
>>>>> Hi Jens,
>>>>>
>>>>> I have a script that I could test. However, I now have an issue: the
>>>>> fts-transfer command does not work anymore, with the error message
>>>>>
>>>>> fts client is connecting using the gSOAP interface. Consider changing
>>>>> your configured fts endpoint port to select the REST
>>>>> interface
>>>>>
>>>>> I am currently rebooting the system, but have you seen something
>>>>> similar once before?
>>>>>
>>>>> Best wishes,
>>>>> Lydia
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 18 Dec 2015, Jens Jensen wrote:
>>>>>
>>>>>> Right, so the find suggestion at least would do a depth first listing
>>>>>> of files-to-add, and tar I am guessing would also add files depth
>>>>>> first, which I think meets your requirement, or close enough, of
>>>>>> putting related files into the same chunk.
>>>>>>
>>>>>> Using find-and-then-tar you could avoid building the following archive
>>>>>> until the current one has been sent off to RAL. You'd just need space
>>>>>> for the filelist.
>>>>>>
>>>>>> What I am thinking is:
>>>>>> 1. find <folder to be backed up> -newer <timestamp file> | <list size
>>>>>>    and full filename> > filelist
>>>>>> 2. Walk through filelist one line at a time, adding up sizes and
>>>>>>    filenames until a certain threshold size has been exceeded (say
>>>>>>    20GB or 100,000 files, whichever comes first) or adding the next
>>>>>>    file would take us above a higher threshold (say 50GB).
>>>>>> 3. Once a list has been found, tar it up, compress it, optionally
>>>>>>    store the contents (list) somewhere, send the tarball to RAL, and
>>>>>>    then delete it.
>>>>>> 4. Go back to step 2 until the filelist has been completed.
>>>>>> 5. Touch the timestamp file.
>>>>>> 6. Sleep 24 hours (or whatever) and go to step 1.
>>>>>>
>>>>>> This would meet all our requirements and would be stupidly easy to do.
>>>>>>
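Steps 1 and 2 above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the names list_newer_files and chunk_files are made up here, os.walk stands in for find, and the default thresholds are the example figures from step 2 taken as decimal GB. A single file bigger than the higher threshold simply ends up in a chunk of its own.

```python
import os
import stat
from typing import Iterable, Iterator, List, Tuple

FileEntry = Tuple[int, str]  # (size in bytes, full path)


def list_newer_files(root: str, timestamp_file: str) -> Iterator[FileEntry]:
    """Step 1: yield (size, path) for regular files newer than timestamp_file."""
    cutoff = os.stat(timestamp_file).st_mtime
    # Top-down os.walk finishes one subtree before starting the next,
    # so files from the same directory tree stay next to each other.
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_mtime > cutoff:
                yield st.st_size, path


def chunk_files(entries: Iterable[FileEntry],
                soft_bytes: int = 20 * 10**9,
                hard_bytes: int = 50 * 10**9,
                max_files: int = 100_000) -> Iterator[List[str]]:
    """Step 2: group paths into chunks using the two size thresholds."""
    chunk: List[str] = []
    total = 0
    for size, path in entries:
        # Close the current chunk first if this file would push it
        # above the higher threshold.
        if chunk and total + size > hard_bytes:
            yield chunk
            chunk, total = [], 0
        chunk.append(path)
        total += size
        # Close the chunk once the lower threshold has been exceeded
        # or the file-count limit has been reached.
        if total > soft_bytes or len(chunk) >= max_files:
            yield chunk
            chunk, total = [], 0
    if chunk:
        yield chunk  # whatever is left at the end of the filelist
```

Both functions are generators, so only the current chunk's file list is held in memory, and each chunk can be tarred and sent before the next one is worked out.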
>>>>>> Cheers
>>>>>> --jens
>>>>>>
>>>>>>
>>>>>> On 17/12/2015 12:43, Lydia Heck wrote:
>>>>>>>
>>>>>>> Hi Jens,
>>>>>>>
>>>>>>> it took longer than I thought to tidy up the results from the meeting
>>>>>>> last week (I spent a full day on a spreadsheet :-) )
>>>>>>>
>>>>>>> However I am now going to look at the transfers again.
>>>>>>>
>>>>>>> I looked over the presentation you shared with us. And yes, that is
>>>>>>> the way it should go. There are some provisos:
>>>>>>>
>>>>>>> If I create 3 TB chunks, I need to have space for several of them:
>>>>>>>
>>>>>>> One being transferred, one in waiting and one being prepared. This
>>>>>>> will add 10 TB to the storage that is not available for the users;
>>>>>>> it can be done, but needs to be factored in.
>>>>>>>
>>>>>>>
>>>>>>> If there is indeed a failure, then I need to identify where the data
>>>>>>> are that have been deleted, corrupted or whatever. If I "just" chunk
>>>>>>> the whole filesystem, that would be difficult, if not impossible, to
>>>>>>> find. So I would need to arrange transfers by project, and even then
>>>>>>> the retrieval might physically not be possible, depending on how many
>>>>>>> of the chunks I would have to retrieve.
>>>>>>>
>>>>>>> I believe that currently the biggest top folder is ~500 TB.
>>>>>>>
>>>>>>> There would not be lots of jobs running, simply because there is not
>>>>>>> enough space to chunk that much.
>>>>>>>
>>>>>>> On the storage that I would like to archive there are more than 64M
>>>>>>> files.
>>>>>>>
>>>>>>> So would a "flat" chunking tar of all the filesystem be a "good"
>>>>>>> idea? I am not sure.
>>>>>>>
>>>>>>> I need to think about this a bit more.
>>>>>>>
>>>>>>> Lydia
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 10 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>>
>>>>>>>> Hi Lydia,
>>>>>>>>
>>>>>>>> That's great. I am actually on leave tomorrow (travelling) and out
>>>>>>>> Monday (at Royal Holloway) but the others on the list can follow up.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> --jens
>>>>>>>>
>>>>>>>> On 10/12/2015 10:21, Lydia Heck wrote:
>>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> sorry for my silence. I had a meeting in London on Tuesday and
>>>>>>>>> attended CIUK yesterday. I am just back and have to tidy up some
>>>>>>>>> spreadsheets from Tuesday's meeting, and I will be busy today as
>>>>>>>>> well with local tasks. So I should get back to this tomorrow.
>>>>>>>>>
>>>>>>>>> Best wishes,
>>>>>>>>> Lydia
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, 9 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Here is the proposal for the third option. It would also be worth
>>>>>>>>>> looking into.
>>>>>>>>>> It is written in Python AFAIK.
>>>>>>>>>>
>>>>>>>>>> Overall we are trying to deploy something that meets the
>>>>>>>>>> requirements and saves us time in the long run.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> --jens
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>