Sorry, I forgot some important information:

As I am doing the tar as root on the file server, all ownership, time
stamps, etc. are fully preserved.
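
For reference, the invocations look roughly like this (GNU tar assumed;
paths are placeholders):

    # creating the archive as root records owner, group, mode and
    # mtime for every member
    tar -cf /scratch/chunk_001.tar -C /data/project .

    # extracting as root restores ownership and timestamps by default
    tar -xpf chunk_001.tar -C /restore/target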

Lydia


On Thu, 24 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:

> OK; let me know how you're doing.
>
> Cheers
> --jens
>
> On 23/12/2015 18:30, Lydia Heck wrote:
>>
>> Hi Jens,
>>
>> I need to add one more piece of functionality to the script, then I am
>> ready. That will happen tomorrow. Then I will keep going ....
>>
>> Lydia
>>
>>
>> On Wed, 23 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>
>>> Hi Lydia,
>>>
>>> Great stuff. So you will be moving data over the Christmas break? This
>>> will be good...
>>>
>>> ... we also need to get Jon started; whether he wants to run your
>>> script, too, or do something else. And we need to clear out the old
>>> data, but I'd ask Brian to look into that once he's back in the new
>>> year.
>>>
>>> Merry and Happy to you too.
>>>
>>> Cheers
>>> -j
>>>
>>>
>>>
>>> On 23/12/2015 15:01, Lydia Heck wrote:
>>>> Hi Jens and all,
>>>>
>>>> I have a script now that tars up directories per DiRAC project into
>>>> tar files of a specific size. Once one tar file is complete it is
>>>> archived to RAL. Once the archive is complete the tar file is deleted
>>>> and the next set of files is archived.
>>>>
>>>> The chunk size at present is 256 GByte, or slightly bigger depending
>>>> on the sizes of the files.
>>>>
>>>> The transfer of such a file takes ~15 minutes.
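>>>>
>>>> In outline, each iteration is something like this (a sketch only, not
>>>> the actual script; the FTS endpoint and URLs are placeholders):
>>>>
>>>>     tar -cf chunk.tar -T chunk.list        # one ~256 GByte file list
>>>>     jobid=$(fts-transfer-submit -s https://fts.example:8446 \
>>>>         gsiftp://door.example/path/chunk.tar \
>>>>         srm://srm.ral.example/dirac/chunk.tar)
>>>>     # poll fts-transfer-status -s ... $jobid until it reports FINISHED
>>>>     rm chunk.tar                           # free the space, then repeat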
>>>>
>>>> The script needs some polishing, and once I am totally happy I can
>>>> run it non-interactively. It currently still has an interactive
>>>> element, as the final debugging and refinement stages are not yet
>>>> complete.
>>>>
>>>> Merry Christmas and a Happy New Year.
>>>>
>>>> Lydia
>>>>
>>>>
>>>>
>>>>
>>>>  On Tue, 22 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>
>>>>> Hooray for downgrading!
>>>>>
>>>>> On 22/12/2015 13:49, Lydia Heck wrote:
>>>>>>
>>>>>> Done it. I have downgraded to 3.3.3-x and now the lot works.
>>>>>>
>>>>>> Lydia
>>>>>>
>>>>>>
>>>>>> On Tue, 22 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>
>>>>>>> Weird! Which version are you using?
>>>>>>>
>>>>>>> We seem to have fts-rest-3.3.3-2, fts-rest-cli-3.3.3,
>>>>>>> fts-rest-cloud-storage-3.3.3 and python-fts-3.3.3, but every other
>>>>>>> fts package on the server is 3.3.2. (There are both a python-fts
>>>>>>> and an fts-python - weird.)
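>>>>>>>
>>>>>>> (For the record: rpm -qa '*fts*' | sort lists the installed fts
>>>>>>> packages with their versions, assuming an RPM-based server.)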
>>>>>>>
>>>>>>> Cheers
>>>>>>> --jens
>>>>>>>
>>>>>>> On 22/12/2015 12:22, Lydia Heck wrote:
>>>>>>>> Hi Jens,
>>>>>>>>
>>>>>>>> I have a script that I could test. However, I now have an issue:
>>>>>>>> the fts-transfer command does not work anymore, giving the error
>>>>>>>> message
>>>>>>>>
>>>>>>>>     fts client is connecting using the gSOAP interface. Consider
>>>>>>>>     changing your configured fts endpoint port to select the REST
>>>>>>>>     interface
>>>>>>>>
>>>>>>>> I am currently rebooting the system, but have you seen anything
>>>>>>>> similar before?
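>>>>>>>>
>>>>>>>> If the hint means what I think, the endpoint has to be given with
>>>>>>>> the REST port rather than the gSOAP one, i.e. something like (8446
>>>>>>>> as the REST port is my assumption; adjust to the server's setup):
>>>>>>>>
>>>>>>>>     fts-transfer-submit -s https://fts3.host.example:8446 ...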
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>> Lydia
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 18 Dec 2015, Jens Jensen wrote:
>>>>>>>>
>>>>>>>>> Right, so the find suggestion at least would do a depth-first
>>>>>>>>> listing of files-to-add, and tar, I am guessing, would also add
>>>>>>>>> files depth-first, which I think meets your requirement, or close
>>>>>>>>> enough, of putting related files into the same chunk.
>>>>>>>>>
>>>>>>>>> Using find-and-then-tar you could avoid building the following
>>>>>>>>> archive
>>>>>>>>> until the current one has been sent off to RAL. You'd just need
>>>>>>>>> space
>>>>>>>>> for the filelist.
>>>>>>>>>
>>>>>>>>> What I am thinking is:
>>>>>>>>> 1. find <folder to be backed up> -newer <timestamp file> -type f
>>>>>>>>> -printf '%s %p\n' > filelist (i.e. list size and full filename)
>>>>>>>>> 2. Walk through filelist one line at a time, adding up sizes and
>>>>>>>>> collecting filenames until a certain threshold size has been
>>>>>>>>> exceeded (say 20GB or 100,000 files, whichever comes first) or
>>>>>>>>> adding the next file would take us above a higher threshold (say
>>>>>>>>> 50GB)
>>>>>>>>> 3. Once a list has been found, tar it up, compress it, optionally
>>>>>>>>> store
>>>>>>>>> the contents (list) somewhere, send the tarball to RAL, and then
>>>>>>>>> delete it.
>>>>>>>>> 4. Go back to step 2 until the filelist has been completed.
>>>>>>>>> 5. Touch the timestamp file
>>>>>>>>> 6. sleep 24 hours (or whatever) and go to step 1.
>>>>>>>>>
>>>>>>>>> This would meet all our requirements and would be stupidly easy
>>>>>>>>> to do; a rough sketch is below.
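>>>>>>>>>
>>>>>>>>> Something like the following (GNU find/tar assumed; paths and the
>>>>>>>>> transfer step are placeholders, and only the 20GB soft limit is
>>>>>>>>> checked, not the file count or the 50GB upper threshold):
>>>>>>>>>
>>>>>>>>>     #!/bin/sh
>>>>>>>>>     SRC=/data/project                  # folder to be backed up
>>>>>>>>>     STAMP=/var/tmp/backup.stamp        # timestamp file
>>>>>>>>>     LIMIT=$((20 * 1024 * 1024 * 1024)) # 20GB soft threshold
>>>>>>>>>
>>>>>>>>>     while true; do
>>>>>>>>>         # step 1: files newer than the stamp, "size path" per line
>>>>>>>>>         find "$SRC" -newer "$STAMP" -type f -printf '%s %p\n' \
>>>>>>>>>             > filelist
>>>>>>>>>
>>>>>>>>>         # step 2: accumulate names until the threshold is exceeded
>>>>>>>>>         total=0; : > chunk.list
>>>>>>>>>         while read -r size path; do
>>>>>>>>>             printf '%s\n' "$path" >> chunk.list
>>>>>>>>>             total=$((total + size))
>>>>>>>>>             if [ "$total" -ge "$LIMIT" ]; then
>>>>>>>>>                 # step 3: tar up, compress, send to RAL, delete
>>>>>>>>>                 tar -czf chunk.tar.gz -T chunk.list
>>>>>>>>>                 # ... send chunk.tar.gz to RAL here ...
>>>>>>>>>                 rm chunk.tar.gz
>>>>>>>>>                 total=0; : > chunk.list
>>>>>>>>>             fi
>>>>>>>>>         done < filelist    # step 4: repeat until the list is done
>>>>>>>>>         # (a remaining partial chunk would be flushed the same way)
>>>>>>>>>
>>>>>>>>>         touch "$STAMP"     # step 5: touch the timestamp file
>>>>>>>>>         sleep 86400        # step 6: wait 24 hours, go to step 1
>>>>>>>>>     done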
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> --jens
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 17/12/2015 12:43, Lydia Heck wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Jens,
>>>>>>>>>>
>>>>>>>>>> it took longer than I thought to tidy up the results from the
>>>>>>>>>> meeting
>>>>>>>>>> last week (I spent a full day on a spreadsheet :-) )
>>>>>>>>>>
>>>>>>>>>> However I am now going to look at the transfers again.
>>>>>>>>>>
>>>>>>>>>> I looked over the presentation you shared with us. And yes,
>>>>>>>>>> that is
>>>>>>>>>> the way it should go. There are some provisos:
>>>>>>>>>>
>>>>>>>>>> If I create 3 TB chunks, I need to have space for several of them:
>>>>>>>>>> one being transferred, one waiting and one being prepared. Three
>>>>>>>>>> 3 TB chunks plus headroom will add ~10 TB of storage that is not
>>>>>>>>>> available to the users; it can be done, but it needs to be
>>>>>>>>>> factored in.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If there is indeed a failure, then I need to identify where the
>>>>>>>>>> data are that have been deleted, corrupted or whatever. If I
>>>>>>>>>> "just" chunk the whole filesystem, they would be difficult, if not
>>>>>>>>>> impossible, to find. So I would need to arrange transfers by
>>>>>>>>>> project, and even then the retrieval might physically not be
>>>>>>>>>> possible, depending on how many of the chunks I would have to
>>>>>>>>>> retrieve.
>>>>>>>>>>
>>>>>>>>>> I believe that currently the biggest top folder is ~500 TB.
>>>>>>>>>>
>>>>>>>>>> There would not be lots of jobs running, simply because there is
>>>>>>>>>> not
>>>>>>>>>> enough space to chunk that much.
>>>>>>>>>>
>>>>>>>>>> On the storage that I would like to archive there are more
>>>>>>>>>> than 64M
>>>>>>>>>> files.
>>>>>>>>>>
>>>>>>>>>> So would a "flat" chunked tar of the whole filesystem be a "good"
>>>>>>>>>> idea? I am not sure.
>>>>>>>>>>
>>>>>>>>>> I need to think about this a bit more.
>>>>>>>>>>
>>>>>>>>>> Lydia
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  On Thu, 10 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Lydia,
>>>>>>>>>>>
>>>>>>>>>>> That's great. I am actually on leave tomorrow (travelling)
>>>>>>>>>>> and out
>>>>>>>>>>> Monday (at Royal Holloway) but the others on the list can
>>>>>>>>>>> follow up.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> --jens
>>>>>>>>>>>
>>>>>>>>>>> On 10/12/2015 10:21, Lydia Heck wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> sorry for my silence. I had a meeting in London on Tuesday and
>>>>>>>>>>>> attended CIUK yesterday. I am just back, I have to tidy up some
>>>>>>>>>>>> spreadsheets from Tuesday's meeting, and I will be busy today as
>>>>>>>>>>>> well with local tasks. So I should get back to this tomorrow.
>>>>>>>>>>>>
>>>>>>>>>>>> Best wishes,
>>>>>>>>>>>> Lydia
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  On Wed, 9 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the proposal for the third option; it would also be
>>>>>>>>>>>>> worth looking into. It is written in Python, AFAIK.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Overall we are trying to deploy something that meets the
>>>>>>>>>>>>> requirements
>>>>>>>>>>>>> and saves us time in the long run.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> --jens
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>