Sorry, I forgot some important information:

As I am doing the tar as root on the file server, all ownership,
timestamps etc. are fully preserved.

Lydia

On Thu, 24 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:

> OK; let me know how you're doing.
>
> Cheers
> --jens
>
> On 23/12/2015 18:30, Lydia Heck wrote:
>>
>> Hi Jens,
>>
>> I need to add one more piece of functionality to the script, then I am
>> ready. That will happen tomorrow. Then I will keep on going ....
>>
>> Lydia
>>
>> On Wed, 23 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>
>>> Hi Lydia,
>>>
>>> Great stuff. So you will be moving data over the Christmas break? This
>>> will be good...
>>>
>>> ... we also need to get Jon started; whether he wants to run your
>>> script, too, or do something else. And we need to clear out the old
>>> data, but I'd ask Brian to look into that once he's back in the new
>>> year.
>>>
>>> Merry and Happy to you too.
>>>
>>> Cheers
>>> -j
>>>
>>> On 23/12/2015 15:01, Lydia Heck wrote:
>>>> Hi Jens and all,
>>>>
>>>> I have a script now that tars up directories per DiRAC project into
>>>> tar files of a specific size. Once one tar file is complete it is
>>>> archived to RAL. Once the archive is complete the tar file is deleted
>>>> and the next set of files is archived.
>>>>
>>>> The chunk size at present is 256 GByte, or slightly bigger depending
>>>> on the size of the files.
>>>>
>>>> The transfer of such a file takes ~15 minutes.
>>>>
>>>> The script needs some polishing and once I am totally happy I can run
>>>> it non-interactively. I currently still have an interactive element in
>>>> the script as the last debugging and other idea stages are not fully
>>>> completed.
>>>>
>>>> Merry Christmas and a Happy New Year.
>>>>
>>>> Lydia
>>>>
>>>> On Tue, 22 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>
>>>>> Hooray for downgrading!
>>>>>
>>>>> On 22/12/2015 13:49, Lydia Heck wrote:
>>>>>>
>>>>>> Done it. I have downgraded to 3.3.3-x and now the lot works.
>>>>>>
>>>>>> Lydia
>>>>>>
>>>>>> On Tue, 22 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>
>>>>>>> Weird! Which version are you using?
>>>>>>>
>>>>>>> We seem to have fts-rest-3.3.3-2 and fts-rest-cli-3.3.3 and
>>>>>>> fts-rest-cloud-storage-3.3.3 and python-fts-3.3.3 but every other
>>>>>>> fts package on the server is 3.3.2. (There is both a python-fts
>>>>>>> and an fts-python - weird).
>>>>>>>
>>>>>>> Cheers
>>>>>>> --jens
>>>>>>>
>>>>>>> On 22/12/2015 12:22, Lydia Heck wrote:
>>>>>>>> Hi Jens,
>>>>>>>>
>>>>>>>> I have a script that I could test. However, I now have an issue:
>>>>>>>> the fts-transfer command no longer works and fails with the
>>>>>>>> error message
>>>>>>>>
>>>>>>>>   fts client is connecting using the gSOAP interface. Consider
>>>>>>>>   changing your configured fts endpoint port to select the REST
>>>>>>>>   interface
>>>>>>>>
>>>>>>>> I am currently rebooting the system, but have you seen something
>>>>>>>> similar before?
>>>>>>>>
>>>>>>>> Best wishes,
>>>>>>>> Lydia
>>>>>>>>
>>>>>>>> On Fri, 18 Dec 2015, Jens Jensen wrote:
>>>>>>>>
>>>>>>>>> Right, so the find suggestion at least would do a depth-first
>>>>>>>>> listing of files-to-add, and tar I am guessing would also add
>>>>>>>>> files depth first, which I think meets your requirement, or
>>>>>>>>> close enough, of putting related files into the same chunk.
>>>>>>>>>
>>>>>>>>> Using find-and-then-tar you could avoid building the following
>>>>>>>>> archive until the current one has been sent off to RAL. You'd
>>>>>>>>> just need space for the filelist.
>>>>>>>>>
>>>>>>>>> What I am thinking is:
>>>>>>>>> 1. find <folder to be backed up> -newer <timestamp file> |
>>>>>>>>>    <list size and full filename> >filelist
>>>>>>>>> 2. Walk through filelist one line at a time, adding up sizes and
>>>>>>>>>    filenames until a certain threshold has been exceeded (say
>>>>>>>>>    20GB or 100,000 files, whichever comes first) or adding the
>>>>>>>>>    next file would take us above a higher threshold (say 50GB)
>>>>>>>>> 3. Once a list has been found, tar it up, compress it,
>>>>>>>>>    optionally store the contents (list) somewhere, send the
>>>>>>>>>    tarball to RAL, and then delete it.
>>>>>>>>> 4. Go back to step 2 until the filelist has been completed.
>>>>>>>>> 5. Touch the timestamp file
>>>>>>>>> 6. sleep 24 hours (or whatever) and go to step 1.
>>>>>>>>>
>>>>>>>>> This would meet all our requirements and would be stupidly easy
>>>>>>>>> to do.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> --jens
>>>>>>>>>
>>>>>>>>> On 17/12/2015 12:43, Lydia Heck wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Jens,
>>>>>>>>>>
>>>>>>>>>> It took longer than I thought to tidy up the results from the
>>>>>>>>>> meeting last week (I spent a full day on a spreadsheet :-) )
>>>>>>>>>>
>>>>>>>>>> However, I am now going to look at the transfers again.
>>>>>>>>>>
>>>>>>>>>> I looked over the presentation you shared with us. And yes,
>>>>>>>>>> that is the way it should go. There are some provisos:
>>>>>>>>>>
>>>>>>>>>> If I create 3 TB chunks, I need to have space for several of
>>>>>>>>>> them: one being transferred, one in waiting and one being
>>>>>>>>>> prepared. This will add 10 TB to the storage that is not
>>>>>>>>>> available for the users; can be done, but needs to be factored
>>>>>>>>>> in.
>>>>>>>>>>
>>>>>>>>>> If there is indeed a failure, then I need to identify where the
>>>>>>>>>> data are that have been deleted, corrupted or whatever. If I
>>>>>>>>>> "just" chunk the whole filesystem, that would be difficult, if
>>>>>>>>>> not impossible, to find. So I would need to arrange transfers
>>>>>>>>>> by project, and even then the retrieval might physically not be
>>>>>>>>>> possible, depending on how many of the chunks I would have to
>>>>>>>>>> retrieve.
>>>>>>>>>>
>>>>>>>>>> I believe that currently the biggest top folder is ~500 TB.
>>>>>>>>>>
>>>>>>>>>> There would not be lots of jobs running, simply because there
>>>>>>>>>> is not enough space to chunk that much.
>>>>>>>>>>
>>>>>>>>>> On the storage that I would like to archive there are more
>>>>>>>>>> than 64M files.
>>>>>>>>>>
>>>>>>>>>> So would a "flat" chunking tar of all the filesystem be a
>>>>>>>>>> "good" idea? I am not sure.
>>>>>>>>>>
>>>>>>>>>> I need to think about this a bit more.
>>>>>>>>>>
>>>>>>>>>> Lydia
>>>>>>>>>>
>>>>>>>>>> On Thu, 10 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Lydia,
>>>>>>>>>>>
>>>>>>>>>>> That's great. I am actually on leave tomorrow (travelling)
>>>>>>>>>>> and out Monday (at Royal Holloway) but the others on the list
>>>>>>>>>>> can follow up.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> --jens
>>>>>>>>>>>
>>>>>>>>>>> On 10/12/2015 10:21, Lydia Heck wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Dear all,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry for my silence. I had a meeting in London on Tuesday
>>>>>>>>>>>> and attended CIUK yesterday. I am just back and have to tidy
>>>>>>>>>>>> up some spreadsheets from Tuesday's meeting, and I will be
>>>>>>>>>>>> busy today as well with local tasks. So I should get back to
>>>>>>>>>>>> this tomorrow.
>>>>>>>>>>>>
>>>>>>>>>>>> Best wishes,
>>>>>>>>>>>> Lydia
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 9 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the proposal for the third option. It would also be
>>>>>>>>>>>>> worth looking into. It is written in Python AFAIK.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Overall we are trying to deploy something that meets the
>>>>>>>>>>>>> requirements and saves us time in the long run.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> --jens
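
For reference, steps 1 to 3 of the find-then-tar procedure Jens outlines in
his 18 Dec message could be sketched in Python roughly as below. This is
only an illustration under stated assumptions, not the script Lydia
actually ran. The function names (read_filelist, chunk_filelist,
write_tarball) are hypothetical, the filelist is assumed to hold one
"size<TAB>path" entry per line (for example as GNU find can print with
-printf '%s\t%p\n'), GB is taken as 10**9 bytes, and gzip is an arbitrary
choice of compression. Sending the tarball to RAL and deleting it
afterwards are left out.

```python
import tarfile
from typing import Iterable, Iterator, List, Tuple

GB = 10**9  # assumption: decimal gigabytes; the thread does not specify


def read_filelist(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Parse 'size<TAB>path' lines (hypothetical filelist format)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        size, path = line.split("\t", 1)
        yield int(size), path


def chunk_filelist(
    entries: Iterable[Tuple[int, str]],
    soft_bytes: int = 20 * GB,
    max_files: int = 100_000,
    hard_bytes: int = 50 * GB,
) -> Iterator[List[str]]:
    """Group files into chunks following step 2 of the outline."""
    chunk: List[str] = []
    total = 0
    for size, path in entries:
        # Close the chunk first if this file would push it past the hard cap.
        if chunk and total + size > hard_bytes:
            yield chunk
            chunk, total = [], 0
        chunk.append(path)
        total += size
        # Close once the soft size threshold is exceeded or the
        # file-count limit is reached, whichever comes first.
        if total > soft_bytes or len(chunk) >= max_files:
            yield chunk
            chunk, total = [], 0
    if chunk:
        yield chunk  # whatever is left at the end of the filelist


def write_tarball(paths: List[str], tar_path: str) -> None:
    """Write one gzip-compressed tarball holding exactly the listed paths."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for path in paths:
            # recursive=False: the filelist already names every entry.
            tar.add(path, recursive=False)


if __name__ == "__main__":
    # Tiny thresholds so the grouping is visible without real data.
    demo = ["6\ta", "6\tb", "3\tc", "30\td", "1\te"]
    for group in chunk_filelist(
        read_filelist(demo), soft_bytes=10, max_files=3, hard_bytes=20
    ):
        print(group)
```

Note that in this sketch a single file larger than the hard cap still gets
a chunk of its own, since it cannot be split. Only one chunk is held at a
time, which matches the point that just the filelist needs extra space.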