Right, so the find suggestion would at least produce a depth-first
listing of the files to add, and I am guessing tar would also add files
depth-first, which I think meets your requirement, or close enough, of
putting related files into the same chunk.
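For example (hypothetical layout), find keeps each subdirectory's files
together in its output, so they would end up in the same chunk:

  $ find proj -type f
  proj/runA/params.txt
  proj/runA/output.dat
  proj/runB/params.txt
  proj/runB/output.dat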
Using find-and-then-tar you could avoid building the next archive until
the current one has been sent off to RAL. You'd just need space for the
filelist.
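(Back-of-envelope, assuming ~100 bytes per line: the 64M files you
mention would give a filelist of roughly 6GB -- large, but cheap next to
the chunks themselves.)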
What I am thinking is:
1. find <folder to be backed up> -type f -newer <timestamp file>
-printf '%s %p\n' >filelist (GNU find; prints one line per file with
size and full filename)
2. Walk through filelist one line at a time, adding up sizes and
collecting filenames, until a threshold has been exceeded (say 20GB or
100,000 files, whichever comes first) or adding the next file would take
us above a higher threshold (say 50GB)
3. Once a chunk's list is complete, tar it up, compress it, optionally
store the contents list somewhere, send the tarball to RAL, and then delete it.
4. Go back to step 2 until the filelist has been completed.
5. Touch the timestamp file
6. sleep 24 hours (or whatever) and go to step 1.
This would meet all our requirements and would be stupidly easy to do.
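Something like this untested sketch, just to make it concrete (bash with
GNU find and tar assumed; SRC, STAMP and send_to_ral are placeholders,
and filenames containing newlines would break the parsing):

  #!/bin/bash
  SRC="/folder/to/back/up"      # placeholder
  STAMP="/var/backup/stamp"     # placeholder timestamp file
  SOFT=$((20 * 2**30))          # flush once past 20GB...
  HARD=$((50 * 2**30))          # ...or before exceeding 50GB
  MAXFILES=100000               # ...or at 100,000 files

  flush() {                     # step 3: tar, record, send, delete
      [ -s chunk.list ] || return 0
      tar -czf "chunk$n.tar.gz" -T chunk.list
      cp chunk.list "chunk$n.contents"    # optional contents record
      send_to_ral "chunk$n.tar.gz"        # placeholder transfer to RAL
      rm -f "chunk$n.tar.gz"
      n=$((n + 1)); total=0; count=0; : > chunk.list
  }

  while true; do
      # step 1: size and full name of everything newer than STAMP
      find "$SRC" -type f -newer "$STAMP" -printf '%s %p\n' >filelist
      n=0; total=0; count=0; : > chunk.list
      # step 2: accumulate until a threshold is hit
      while read -r size name; do
          (( total + size > HARD )) && flush
          printf '%s\n' "$name" >>chunk.list
          total=$((total + size)); count=$((count + 1))
          (( total > SOFT || count >= MAXFILES )) && flush
      done <filelist
      flush                               # step 4: last partial chunk
      touch "$STAMP"                      # step 5
      sleep 86400                         # step 6: 24 hours
  done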
Cheers
--jens
On 17/12/2015 12:43, Lydia Heck wrote:
>
> Hi Jens,
>
> it took longer than I thought to tidy up the results from the meeting
> last week (I spent a full day on a spreadsheet :-) )
>
> However I am now going to look at the transfers again.
>
> I looked over the presentation you shared with us. And yes, that is
> the way it should go. There are some provisos:
>
> If I create 3 TB chunks, I need to have space for several of them:
>
> One being transferred, one waiting and one being prepared. That is
> roughly 10 TB of storage that is not available to the users; it can be
> done, but needs to be factored in.
>
>
> If there is indeed a failure, then I need to identify which chunks hold
> the data that have been deleted, corrupted or whatever. If I "just"
> chunk the whole filesystem, that would be difficult, if not impossible,
> to find out. So I would need to arrange transfers by project, and even
> then the retrieval might physically not be possible, depending on how
> many of the chunks I would have to retrieve.
>
> I believe that currently the biggest top folder is ~500 TB.
>
> There would not be lots of jobs running, simply because there is not
> enough space to chunk that much.
>
> On the storage that I would like to archive there are more than 64M
> files.
>
> So would a "flat" chunked tar of the whole filesystem be a "good"
> idea? I am not sure.
>
> I need to think about this a bit more.
>
> Lydia
>
>
>
>
> On Thu, 10 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>
>> Hi Lydia,
>>
>> That's great. I am actually on leave tomorrow (travelling) and out
>> Monday (at Royal Holloway) but the others on the list can follow up.
>>
>> Thanks
>> --jens
>>
>> On 10/12/2015 10:21, Lydia Heck wrote:
>>>
>>> Dear all,
>>>
>>> sorry for my silence. I had a meeting in London on Tuesday and
>>> attended CIUK yesterday. I am just back, I have to tidy up some
>>> spreadsheets from Tuesday's meeting, and I will be busy today as well
>>> with local tasks. So I should get back to this tomorrow.
>>>
>>> Best wishes,
>>> Lydia
>>>
>>>
>>>
>>>
>>> On Wed, 9 Dec 2015, Jensen, Jens (STFC,RAL,SC) wrote:
>>>
>>>> Hi,
>>>>
>>>> Here is the proposal for the third option. It would also be worth
>>>> looking into. It is written in python AFAIK.
>>>>
>>>> Overall we are trying to deploy something that meets the requirements
>>>> and saves us time in the long run.
>>>>
>>>> Thanks
>>>> --jens
>>>>
>>>>
>>