Hello
1) ATLAS are considering increasing the limit on their ESD file size from the current 10GB to 15GB (or at least 12GB). Would this create any problems at sites? Some sites whose admins are on the ATLAS mailing list have already replied, and I have also spoken with Brian Davies to ask whether there are likely to be any problems with this. So far nobody has mentioned any, and the vast majority of these files are likely to be kept at Tier 1s anyway. However, if you hadn't heard about this and wish to provide feedback, now is the time.
2) Over the last few months ATLAS have been testing their job recovery mechanism at RAL and a few other sites. This is something that was 'implemented' before but never really worked properly. It now appears to be working well, allowing jobs to finish even if the SE is down or unstable when the job finishes.
Job recovery works by writing the output of the job to a directory on the WN should the job fail when writing its output to the SE. Subsequent pilots will check this directory and try again for a period of 3 hours. If you would like to have job recovery activated at your site, you need to create a directory which (ATLAS) jobs can write to. I would also suggest that this directory has some form of tmpwatch enabled on it which clears up files and directories older than 48 hours. Evidence from RAL suggests that normally only 1 or 2 jobs are written to the space at a time and that the space used is normally less than a GB. I have not observed more than 10GB being used. Once you have created this space, please email [log in to unmask] with the directory (and your site!) and we can add it to the ATLAS configuration. We can switch off job recovery at any time if it does cause a problem at your site. Job recovery would only be used for production jobs, as users complain if they have to wait a few hours for things to retry (even if it would save them time overall...).
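For sites that would rather not rely on tmpwatch itself, the 48-hour clean-up suggested above could be approximated with something like the Python sketch below. This is only an illustration, not anything ATLAS provides: the function name clean_recovery_dir is made up here, and it assumes that looking at the modification time of the top-level entries in the directory is good enough (tmpwatch has its own rules about which timestamps it checks).

```python
import os
import shutil
import time

MAX_AGE_HOURS = 48  # retention period suggested in this mail


def clean_recovery_dir(root, max_age_hours=MAX_AGE_HOURS, now=None):
    """Remove top-level files/directories in `root` older than the cutoff.

    Age is judged by modification time (an assumption of this sketch).
    Returns the sorted names of the entries that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.lstat(path).st_mtime >= cutoff:
            continue  # recent enough; a later pilot may still retry it
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # stale job directory and its contents
        else:
            os.remove(path)  # stale plain file or symlink
        removed.append(name)
    return sorted(removed)
```

It would typically be run periodically (from cron, for example) against whatever directory the site has created for job recovery.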
Thanks
Alastair