Hi,
Gareth Smith suggested that someone on TB support might be able to help with the issue of loading files remotely for SNO+. Our processing jobs need to be able to run over large datasets in single jobs (variable, but potentially around 100 GB). Until now we’ve just been splitting these jobs, with each sub-job downloading single files to local disk (via lcg-cp or gfal-copy) and then processing them. However, we’ve now reached the point where we need to process an entire dataset in a single job. I know that we should instead stream the files over the local network, but it’s not clear to me how to do this with e.g. XRootD. Additionally, we may have some non-ROOT format files, in which case we would want some way to map the LFN/SURL to a local path (while this is possible with Lustre, I’m guessing it might not be possible with other storage systems).
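For concreteness, the kind of access I’m picturing is something along these lines (the hostname and path here are invented, so please correct me if the access pattern itself is wrong):

    // ROOT macro sketch: open the file directly over XRootD rather than
    // staging it to local disk first (URL invented for illustration)
    #include "TFile.h"
    #include "TTree.h"

    void stream_example()
    {
       TFile *f = TFile::Open(
          "root://se01.esc.qmul.ac.uk//dpm/qmul.ac.uk/home/snoplus/run_0001.root");
       if (f && !f->IsZombie()) {
          TTree *tree = (TTree*)f->Get("T");  // read the tree over the network
          // ... process entries as usual ...
          f->Close();
       }
    }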
I’ve compiled ROOT with XRootD support, but as I understand it there are some extra configurations that may be needed at each site for us to load files from the SURL directly. If anyone could advise me of the necessary steps (and whether these will also allow loading files from non-grid nodes, as some of our processing is better run on local batch systems), I’d be very grateful.
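In case it helps to show what I’m imagining, here is the sort of SURL-to-root:// rewrite I have in mind; the hostname and paths are invented and I assume the prefix handling is site- and SE-specific:

    // Rough sketch of the SURL -> root:// mapping I’m picturing, e.g.
    //   srm://se01.esc.qmul.ac.uk/dpm/qmul.ac.uk/home/snoplus/run_0001.root
    //   root://se01.esc.qmul.ac.uk//dpm/qmul.ac.uk/home/snoplus/run_0001.root
    #include "TString.h"

    TString ToXRootDUrl(TString surl)
    {
       surl.ReplaceAll("srm://", "root://");
       // XRootD expects a double slash between the host and the path
       Ssiz_t slash = surl.Index("/", surl.Index("://") + 3);
       surl.Insert(slash, "/");
       return surl;
    }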
The output datasets of the jobs will also be large. We can have a script that runs in parallel with the jobs to push outputs to Grid storage as they are produced, but if there’s a better approach (e.g. writing directly to Grid storage) then I’d be interested to hear of it.
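For instance, if the storage elements accept XRootD writes, I imagine something like this could replace the separate upload step entirely (URL invented again, and I don’t know whether our SEs actually permit direct writes):

    // Sketch: write the output file straight to grid storage over XRootD,
    // assuming the SE allows writes (URL invented for illustration)
    #include "TFile.h"

    void write_example()
    {
       TFile *out = TFile::Open(
          "root://se01.esc.qmul.ac.uk//dpm/qmul.ac.uk/home/snoplus/user/output.root",
          "RECREATE");
       if (out && !out->IsZombie()) {
          // ... fill trees/histograms ...
          out->Write();
          out->Close();
       }
    }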
Thanks,
Matt
-------------------------------------------------
Matthew Mottram
School of Physics and Astronomy
Queen Mary, University of London
-------------------------------------------------