Ian Stokes-Rees wrote:

> [Friends, below find an account of my adventures with LCG data
> management commands as I searched for the elusive LHCb files at
> Birmingham.  I am a novice with LCG Data Mgmt, so I think my experience
> is similar to what another novice would go through.  I thought you would
> all be interested in this account of "Adventures in Grid World".]
>
> Hi John,
>
> John Gordon wrote:
>
>> Ian, as an LHCb person, you are better placed than most to run a job at
>> this site and ls the LHCb directories to see what is there. You are also
>> better placed to identify the data from directory and filenames.
>
>
> Yes, I suppose that is a good point.  However, somewhat embarrassingly,
> I don't know how to do this off the top of my head.  I would very much
> appreciate an outline of how one would go about doing this.

Short answer: the LFC will make this easy and fast, the old RLS does not.

The quickest way may be to ask the site admin to do an "ls -lR" of the
area dedicated to LHCb!

Slow recipe:

1. Find the SE:

-------------------------------------------------------------------------------------
$ ldapsearch -x -h lxn1178.cern.ch:2170 -b o=grid | grep -i '^GlueSEUniqueID: .*bham'
GlueSEUniqueID: epcf37.ph.bham.ac.uk
-------------------------------------------------------------------------------------
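
If you want just the host name for use in the later steps, the grep output
can be trimmed with sed.  A minimal sketch: the function name se_hosts is my
own invention, and it only assumes the "GlueSEUniqueID: <host>" line format
shown above.

```shell
# Print the unique SE host names found in ldapsearch output on stdin.
# Only lines of the form "GlueSEUniqueID: <host>" are considered.
se_hosts() {
    sed -n 's/^GlueSEUniqueID: *//p' | sort -u
}

# Example with the line returned by the query above.
printf '%s\n' 'GlueSEUniqueID: epcf37.ph.bham.ac.uk' | se_hosts
```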

2. Find the file names:

-------------------------------------------------------------------------------------
$ export LRC_ENDPOINT=http://rlslhcb.cern.ch:7777/lhcb/v2.2/\
edg-local-replica-catalog/services/edg-local-replica-catalog

$ edg-lrc mappingsByPfn 's??://epcf37.ph.bham.ac.uk*' --endpoint $LRC_ENDPOINT
guid:6edc1ce8-0838-11d9-ba4e-bf7bc633f989, sfn://epcf37.ph.bham.ac.uk/storage/lhcb/...
[...]
guid:6ffa1f34-f840-476b-953a-323ba66bcd3f, sfn://epcf37.ph.bham.ac.uk/storage/lhcb/...
-------------------------------------------------------------------------------------

Note: it is a good idea *not* to start the pattern with a wildcard,
to allow the DB to use an index for the query!

3. Find the size of each file:

-------------------------------------------------------------------------------------
$ edg-gridftp-ls --verbose gsiftp://epcf37.ph.bham.ac.uk/storage/lhcb/...
[...]
-------------------------------------------------------------------------------------

For an "sfn" (i.e. Classic SE) one can just "s/sfn/gsiftp/" to obtain the TURL.
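
As a sketch of that substitution applied to the catalog output from step 2:
the function name sfn_to_turl and the sample host and path are placeholders
of mine, and the input is assumed to be in the "guid:<guid>, sfn://..."
format shown above.

```shell
# Turn "guid:<guid>, sfn://host/path" lines on stdin into gsiftp TURLs,
# one per line.  Lines that do not match the expected format are dropped.
sfn_to_turl() {
    sed -n 's|^guid:[^,]*, *sfn://|gsiftp://|p'
}

# Hypothetical sample line; the GUID, host and path are placeholders.
printf '%s\n' \
    'guid:00000000-0000-0000-0000-000000000000, sfn://se.example.org/storage/lhcb/some/file' \
    | sfn_to_turl
```

Each resulting TURL can then be passed to "edg-gridftp-ls --verbose" as in
step 3.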

Another issue: below you are confusing the shared area for experiment software
with the SE storage area.  The SE is not mounted on the WN, because LCG does not
support that (any more), for various reasons.

> Below is an account of my attempt.  I have taken about 90 minutes and
> not managed to turn up any useful information.
>
> Looking at the LCG-2 User Guide, at first nothing obvious jumps out
> about how to do this.  A closer inspection suggests that:
>
> lcg-infosites
>
> would be a good command to try.  This does not appear to be installed on
> lcgui02.gridpp.rl.ac.uk (ASIDE: this is my preferred LCG UI, although I
> know they are all slightly different, so I am aware of the
> boring-but-sometimes-necessary "trick" of attempting the same command on
> at least 3 different UIs in order to get a "best 2 out of 3" vote on
> "correct" behaviour).
>
> Then I looked at the GOC DB, which I find to be one of the best sources
> of information regarding LCG.  It lists the SE, but I found that didn't
> get me much further forward.
>
> Next I tried my old trick of nicking the lhcb001 SSH private keys from
> ~/.ssh and trying to ssh directly to the CE or SE.  Thankfully (but
> somewhat unhelpfully in this case) this is no longer possible, at least
> at Birmingham.
>
> I then moved on through the LCG-2 User's Guide, which suggested the
> edg-lrc and edg-lrm commands.  These both told me I needed the output
> from lcg-infosites, so again I was up the proverbial creek.
>
> OK, so I'm a resourceful guy, and I know, while awkward, the equivalent of
> ssh'ing to the node can be achieved using globus-job-run.  I used this
> to poke around "/" but couldn't find anything.  "df" also showed no
> mounts to the SE, so I guess the CE doesn't mount the local SE directly.
>
> This then left me with getting the .BrokerInfo information back from a
> job which executed at Birmingham, in hopes of finding the path to the
> local SE within it.  Since:
>
> edg-job-submit -h <birmingham> job.jdl
>
> short-circuits the RB and won't give me a .BrokerInfo file, I had to
> search through archives, emails, and CVS to remind myself about the
> syntax for forcing a job to a particular site using JDL.  FWIW, the
> syntax is:
>
> Requirements  = (other.GlueCEInfoHostName == "epcf36.ph.bham.ac.uk");
>
> Interestingly, my first jobs to come back show that epcf38 is mounted on
> the WNs, and not epcf37 (the listed SE), and furthermore the directory
> /experiment seems to have the SE files in it, rather than /storage (as
> listed in .BrokerInfo, and therefore presumably in the IS/MDS/R-GMA).
>
> In any case, I am now stuck waiting 5-10 minutes for each (effectively)
> "ls" command to execute and return the results.  This turnaround time
> means I have to give up, so my question remains unanswered.
>
> Of course, I hope that I have just missed something simple.  I hope that
> if the lcg-infosites command had existed then I could have done
> something like:
>
> edg-rmc mappingsByAlias '*lhcb*' --endpoint <birmingham-endpoint>
>
> However, the output of that command somewhat scares me.  It should/could
> be enormous, and anyway, I don't know if it would be useful.  I really
> need file sizes and file dates as well, and I don't think these are
> returned.
>
> As I understand it, the LCG data management system uses a flat namespace
> for file storage, and any "/" (slashes) are just for convenience.  There
> is no (easy) way to list files in a particular "pseudo-hierarchy"
> level (files in leaves can be listed using the appropriate wildcard with
> the "pseudo-path" followed by "/*", I suppose).
>
> Also, I have confirmed that /storage does not exist at Birmingham, so
> anyone running jobs there hoping to use the .BrokerInfo file or
> edg-brokerinfo commands to point them to the mounted local SE storage
> would be out of luck.
>
> 15 minutes after submitting my "ls /experiment/lhcb" command, I still
> don't have any output.
>
> It suddenly came to me to use edg-gridftp-ls.  This finally showed
> *some* results.  I discovered that the GOC DB listed SE doesn't appear
> to be the Birmingham SE anymore.  It *does* have a /storage directory,
> which contains about 100 megs of LHCb files, in about 10 files in 4
> directories.  However this server/directory is not what the WNs seem to
> connect to.  epcf38:/experiment is not accessible by edg-gridftp-ls, so
> stuck again.
>
> And that is it for me.  Out of time, and on to other things.  I look
> forward to pointers on where I went wrong, and someone satisfying me
> that "There Must Be An Easier Way", or, in fact, *any* way to get the
> information I have failed to retrieve.
>
> Cheers,
>
> Ian.
>
> POSTSCRIPT:
>
> My last ls command finally completed after 22 minutes.  There is nothing
> in /experiment/lhcb, and df reports "lots" of free drive space:
>
> epcf38.ph.bham.ac.uk:/experiment
>                       37903148   1676600  34301160   5% /experiment
>
> (34 gigs).  This is all the WN had mounted:
>
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/hda2             37903148   3899544  32078216  11% /
> /dev/hda1                38859     15107     21746  41% /boot
> epac2.ph.bham.ac.uk:/opt/local/linux/7.3
>                       12207540   4581788   7005644  40% /export/local/linux/7.3
> epcf38.ph.bham.ac.uk:/experiment
>                       37903148   1676600  34301160   5% /experiment
>
>
> --
> Ian Stokes-Rees
> Particle Physics, Oxford     http://grid.physics.ox.ac.uk/~stokes