Hi,
Yep, that's right: once the job is gone, the file is (or should be) gone
too. The fact that files from some old jobs are still left over is a bug;
David Smith of the LCG project is working on this one, IIRC.
If the job is still running, the information will be there, so it is at
least in some cases useful.
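
A rough Python sketch of scanning that directory for a given JM contact ID,
in case it is useful. It assumes only the layout guessed at in the quoted
message below (line 1 is the JM contact ID, line 5 is the PBS job ID); the
function names are made up for illustration, and the real file format may
well differ.

```python
import os

# Directory named in the thread; adjust for your own gatekeeper.
STATE_DIR = "/opt/globus/tmp/gram_job_state"


def read_job_state(path):
    """Return (contact_id, pbs_job_id) from one state file, or None.

    Assumption: line 1 holds the JM contact ID and line 5 the PBS job ID,
    as in the example file quoted in this thread.
    """
    with open(path, errors="replace") as fh:
        lines = [fh.readline().strip() for _ in range(5)]
    if not lines[0] or not lines[4]:
        return None  # file too short or not in the assumed layout
    return lines[0], lines[4]


def find_pbs_job(contact_id, state_dir=STATE_DIR):
    """Return the PBS job ID recorded for contact_id, or None if not found."""
    wanted = contact_id.rstrip("/")
    for name in sorted(os.listdir(state_dir)):
        path = os.path.join(state_dir, name)
        if not os.path.isfile(path):
            continue
        state = read_job_state(path)
        if state and state[0].rstrip("/") == wanted:
            return state[1]
    return None
```

Calling find_pbs_job() with the contact ID a user sends you returns None
when no state file matches, which is what you would expect for a job that
has already finished.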
JT
On Tue, 2004-10-19 at 10:16, Fokke Dijkstra wrote:
> Hi Jeff,
>
> Thank you for the info. Alas, it did not solve the problem.
>
> LHC Computer Grid - Rollout wrote:
> > Here is the answer from your friendly neighborhood GOC, errr, CIC, errrr, Big Site ;-)
> > Look in the magic files contained in
> >
> > /opt/globus/tmp/gram_job_state
> >
> > The first five lines of an example file from this directory are:
> >
> > https://tbn18.nikhef.nl:20134/10114/1094464441/
> > 26
> > 4
> > 25
> > 102488.tbn18.nikhef.nl
> >
> > I am pretty sure the first is the JM contact ID. The 5th
> > line is the PBS job ID. Subsequent lines have things that
> > look very much like EDG job IDs, probably with the sort of
> > translations like '~' -> %7e and that sort of foolishness. So I
> > think you can find almost anything you want from these files.
>
> Indeed this directory contains files with contact ids, but for some reason not for the jobs mentioned to me. Since there are still files in this directory from several different dates, it seems to me that they only stay here under certain circumstances.
>
> Right now the only option left is to get the logging output from the user and to look in the log files for events at the times mentioned in the logging info.
>
> >
> > On Mon, 2004-10-18 at 15:05, Steve Traylen wrote:
> >> On Mon, Oct 18, 2004 at 02:53:25PM +0200 or thereabouts, Fokke
> >> Dijkstra wrote:
> >>> Hello Steve,
> >>>
> >>> Thank you for your help!
> >>>
> >>> LHC Computer Grid - Rollout wrote:
> >>>> Fokke Dijkstra wrote:
> >>>>> When users have problems with their jobs they regularly send me
> >>>>> jobids and sometimes JM contact ids. Currently I have no idea
> >>>>> where to find the information about these jobs. Note that jobids
> >>>>> often refer to machines outside our domain.
> >>>>
> >>>> First, the jobids are completely useless to you without the mapping from
> >>>> these to JM contact ids. With JM contact ids you can grep in the
> >>>> globus-gatekeeper.log to get things like the mapped user, the time
> >>>> and date, and the PBS job id.
> >>>
> >>> So how do I get a JM contact id if I don't have one?
> >>>
> >>> I have now gotten an id from the user that looks like:
> >>> https://mu6.matrix.sara.nl:20001/17098/1097897838/
> >>
> >> I'm now wishing I had not answered this question, since
> >> that looks like a gatekeeper job manager id which I thought would have
> >> appeared in the globus-gatekeeper.log for a non lcgpbs job manager.
> >>
> >> Someone else can hopefully answer your question.
> >> Steve
> >>>
> >>> I can look for numbers like 20001, 17098 and 1097897838 in the
> >>> globus-gatekeeper.log files. The last number does not appear at all.
> >>> The first two appear multiple times, but seem to refer to jobs from
> >>> other VOs than the one the user mentioned. So what exactly should I look for?
> >>>
> >>>> The normal exception to this, though not in your case, is that on an
> >>>> lcgpbs job manager the /var/log/messages file must also be grabbed.
> >>>
> >>>> I had thought though at NIKHEF you were lucky enough to have the
> >>>> job repository running within lcmaps? Does it not record the JM
> >>>> contact ID?
> >>>
> >>> Probably our neighbours at NIKHEF have it running this way, but I don't think we at SARA do.
> >>>
> >>>> Steve
> >>>
> >>> Fokke Dijkstra
> >>>
>
> --------
> Fokke Dijkstra
> High Performance Computing
> SARA - Reken- en Netwerkdiensten http://www.sara.nl
> Tel. +31 20 592 8004 Fax. +31 20 668 3167