Thanks Steve.

Now I get different results depending on where I try to copy the file to,
but never the results I want. Thanks in advance if you can find time to
look through the following results.

First I try a very simple command:

 $ globus-job-run pc31.hep.ucl.ac.uk:2119/jobmanager-pbs -q workq \
      /bin/hostname
 pc31.hep.ucl.ac.uk

So far so good, but that was on the CE's own execution queue.

 $ globus-job-run pc31.hep.ucl.ac.uk:2119/jobmanager-pbs -q testq \
      /bin/hostname
 pc55.hep.ucl.ac.uk
 id: cannot find name for group ID 3000
 id: cannot find name for group ID 3000
 id: cannot find name for group ID 3000

Well, it did run on the external PBS queue, but something is not right.
Now let's try globus-url-copy:

 $ globus-job-run pc31.hep.ucl.ac.uk:2119/jobmanager-pbs -q testq \
    /opt/globus/bin/globus-url-copy file:///etc/group \
     gsiftp://gppse05.gridpp.rl.ac.uk/tmp/morejunk99
 id: cannot find name for group ID 3000
 id: cannot find name for group ID 3000
 id: cannot find name for group ID 3000
  at globus_i_ftp_client_response_callback5 Can't open data connection.
 timed out() failed.

In fact the file /tmp/morejunk99 is created on gppse05, but it is empty.
And if I try to copy to our local SE, things don't go much better:

 $ globus-job-run pc31.hep.ucl.ac.uk:2119/jobmanager-pbs -q testq \
      /opt/globus/bin/globus-url-copy file:///etc/group \
       gsiftp://pc30.hep.ucl.ac.uk/temp/junk2
 id: cannot find name for group ID 3000
 id: cannot find name for group ID 3000
 id: cannot find name for group ID 3000
  at globus_i_ftp_client_response_callback3 /temp/junk2: No such file or
 directory.
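
Every run on testq prints "id: cannot find name for group ID 3000", which
suggests that GID 3000 has no entry in the group database on the worker
node. A minimal sketch of a check for that, assuming getent is available on
the node (the function name check_gid is only illustrative, and the GID is
the one from the warning above):

```sh
#!/bin/sh
# Report whether a numeric group ID resolves to a group name on this host.
# getent consults the same sources as id(1) (files, NIS, LDAP, ...).
check_gid() {
    gid="$1"
    if entry=$(getent group "$gid"); then
        # getent prints "name:passwd:gid:members"; keep only the name.
        echo "GID $gid resolves to: ${entry%%:*}"
    else
        echo "GID $gid has no entry in the group database"
        return 1
    fi
}

# GID 3000 is taken from the warning in the job output.
check_gid 3000 || true
```

Running this on pc55 (or submitting getent itself through globus-job-run
with -q testq, as with /bin/hostname above) should show whether the group
is simply missing on the worker node. That would account for the warning
only; it is not necessarily related to the globus-url-copy failures.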

Cheers,

Ben

On Tue, 10 Jun 2003, Steve Traylen wrote:

> Ben,
>
> globus-job-run $CE/jobmanager-pbs -q myqueue /bin/hostname
>
> will select a queue for you.
>
>  Steve
>
> On Tue, 10 Jun 2003, Ben Waugh wrote:
>
> > Hi Steve,
> >
> > I'm not quite sure how to use this to track the bug down further. When I
> > ran the command you gave on our UI, it worked OK, but I think it was using
> > the old queue "workq". If I recompile the CE profile with workq left out
> > of site-cfg.h, but the new queue "testq" left in, I get:
> >
> >  GRAM Job submission failed because the job failed when the job manager
> >  attempted to run it (error code 17)
> >
> > What else should I try?
> >
> > Cheers,
> >
> > Ben
> >
> > On Tue, 10 Jun 2003, Steve Traylen wrote:
> >
> > > Hi Ben,
> > >
> > > This is usually a problem with the globus-url-copy portion of the
> > > datagrid job, where the I/O sandbox is transferred between the RB
> > > and the WN.
> > >
> > > globus-job-run $CE/jobmanager-pbs \
> > >            /opt/globus/bin/globus-url-copy \
> > >            file:/etc/group \
> > >            gsiftp://gppse05.gridpp.rl.ac.uk/tmp/morejunk
> > >
> > > is a good way to debug this.
> > >
> > >  Steve
> > >
> > >
> > > On Tue, 10 Jun 2003, Ben Waugh wrote:
> > >
> > > > An update:
> > > >
> > > > After retrying several times, the job finally fails with Status "Aborted"
> > > > and Status Reason "Failure while executing job wrapper".
> > > >
> > > > Dave (Kant), did you find out what the problem was in your case? Or anyone
> > > > else?
> > > >
> > > > Cheers,
> > > >
> > > > Ben
> > > >
> > > > --
> > > > Dr Ben Waugh                                     Tel. +44 (0)20 7679 3783
> > > > Dept of Physics and Astronomy                    Internal: 33783
> > > > University College London
> > > > London WC1E 6BT
> > > >
> > >
> > > --
> > > Steve Traylen
> > > [log in to unmask]
> > > http://www.gridpp.ac.uk/
> > >
> >
> > --
> > Dr Ben Waugh                                     Tel. +44 (0)20 7679 3783
> > Dept of Physics and Astronomy                    Internal: 33783
> > University College London
> > London WC1E 6BT
> >
>
> --
> Steve Traylen
> [log in to unmask]
> http://www.gridpp.ac.uk/
>

--
Dr Ben Waugh                                     Tel. +44 (0)20 7679 3783
Dept of Physics and Astronomy                    Internal: 33783
University College London
London WC1E 6BT