Hello,
> WN to RB is not practically possible as our RB is on a live IP whereas the
> WN is on a private IP.
I don't believe that this should be a problem. Try to copy a file with
globus-url-copy and enable the dbg option (i.e. globus-url-copy -dbg
-vb). Post the results here.
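
For example, something along these lines run on the WN (only a sketch: the test file name, the remote path and the log file are placeholders I made up, the host is the CE from your job status output, and it assumes you already hold a valid grid proxy):

```shell
#!/bin/sh
# Placeholder names (assumptions, adjust to your site).
SRC_FILE="/tmp/guc-test.txt"
DEST_HOST="pcncp04.ncp.edu.pk"          # CE named in the job status output
DEST_URL="gsiftp://${DEST_HOST}/tmp/guc-test.txt"
LOG_FILE="/tmp/guc-debug.log"

# Create a small file to transfer.
echo "transfer test from $(hostname)" > "$SRC_FILE"

if command -v globus-url-copy >/dev/null 2>&1; then
    # -dbg prints debug output for the FTP connection,
    # -vb prints bytes transferred and the transfer rate.
    globus-url-copy -dbg -vb "file://${SRC_FILE}" "$DEST_URL" > "$LOG_FILE" 2>&1
    echo "exit status: $?  (full output in $LOG_FILE)"
else
    echo "globus-url-copy not found in PATH; set up the grid environment first" >&2
fi
```

The contents of the log file are what would be useful to see; repeating the same test with the RB as destination would show whether the private IP really matters.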
regards,
Yiannis
On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
>
> Dear Yiannis,
>
> 1) I have checked all the disk sizes. They are all fine.
> 2) I have tried to copy a file from the worker node back to the CE, but it
> failed with the error: a system call failed (Connection refused).
> WN to RB is not practically possible as our RB is on a live IP whereas the
> WN is on a private IP.
> 3) I am only using 2 WNs during my investigation period.
> 4) qmgr -c "p s" |grep acl returns:
> set queue atlas acl_group_enable = True
> set queue atlas acl_groups = atlas
> set queue alice acl_group_enable = True
> set queue alice acl_groups = alice
> set queue lhcb acl_group_enable = True
> set queue lhcb acl_groups = lhcb
> set queue cms acl_group_enable = True
> set queue cms acl_groups = cms
> set queue dteam acl_group_enable = True
> set queue dteam acl_groups = dteam
> set queue ops acl_group_enable = True
> set queue ops acl_groups = ops
> set server acl_host_enable = False
>
> Thanks for your reply,
>
> -- Best Regards --
> Adeel
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Yiannis Ioannou
> Sent: Monday, August 27, 2007 4:13 PM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Hello there,
>
> Please do the following checks:
> - Check the available disk space on all the machines.
> - Try to copy a file from a worker node back to the CE and the RB with
> globus-url-copy.
> - Locate the worker node on which the job fails.
> - What does
> qmgr -c "p s" |grep acl
> give?
>
> regards,
> Yiannis
>
>
>
> On 8/27/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> >
> > Dear Maarten,
> >
> >
> >
> > Sorry for the mistake.
> >
> >
> >
> > I am now getting the same error, i.e., Unspecified_gridmanager_error.
> >
> > I am also getting the same old behaviour from globus-job-run, i.e.:
> >
> >
> >
> > *************************************************************
> > BOOKKEEPING INFORMATION:
> >
> > Status info for the Job : https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
> > Current Status: Aborted
> > Status Reason: Job RetryCount (3) hit
> > Destination: pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> > reached on: Mon Aug 27 10:25:28 2007
> >
> >
> >
> > In fact, I was able to complete globus-job-run without specifying our own
> > CE.
> >
> >
> >
> > -- Best Regards --
> >
> > Adeel-ur-Rehman
> >
> >
> >
> >
> >
> > ________________________________
> >
> >
> > From: Adeel-ur-Rehman [mailto:[log in to unmask]]
> > Sent: Monday, August 27, 2007 2:44 PM
> > To: 'Maarten Litmaath'
> > Cc: [log in to unmask]
> > Subject: RE: [LCG-ROLLOUT] Job Submission Failure
> >
> >
> >
> >
> >
> > Dear Maarten,
> >
> >
> >
> > > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > > UI which is only a member of dteam VO.
> >
> >
> >
> > >>On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> >
> >
> >
> >
> >
> > Yes I have tried that successfully.
> >
> >
> >
> >
> >
> > > I have tested the PBS stagein functionality by running the script attached
> > > under a grid user account by specifying its corresponding queue name as an
> > > argument, I got "test successful" message.
> >
> >
> >
> > >>I suppose it was the grid account for an ordinary user, e.g. ops001?
> >
> > >>Try with an "sgm" account instead.
> >
> >
> >
> >
> >
> > That's also working fine with me.
> >
> >
> >
> > But still I am getting the same Unspecified_gridmanager_error although now I
> > can successfully complete the globus-job-run procedure with no errors.
> >
> >
> >
> > -- Best Regards --
> >
> > Adeel
> >
> >
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Maarten Litmaath [mailto:[log in to unmask]]
> > Sent: Monday, August 27, 2007 2:19 PM
> > To: Adeel-ur-Rehman
> > Cc: [log in to unmask]
> > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> >
> >
> >
> >
> > Hi Adeel,
> >
> >
> >
> > > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > > UI which is only a member of dteam VO.
> >
> >
> >
> > On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> >
> >
> >
> > > Regarding the reconfiguration of the CE, I only upgraded it to the latest
> > > available update of glite-3.1.
> >
> > >
> >
> > > Yes I checked the suggestions on the page
> >
> > >
> > http://goc.grid.sinica.edu.tw/gocwiki/Unspecified_gridmanager_error
> >
> > >
> >
> > >
> >
> > > /var/spool/pbs/mom_logs on the WN don't state anything, so it seems that the
> > > jobs are not actually executing.
> >
> > >
> >
> > > I have tested the PBS stagein functionality by running the script attached
> > > under a grid user account by specifying its corresponding queue name as an
> > > argument, I got "test successful" message.
> >
> >
> >
> > I suppose it was the grid account for an ordinary user, e.g. ops001?
> >
> > Try with an "sgm" account instead.
> >
>
>