Dear Yiannis,
1) I have checked all the disk sizes. They are all fine.
2) I tried to copy a file from a worker node back to the CE, but it
failed with the error: a system call failed (Connection refused).
Copying from the WN to the RB is not possible in practice, as our RB is on
a public (live) IP whereas the WN is on a private IP.
3) I am using only 2 WNs during my investigation period.
4) qmgr -c "p s" |grep acl returns:
set queue atlas acl_group_enable = True
set queue atlas acl_groups = atlas
set queue alice acl_group_enable = True
set queue alice acl_groups = alice
set queue lhcb acl_group_enable = True
set queue lhcb acl_groups = lhcb
set queue cms acl_group_enable = True
set queue cms acl_groups = cms
set queue dteam acl_group_enable = True
set queue dteam acl_groups = dteam
set queue ops acl_group_enable = True
set queue ops acl_groups = ops
set server acl_host_enable = False
Thanks for your reply,
-- Best Regards --
Adeel
-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
On Behalf Of Yiannis Ioannou
Sent: Monday, August 27, 2007 4:13 PM
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] Job Submission Failure
Hello there,
->Please do the following checks:
- Check the available disk size of all the machines.
- try to copy a file from a worker node back to the ce and rb with
globus-url-copy
- locate the worker node on which the job fails
- what does
qmgr -c "p s" |grep acl
give?
regards,
Yiannis
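The copy check requested above can be split in two: first see whether the GridFTP port on the CE accepts connections at all, then try the actual transfer. The following is a sketch under assumptions, not a prescribed procedure: the helper names port_open and gsiftp_url are made up, the CE host name is the one that appears in the job status in this thread, and 2811 is the usual GridFTP control port, which a site may have changed. A "Connection refused" from globus-url-copy typically means nothing is listening on that port or a firewall rejects it.

```shell
#!/bin/bash
# Assumed defaults (override via the environment): CE host as seen in this
# thread, and the usual GridFTP control port.
CE_HOST="${CE_HOST:-pcncp04.ncp.edu.pk}"
GRIDFTP_PORT="${GRIDFTP_PORT:-2811}"

# Return 0 if a TCP connection to host:port can be opened within 5 seconds.
# Uses bash's /dev/tcp redirection; hypothetical helper name.
port_open() {
    local host="$1" port="$2"
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Build the gsiftp:// destination URL for globus-url-copy.
gsiftp_url() {
    local host="$1" port="$2" path="$3"
    printf 'gsiftp://%s:%s%s\n' "$host" "$port" "$path"
}

# To be run on the worker node with a valid proxy (left commented out here
# because it needs the grid middleware and network access):
#   if port_open "$CE_HOST" "$GRIDFTP_PORT"; then
#       globus-url-copy file:///etc/group \
#           "$(gsiftp_url "$CE_HOST" "$GRIDFTP_PORT" /tmp/wn-copy-test)"
#   else
#       echo "cannot connect to $CE_HOST:$GRIDFTP_PORT" >&2
#   fi
```

If port_open already fails, the problem is at the network or service level (daemon not running, firewall, NAT) rather than in the grid credentials.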
On 8/27/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
>
> Dear Maarten,
>
>
>
> Sorry for the mistake.
>
>
>
> I am now getting the same error, i.e., Unspecified_gridmanager_error.
>
> I am also getting the same old behaviour from globus-job-run, i.e.:
>
>
>
> *************************************************************
>
> BOOKKEEPING INFORMATION:
>
>
>
> Status info for the Job : https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
>
> Current Status: Aborted
>
> Status Reason: Job RetryCount (3) hit
>
> Destination: pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
>
>
> reached on: Mon Aug 27 10:25:28 2007
>
>
>
> In fact, I was able to complete globus-job-run without specifying our own
> CE.
>
>
>
> -- Best Regards --
>
> Adeel-ur-Rehman
>
>
>
>
>
> ________________________________
>
>
> From: Adeel-ur-Rehman [mailto:[log in to unmask]]
> Sent: Monday, August 27, 2007 2:44 PM
> To: 'Maarten Litmaath'
> Cc: [log in to unmask]
> Subject: RE: [LCG-ROLLOUT] Job Submission Failure
>
>
>
>
>
> Dear Maarten,
>
>
>
> > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > UI which is only a member of dteam VO.
>
>
>
> >>On the CE you can "su" to an "sgm" account and try a qsub: does it work?
>
>
>
>
>
> Yes I have tried that successfully.
>
>
>
>
>
> > I have tested the PBS stagein functionality by running the script attached
> > under a grid user account by specifying its corresponding queue name as an
> > argument, I got "test successful" message.
>
>
>
> >>I suppose it was the grid account for an ordinary user, e.g. ops001?
>
> >>Try with an "sgm" account instead.
>
>
>
>
>
> That's also working fine with me.
>
>
>
> But still I am getting the same Unspecified_gridmanager_error although now I
> can successfully complete the globus-job-run procedure with no errors.
>
>
>
> -- Best Regards --
>
> Adeel
>
>
>
>
>
>
>
> -----Original Message-----
> From: Maarten Litmaath [mailto:[log in to unmask]]
> Sent: Monday, August 27, 2007 2:19 PM
> To: Adeel-ur-Rehman
> Cc: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
>
>
>
> Hi Adeel,
>
>
>
> > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > UI which is only a member of dteam VO.
>
>
>
> On the CE you can "su" to an "sgm" account and try a qsub: does it work?
>
>
>
> > Regarding the reconfiguration of the CE, I only upgraded it to the latest
> > available update of glite-3.1.
>
> >
>
> > Yes I checked the suggestions on the page
> > http://goc.grid.sinica.edu.tw/gocwiki/Unspecified_gridmanager_error
>
> >
>
> >
>
> > /var/spool/pbs/mom_logs on the WN don't state anything, so it seems that the
> > jobs are not actually executing.
>
> >
>
> > I have tested the PBS stagein functionality by running the script attached
> > under a grid user account by specifying its corresponding queue name as an
> > argument, I got "test successful" message.
>
>
>
> I suppose it was the grid account for an ordinary user, e.g. ops001?
>
> Try with an "sgm" account instead.
>