Dear Yiannis,
Here are the results:
[root@pcncp24 root]# globus-url-copy -dbg -vb gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql file:///tmp/WN-file
debug: starting to get gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
debug: connecting to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
debug: fault on connection to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql:
globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
error: globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
We are using lcg-RB.
I am not using any proxy on the worker node, so voms-proxy-info -all there
gives "Couldn't find a valid proxy". Instead, I am using the proxy from the
UI, which gives:
[pcncp21] ~ > voms-proxy-info -all
WARNING: Unable to verify signature! Server certificate possibly not installed.
Error: VOMS extension not found!
subject :
[log in to unmask]
xy
issuer :
[log in to unmask]
identity :
[log in to unmask]
type : proxy
strength : 512 bits
path : /tmp/x509up_u503
timeleft : 9:49:05
-- Best Regards --
Adeel
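
The "globus_libc_gethostbyaddr_r failed" error in the output above may point to a name-resolution problem, since gethostbyaddr performs a reverse (address to name) lookup. A minimal sketch for checking forward and reverse resolution of a host is given below. It assumes `getent` and `awk` are available on the machine; the helper name `check_dns` and the `RESOLVER` variable are hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: check forward and reverse name resolution for a host.
# RESOLVER is an assumption; it defaults to "getent hosts" (glibc systems).
RESOLVER="${RESOLVER:-getent hosts}"

check_dns() {
    host="$1"
    # Forward lookup: the first field of the first line is the address.
    ip=$($RESOLVER "$host" | awk 'NR==1 {print $1}')
    if [ -z "$ip" ]; then
        echo "forward lookup failed for $host"
        return 1
    fi
    # Reverse lookup: the second field of the first line is the name.
    name=$($RESOLVER "$ip" | awk 'NR==1 {print $2}')
    if [ -z "$name" ]; then
        echo "reverse lookup failed for $ip"
        return 2
    fi
    echo "$host -> $ip -> $name"
}

# Example call (run on the RB, not executed here):
#   check_dns wn01.ncp.edu.pk
```

If the reverse lookup fails on the RB for the worker node's private address, adding the worker node to /etc/hosts on the RB (or to the site DNS) might be worth trying. This is only a guess from the error text, not a confirmed diagnosis.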
-----Original Message-----
From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of
Yiannis Ioannou
Sent: Tuesday, August 28, 2007 5:50 PM
To: Adeel-ur-Rehman
Cc: LHC Computer Grid - Rollout
Subject: Re: [LCG-ROLLOUT] Job Submission Failure
Hello,
The file pcncp24.ncp.edu.pk/root/hostcert.pem should not be accessible
anyway. You should try to copy something from your worker node into
the tmp directory of the resource broker.
Are you using a WMS or an RB?
On the worker node, what does voms-proxy-info -all give?
Yiannis
On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
>
>
> Dear Yiannis,
> Here is the output for globus-url-copy between WN and RB.
>
> [root@wn01 mom_logs]# globus-url-copy -dbg -vb gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem file:`pwd`/RB_hostcert
> debug: starting to get gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> debug: connecting to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> 220 pcncp24.ncp.edu.pk GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2 (gcc32dbg, 1109600000-42) ready.
>
> debug: authenticating with gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> 530 No local mapping for Globus ID
>
> debug: fault on connection to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> debug: data callback, error the server sent an error response: 530 530 No
> local mapping for Globus ID
> , buffer 0xb74c8008, length 0, offset=0, eof=true
> debug: operation complete
>
> error: the server sent an error response: 530 530 No local mapping for
> Globus ID
>
>
> Thanks,
>
>
> -- Best Regards --
> Adeel
>
> -----Original Message-----
> From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> On Behalf Of Yiannis Ioannou
> Sent: Tuesday, August 28, 2007 4:38 PM
> To: [log in to unmask]
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Hello,
>
> > WN to RB is not practically possible as our RB is on a live IP whereas the
> > WN is on a private IP.
>
> I don't believe that this should be a problem. Try to copy a file with
> globus-url-copy and enable the dbg option (i.e. globus-url-copy -dbg
> -vb). Post the results here.
>
>
> regards,
> Yiannis
>
>
> On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> >
> > Dear Yiannis,
> >
> > 1) I have checked all the disk sizes. They are all fine.
> > 2) I tried to copy a file from a worker node back to the CE, but it failed
> > with the error: a system call failed (Connection refused).
> > WN to RB is not practically possible as our RB is on a live IP whereas the
> > WN is on a private IP.
> > 3) I am using only 2 WNs during my investigation period.
> > 4) qmgr -c "p s" |grep acl returns:
> > set queue atlas acl_group_enable = True
> > set queue atlas acl_groups = atlas
> > set queue alice acl_group_enable = True
> > set queue alice acl_groups = alice
> > set queue lhcb acl_group_enable = True
> > set queue lhcb acl_groups = lhcb
> > set queue cms acl_group_enable = True
> > set queue cms acl_groups = cms
> > set queue dteam acl_group_enable = True
> > set queue dteam acl_groups = dteam
> > set queue ops acl_group_enable = True
> > set queue ops acl_groups = ops
> > set server acl_host_enable = False
> >
> > Thanks for your reply,
> >
> > -- Best Regards --
> > Adeel
> >
> > -----Original Message-----
> > From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> > On Behalf Of Yiannis Ioannou
> > Sent: Monday, August 27, 2007 4:13 PM
> > To: [log in to unmask]
> > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> >
> > Hello there,
> >
> > Please do the following checks:
> > - Check the available disk space on all the machines.
> > - Try to copy a file from a worker node back to the CE and the RB with
> > globus-url-copy.
> > - Locate the worker node on which the job fails.
> > - What does
> > qmgr -c "p s" |grep acl
> > give?
> >
> > regards,
> > Yiannis
> >
> >
> >
> > On 8/27/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > >
> > > Dear Maarten,
> > >
> > >
> > >
> > > Sorry for the mistake.
> > >
> > >
> > >
> > > I am getting now the same error, i.e., Unspecified_gridmanager_error.
> > >
> > > I am also getting the same old behaviour from globus-job-run, i.e.:
> > >
> > >
> > >
> > > *************************************************************
> > >
> > > BOOKKEEPING INFORMATION:
> > >
> > >
> > >
> > > Status info for the Job :
> > > https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
> > >
> > > Current Status: Aborted
> > >
> > > Status Reason: Job RetryCount (3) hit
> > >
> > > Destination:
> > > pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> > >
> > >
> > > reached on: Mon Aug 27 10:25:28 2007
> > >
> > >
> > >
> > > In fact, I was able to complete globus-job-run without specifying our own
> > > CE.
> > >
> > >
> > >
> > > -- Best Regards --
> > >
> > > Adeel-ur-Rehman
> > >
> > >
> > >
> > >
> > >
> > > ________________________________
> > >
> > >
> > > From: Adeel-ur-Rehman [mailto:[log in to unmask]]
> > > Sent: Monday, August 27, 2007 2:44 PM
> > > To: 'Maarten Litmaath'
> > > Cc: [log in to unmask]
> > > Subject: RE: [LCG-ROLLOUT] Job Submission Failure
> > >
> > >
> > >
> > >
> > >
> > > Dear Maarten,
> > >
> > >
> > >
> > > > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > > > UI which is only a member of dteam VO.
> > >
> > >
> > >
> > > >>On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > >
> > >
> > >
> > >
> > >
> > > Yes I have tried that successfully.
> > >
> > >
> > >
> > >
> > >
> > > > I have tested the PBS stagein functionality by running the script attached
> > > > under a grid user account by specifying its corresponding queue name as an
> > > > argument, I got "test successful" message.
> > >
> > >
> > >
> > > >>I suppose it was the grid account for an ordinary user, e.g. ops001?
> > >
> > > >>Try with an "sgm" account instead.
> > >
> > >
> > >
> > >
> > >
> > > That's also working fine with me.
> > >
> > >
> > >
> > > But still I am getting the same Unspecified_gridmanager_error although now I
> > > can successfully complete the globus-job-run procedure with no errors.
> > >
> > >
> > >
> > > -- Best Regards --
> > >
> > > Adeel
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Maarten Litmaath [mailto:[log in to unmask]]
> > > Sent: Monday, August 27, 2007 2:19 PM
> > > To: Adeel-ur-Rehman
> > > Cc: [log in to unmask]
> > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > >
> > >
> > >
> > >
> > > Hi Adeel,
> > >
> > >
> > >
> > > > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > > > UI which is only a member of dteam VO.
> > >
> > >
> > >
> > > On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > >
> > >
> > >
> > > > Regarding the reconfiguration of the CE, I only upgraded it to the latest
> > > > available update of glite-3.1.
> > >
> > > >
> > >
> > > > Yes I checked the suggestions on the page
> > > > http://goc.grid.sinica.edu.tw/gocwiki/Unspecified_gridmanager_error
> > >
> > > >
> > >
> > > >
> > >
> > > > /var/spool/pbs/mom_logs on the WN don't state anything, so it seems that the
> > > > jobs are not actually executing.
> > >
> > > >
> > >
> > > > I have tested the PBS stagein functionality by running the script attached
> > > > under a grid user account by specifying its corresponding queue name as an
> > > > argument, I got "test successful" message.
> > >
> > >
> > >
> > > I suppose it was the grid account for an ordinary user, e.g. ops001?
> > >
> > > Try with an "sgm" account instead.
> > >
> >
> >
>
>