Dear Yiannis,
I have followed your steps and here is the output:
[root@wn01 tmp]# touch /tmp/deleteme
[root@wn01 tmp]# globus-url-copy -dbg -vb file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: starting to put gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: connecting to gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: response from gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease:
220 pcncp24.ncp.edu.pk GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2
(gcc32dbg, 1109600000-42) ready.
debug: authenticating with gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: response from gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease:
530 No local mapping for Globus ID
debug: fault on connection to gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: data callback, error the server sent an error response: 530 530 No
local mapping for Globus ID
, buffer 0xb74cc008, length 0, offset=0, eof=true
debug: operation complete
error: the server sent an error response: 530 530 No local mapping for
Globus ID
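For reference, a 530 "No local mapping for Globus ID" reply generally means the GridFTP server accepted the credential but could not map its subject to a local account. Below is a minimal sketch of one way to check this on pcncp24, assuming the mapping is kept in the conventional /etc/grid-security/grid-mapfile. Sites that map through LCMAPS or pool accounts may match on VOMS attributes instead, so a miss here is not conclusive. The helper name dn_in_gridmap is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given subject DN appears as a quoted
# entry in the grid-mapfile. The default path is an assumption about the site.
dn_in_gridmap() {
    dn=$1
    mapfile=${2:-/etc/grid-security/grid-mapfile}
    # grid-mapfile lines look like: "/C=XX/O=Example/CN=Some User" localaccount
    # -F treats the DN as a fixed string, so "/", "=" and "." are not special.
    grep -F -q -- "\"$dn\"" "$mapfile"
}
```

To use it, pass the identity string that voms-proxy-info -all prints for the proxy in use, for example: dn_in_gridmap "<identity DN>" || echo "no grid-mapfile entry for this DN".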
Thanks,
-- Best Regards --
Adeel
-----Original Message-----
From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of
Yiannis Ioannou
Sent: Tuesday, August 28, 2007 6:30 PM
To: Adeel-ur-Rehman
Cc: LHC Computer Grid - Rollout
Subject: Re: [LCG-ROLLOUT] Job Submission Failure
Hello,
Sorry, I was probably not specific enough.
On your worker node:
- Copy a proxy certificate to one of your worker nodes.
- export X509_USER_PROXY=THISISTHEPATHTOYOURPROXYCERTIFICATE
- Copy a file as follows:
touch /tmp/deleteme
globus-url-copy -dbg -vb file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
...post the results.
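A minimal sketch of a pre-flight check for the steps above, assuming only that the proxy path is supplied through X509_USER_PROXY as described. The function name check_proxy_env is hypothetical. It verifies only that the variable is set and that the file is readable, not that the proxy is still valid.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail early if X509_USER_PROXY is unset
# or does not point to a readable file. Validity of the proxy is NOT checked.
check_proxy_env() {
    if [ -z "${X509_USER_PROXY:-}" ]; then
        echo "X509_USER_PROXY is not set" >&2
        return 1
    fi
    if [ ! -r "$X509_USER_PROXY" ]; then
        echo "proxy file not readable: $X509_USER_PROXY" >&2
        return 1
    fi
    return 0
}
```

It could be chained in front of the copy, for example: check_proxy_env && globus-url-copy -dbg -vb file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease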
Yiannis
On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
>
> Dear Yiannis,
>
> Here are the results:
>
> [root@pcncp24 root]# globus-url-copy -dbg -vb
> gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql file:///tmp/WN-file
> debug: starting to get gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
> debug: connecting to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
> debug: fault on connection to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql:
> globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
> error: globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
>
> We are using lcg-RB.
>
> I am not using any proxy on the worker node, so voms-proxy-info -all
> gives "Couldn't find a valid proxy" there. Instead I am using it from
> the UI, which gives:
>
> [pcncp21] ~ > voms-proxy-info -all
> WARNING: Unable to verify signature! Server certificate possibly not
> installed.
> Error: VOMS extension not found!
> subject :
>
[log in to unmask]
> xy
> issuer :
> [log in to unmask]
> identity :
> [log in to unmask]
> type : proxy
> strength : 512 bits
> path : /tmp/x509up_u503
> timeleft : 9:49:05
>
> -- Best Regards --
> Adeel
>
>
> -- Best Regards --
> Adeel-ur-Rehman
> -----Original Message-----
> From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of
> Yiannis Ioannou
> Sent: Tuesday, August 28, 2007 5:50 PM
> To: Adeel-ur-Rehman
> Cc: LHC Computer Grid - Rollout
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Hello,
>
> The file pcncp24.ncp.edu.pk/root/hostcert.pem should not be accessible
> anyway. You should try to copy something from your worker node into
> the tmp directory of the resource broker.
>
> Are you using a WMS or an RB?
>
> On the worker node, what does voms-proxy-info -all give?
>
> Yiannis
>
> On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> >
> >
> > Dear Yiannis,
> > Here is the output for globus-url-copy between WN and RB.
> >
> > [root@wn01 mom_logs]# globus-url-copy -dbg -vb
> > gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem file:`pwd`/RB_hostcert
> > debug: starting to get gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: connecting to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> > 220 pcncp24.ncp.edu.pk GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2
> > (gcc32dbg, 1109600000-42) ready.
> >
> > debug: authenticating with gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> > 530 No local mapping for Globus ID
> >
> > debug: fault on connection to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: data callback, error the server sent an error response: 530 530 No
> > local mapping for Globus ID
> > , buffer 0xb74c8008, length 0, offset=0, eof=true
> > debug: operation complete
> >
> > error: the server sent an error response: 530 530 No local mapping for
> > Globus ID
> >
> >
> > Thanks,
> >
> >
> > -- Best Regards --
> > Adeel
> >
> > -----Original Message-----
> > From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> > On Behalf Of Yiannis Ioannou
> > Sent: Tuesday, August 28, 2007 4:38 PM
> > To: [log in to unmask]
> > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> >
> > Hello,
> >
> > > WN to RB is not practically possible as our RB is on a live IP whereas the
> > > WN is on a private IP.
> >
> > I don't believe that this should be a problem. Try to copy a file with
> > globus-url-copy and enable the dbg option (i.e. globus-url-copy -dbg
> > -vb). Post the results here.
> >
> >
> > regards,
> > Yiannis
> >
> >
> > On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > >
> > > Dear Yiannis,
> > >
> > > 1) I have checked all the disk sizes. They are all fine.
> > > 2) I have tried to copy a file from the worker node back to the CE, but it
> > > could not be done; it gives the error: a system call failed (Connection refused).
> > > WN to RB is not practically possible as our RB is on a live IP whereas the
> > > WN is on a private IP.
> > > 3) I am only using 2 WNs during my investigation period.
> > > 4) qmgr -c "p s" |grep acl returns:
> > > set queue atlas acl_group_enable = True
> > > set queue atlas acl_groups = atlas
> > > set queue alice acl_group_enable = True
> > > set queue alice acl_groups = alice
> > > set queue lhcb acl_group_enable = True
> > > set queue lhcb acl_groups = lhcb
> > > set queue cms acl_group_enable = True
> > > set queue cms acl_groups = cms
> > > set queue dteam acl_group_enable = True
> > > set queue dteam acl_groups = dteam
> > > set queue ops acl_group_enable = True
> > > set queue ops acl_groups = ops
> > > set server acl_host_enable = False
> > >
> > > Thanks for your reply,
> > >
> > > -- Best Regards --
> > > Adeel
> > >
> > > -----Original Message-----
> > > From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> > > On Behalf Of Yiannis Ioannou
> > > Sent: Monday, August 27, 2007 4:13 PM
> > > To: [log in to unmask]
> > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > >
> > > Hello there,
> > >
> > > Please do the following checks:
> > > - Check the available disk space on all the machines.
> > > - Try to copy a file from a worker node back to the CE and RB with
> > > globus-url-copy.
> > > - Locate the worker node on which the job fails.
> > > - What does
> > > qmgr -c "p s" |grep acl
> > > give?
> > >
> > > regards,
> > > Yiannis
> > >
> > >
> > >
> > > On 8/27/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Dear Maarten,
> > > >
> > > >
> > > >
> > > > Sorry for the mistake.
> > > >
> > > >
> > > >
> > > > I am now getting the same error, i.e., Unspecified_gridmanager_error.
> > > >
> > > > I am also getting the same old behaviour from globus-job-run, i.e.:
> > > >
> > > >
> > > >
> > > > *************************************************************
> > > >
> > > > BOOKKEEPING INFORMATION:
> > > >
> > > >
> > > >
> > > > Status info for the Job :
> > > > https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
> > > >
> > > > Current Status: Aborted
> > > >
> > > > Status Reason: Job RetryCount (3) hit
> > > >
> > > > Destination:
> > > > pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> > > >
> > > >
> > > > reached on: Mon Aug 27 10:25:28 2007
> > > >
> > > >
> > > >
> > > > In fact, I was able to complete globus-job-run without specifying our own
> > > > CE.
> > > >
> > > >
> > > >
> > > > -- Best Regards --
> > > >
> > > > Adeel-ur-Rehman
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ________________________________
> > > >
> > > >
> > > > From: Adeel-ur-Rehman [mailto:[log in to unmask]]
> > > > Sent: Monday, August 27, 2007 2:44 PM
> > > > To: 'Maarten Litmaath'
> > > > Cc: [log in to unmask]
> > > > Subject: RE: [LCG-ROLLOUT] Job Submission Failure
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Dear Maarten,
> > > >
> > > >
> > > >
> > > > > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > > >
> > > > > UI which is only a member of dteam VO.
> > > >
> > > >
> > > >
> > > > >>On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Yes I have tried that successfully.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > I have tested the PBS stagein functionality by running the script attached
> > > >
> > > > > under a grid user account by specifying its corresponding queue name as an
> > > >
> > > > > argument, I got "test successful" message.
> > > >
> > > >
> > > >
> > > > >>I suppose it was the grid account for an ordinary user, e.g. ops001?
> > > >
> > > > >>Try with an "sgm" account instead.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > That's also working fine with me.
> > > >
> > > >
> > > >
> > > > But still I am getting the same Unspecified_gridmanager_error although now I
> > > > can successfully complete the globus-job-run procedure with no errors.
> > > >
> > > >
> > > >
> > > > -- Best Regards --
> > > >
> > > > Adeel
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Maarten Litmaath [mailto:[log in to unmask]]
> > > > Sent: Monday, August 27, 2007 2:19 PM
> > > > To: Adeel-ur-Rehman
> > > > Cc: [log in to unmask]
> > > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > > >
> > > >
> > > >
> > > >
> > > > Hi Adeel,
> > > >
> > > >
> > > >
> > > > > I tried to submit the job using an ordinary user account (i.e. adeel) from
> > > >
> > > > > UI which is only a member of dteam VO.
> > > >
> > > >
> > > >
> > > > On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > > >
> > > >
> > > >
> > > > > Regarding the reconfiguration of the CE, I only upgraded it to the latest
> > > >
> > > > > available update of glite-3.1.
> > > >
> > > > >
> > > >
> > > > > Yes I checked the suggestions on the page
> > > >
> > > > >
> > > > http://goc.grid.sinica.edu.tw/gocwiki/Unspecified_gridmanager_error
> > > >
> > > > >
> > > >
> > > > >
> > > >
> > > > > /var/spool/pbs/mom_logs on the WN don't state anything, so it seems that the
> > > >
> > > > > jobs are not actually executing.
> > > >
> > > > >
> > > >
> > > > > I have tested the PBS stagein functionality by running the script attached
> > > >
> > > > > under a grid user account by specifying its corresponding queue name as an
> > > >
> > > > > argument, I got "test successful" message.
> > > >
> > > >
> > > >
> > > > I suppose it was the grid account for an ordinary user, e.g. ops001?
> > > >
> > > > Try with an "sgm" account instead.
> > > >
> > >
> > >
> >
> >
>
>