Hello,
That's a problem you should deal with. The worker nodes cannot write
output files to, or read input files from, the resource broker.
Try to copy a file from a WN to your CE (again using globus-url-copy)
to determine whether this is an RB problem or a CE problem, and let
me know the results.
Yiannis
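The "530 No local mapping for Globus ID" replies quoted in this thread generally mean that the GridFTP server on pcncp24 accepted the certificate but could not map its DN to a local account. A minimal bash sketch for checking this on the server, assuming the common grid-mapfile line format "DN" account and the conventional path /etc/grid-security/grid-mapfile (both are assumptions; sites that map users through LCMAPS may need to look elsewhere). The function name lookup_dn is hypothetical.

```shell
# Hypothetical helper: print the local account(s) a DN maps to in a
# grid-mapfile, one per matching line. Assumes lines of the form
#   "/C=../O=../CN=Some User" account
# Exit status is 0 if the DN was found, 1 otherwise.
lookup_dn() {
    local dn="$1"
    # Assumed default location; override with a second argument.
    local gridmap="${2:-/etc/grid-security/grid-mapfile}"
    # Split each line on double quotes so $2 is the DN and $3 the account.
    # The DN is passed via ENVIRON to avoid awk -v escape processing.
    DN="$dn" awk -F'"' '
        $2 == ENVIRON["DN"] {
            account = $3
            sub(/^[ \t]+/, "", account)   # drop whitespace before account
            print account
            found = 1
        }
        END { exit !found }
    ' "$gridmap"
}
```

Usage sketch: run grid-proxy-info -identity on the node holding the proxy used for the test, then call lookup_dn with that DN on the RB. Empty output and a non-zero exit status suggest the DN has no entry in that grid-mapfile.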
On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
>
> Dear Yiannis,
>
> I have followed your steps and here is the output:
>
> [root@wn01 tmp]# touch /tmp/deleteme
>
> [root@wn01 tmp]# globus-url-copy -dbg -vb file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
> debug: starting to put gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
> debug: connecting to gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
> debug: response from gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease:
> 220 pcncp24.ncp.edu.pk GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2 (gcc32dbg, 1109600000-42) ready.
>
> debug: authenticating with gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
> debug: response from gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease:
> 530 No local mapping for Globus ID
>
> debug: fault on connection to gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
> debug: data callback, error the server sent an error response: 530 530 No local mapping for Globus ID
> , buffer 0xb74cc008, length 0, offset=0, eof=true
> debug: operation complete
>
> error: the server sent an error response: 530 530 No local mapping for Globus ID
>
>
> Thanks,
>
>
> -- Best Regards --
> Adeel
>
>
> -----Original Message-----
> From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of
> Yiannis Ioannou
> Sent: Tuesday, August 28, 2007 6:30 PM
> To: Adeel-ur-Rehman
> Cc: LHC Computer Grid - Rollout
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Hello,
>
> Sorry, I was probably not very specific.
>
> On your worker node:
> - Copy a proxy certificate to one of your worker nodes.
> - export X509_USER_PROXY=THISISTHEPATHTOYOURPROXYCERTIFICATE
> - Copy a file as follows:
> touch /tmp/deleteme
> globus-url-copy -dbg -vb file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
>
> ...post the results.
>
> Yiannis
> On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> >
> > Dear Yiannis,
> >
> > Here are the results:
> >
> > [root@pcncp24 root]# globus-url-copy -dbg -vb gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql file:///tmp/WN-file
> > debug: starting to get gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
> > debug: connecting to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
> > debug: fault on connection to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql:
> > globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
> > error: globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
> >
> > We are using lcg-RB.
> >
> > I am not using any proxy on the worker node, so voms-proxy-info -all
> > gives "Couldn't find a valid proxy". Instead, I ran it on the UI,
> > which gives:
> >
> > [pcncp21] ~ > voms-proxy-info -all
> > WARNING: Unable to verify signature! Server certificate possibly not
> > installed.
> > Error: VOMS extension not found!
> > subject : [log in to unmask]xy
> > issuer : [log in to unmask]
> > identity : [log in to unmask]
> > type : proxy
> > strength : 512 bits
> > path : /tmp/x509up_u503
> > timeleft : 9:49:05
> >
> > -- Best Regards --
> > Adeel
> >
> >
> > -----Original Message-----
> > From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of
> > Yiannis Ioannou
> > Sent: Tuesday, August 28, 2007 5:50 PM
> > To: Adeel-ur-Rehman
> > Cc: LHC Computer Grid - Rollout
> > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> >
> > Hello,
> >
> > The file pcncp24.ncp.edu.pk/root/hostcert.pem should not be accessible
> > anyway. You should try to copy something from your worker node into
> > the tmp directory of the resource broker.
> >
> > Are you using a WMS or an RB?
> >
> > On the worker node, what does voms-proxy-info -all give?
> >
> > Yiannis
> >
> > On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > >
> > >
> > > Dear Yiannis,
> > > Here is the output for globus-url-copy between WN and RB.
> > >
> > > [root@wn01 mom_logs]# globus-url-copy -dbg -vb gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem file:`pwd`/RB_hostcert
> > > debug: starting to get gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > > debug: connecting to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > > debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> > > 220 pcncp24.ncp.edu.pk GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2 (gcc32dbg, 1109600000-42) ready.
> > >
> > > debug: authenticating with gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > > debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> > > 530 No local mapping for Globus ID
> > >
> > > debug: fault on connection to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > > debug: data callback, error the server sent an error response: 530 530 No local mapping for Globus ID
> > > , buffer 0xb74c8008, length 0, offset=0, eof=true
> > > debug: operation complete
> > >
> > > error: the server sent an error response: 530 530 No local mapping for Globus ID
> > >
> > >
> > > Thanks,
> > >
> > >
> > > -- Best Regards --
> > > Adeel
> > >
> > > -----Original Message-----
> > > From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> > > On Behalf Of Yiannis Ioannou
> > > Sent: Tuesday, August 28, 2007 4:38 PM
> > > To: [log in to unmask]
> > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > >
> > > Hello,
> > >
> > > > WN to RB is not practically possible as our RB is on a public IP whereas
> > > > the WN is on a private IP.
> > >
> > > I don't believe that this should be a problem. Try to copy a file with
> > > globus-url-copy and enable the dbg option (i.e. globus-url-copy -dbg
> > > -vb). Post the results here.
> > >
> > >
> > > regards,
> > > Yiannis
> > >
> > >
> > > On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > > >
> > > > Dear Yiannis,
> > > >
> > > > 1) I have checked the available disk space on all the machines; it is fine.
> > > > 2) I tried to copy a file from the worker node back to the CE, but it
> > > > failed with the error: a system call failed (Connection refused).
> > > > WN to RB is not practically possible as our RB is on a public IP whereas
> > > > the WN is on a private IP.
> > > > 3) I am only using 2 WNs during my investigation period.
> > > > 4) qmgr -c "p s" |grep acl returns:
> > > > set queue atlas acl_group_enable = True
> > > > set queue atlas acl_groups = atlas
> > > > set queue alice acl_group_enable = True
> > > > set queue alice acl_groups = alice
> > > > set queue lhcb acl_group_enable = True
> > > > set queue lhcb acl_groups = lhcb
> > > > set queue cms acl_group_enable = True
> > > > set queue cms acl_groups = cms
> > > > set queue dteam acl_group_enable = True
> > > > set queue dteam acl_groups = dteam
> > > > set queue ops acl_group_enable = True
> > > > set queue ops acl_groups = ops
> > > > set server acl_host_enable = False
> > > >
> > > > Thanks for your reply,
> > > >
> > > > -- Best Regards --
> > > > Adeel
> > > >
> > > > -----Original Message-----
> > > > From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> > > > On Behalf Of Yiannis Ioannou
> > > > Sent: Monday, August 27, 2007 4:13 PM
> > > > To: [log in to unmask]
> > > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > > >
> > > > Hello there,
> > > >
> > > > Please do the following checks:
> > > > - Check the available disk space on all the machines.
> > > > - Try to copy a file from a worker node back to the CE and the RB with
> > > >   globus-url-copy.
> > > > - Locate the worker node on which the job fails.
> > > > - What does the following give?
> > > >   qmgr -c "p s" | grep acl
> > > >
> > > > regards,
> > > > Yiannis
> > > >
> > > >
> > > >
> > > > On 8/27/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > > > >
> > > > > Dear Maarten,
> > > > >
> > > > > Sorry for the mistake.
> > > > >
> > > > > I am now getting the same error, i.e., Unspecified_gridmanager_error.
> > > > >
> > > > > I am also getting the same old behaviour from globus-job-run, i.e.:
> > > > >
> > > > > *************************************************************
> > > > >
> > > > > BOOKKEEPING INFORMATION:
> > > > >
> > > > > Status info for the Job :
> > > > > https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
> > > > > Current Status: Aborted
> > > > > Status Reason: Job RetryCount (3) hit
> > > > > Destination: pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> > > > > reached on: Mon Aug 27 10:25:28 2007
> > > > >
> > > > > In fact, I was able to complete globus-job-run without specifying our
> > > > > own CE.
> > > > >
> > > > > -- Best Regards --
> > > > > Adeel-ur-Rehman
> > > > >
> > > > > ________________________________
> > > > >
> > > > > From: Adeel-ur-Rehman [mailto:[log in to unmask]]
> > > > > Sent: Monday, August 27, 2007 2:44 PM
> > > > > To: 'Maarten Litmaath'
> > > > > Cc: [log in to unmask]
> > > > > Subject: RE: [LCG-ROLLOUT] Job Submission Failure
> > > > >
> > > > > Dear Maarten,
> > > > >
> > > > > > I tried to submit the job using an ordinary user account (i.e. adeel)
> > > > > > from UI which is only a member of dteam VO.
> > > > >
> > > > > >>On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > > > >
> > > > > Yes, I have tried that successfully.
> > > > >
> > > > > > I have tested the PBS stagein functionality by running the script attached
> > > > > > under a grid user account by specifying its corresponding queue name as an
> > > > > > argument, I got "test successful" message.
> > > > >
> > > > > >>I suppose it was the grid account for an ordinary user, e.g. ops001?
> > > > > >>Try with an "sgm" account instead.
> > > > >
> > > > > That's also working fine for me.
> > > > >
> > > > > But I am still getting the same Unspecified_gridmanager_error, although I
> > > > > can now successfully complete the globus-job-run procedure with no errors.
> > > > >
> > > > > -- Best Regards --
> > > > > Adeel
> > > > >
> > > > > -----Original Message-----
> > > > > From: Maarten Litmaath [mailto:[log in to unmask]]
> > > > > Sent: Monday, August 27, 2007 2:19 PM
> > > > > To: Adeel-ur-Rehman
> > > > > Cc: [log in to unmask]
> > > > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > > > >
> > > > > Hi Adeel,
> > > > >
> > > > > > I tried to submit the job using an ordinary user account (i.e. adeel)
> > > > > > from UI which is only a member of dteam VO.
> > > > >
> > > > > On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > > > >
> > > > > > Regarding the reconfiguration of the CE, I only upgraded it to the latest
> > > > > > available update of glite-3.1.
> > > > > >
> > > > > > Yes I checked the suggestions on the page
> > > > > > http://goc.grid.sinica.edu.tw/gocwiki/Unspecified_gridmanager_error
> > > > > >
> > > > > > /var/spool/pbs/mom_logs on the WN don't state anything, so it seems that
> > > > > > the jobs are not actually executing.
> > > > > >
> > > > > > I have tested the PBS stagein functionality by running the script attached
> > > > > > under a grid user account by specifying its corresponding queue name as an
> > > > > > argument, I got "test successful" message.
> > > > >
> > > > > I suppose it was the grid account for an ordinary user, e.g. ops001?
> > > > > Try with an "sgm" account instead.
> > > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>
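The qmgr -c "p s" | grep acl output quoted in this thread shows each VO queue restricted (acl_group_enable = True) to the Unix group of the same name. A small bash sketch that reads such output on stdin and flags any queue whose acl_groups list does not include the queue name. The same-name convention is taken from this particular output and is an assumption for other sites; check_queue_acls is a hypothetical helper name.

```shell
# Hypothetical helper: read `qmgr -c "p s"` style lines on stdin and, for
# each queue with an acl_groups setting, print "<queue> <groups> OK|MISMATCH".
# Assumption: each queue is meant to admit the Unix group named after it.
check_queue_acls() {
    awk '
        # Lines look like: set queue atlas acl_groups = atlas
        # (or "+= extra" when further groups are appended).
        $1 == "set" && $2 == "queue" && $4 == "acl_groups" {
            q = $3
            if (!(q in groups)) order[++n] = q
            if ($5 == "=") groups[q] = $6
            else if ($5 == "+=") groups[q] = groups[q] "," $6
        }
        END {
            for (i = 1; i <= n; i++) {
                q = order[i]
                # OK if any comma-separated group equals the queue name.
                ok = 0
                m = split(groups[q], g, ",")
                for (j = 1; j <= m; j++) if (g[j] == q) ok = 1
                printf "%s %s %s\n", q, groups[q], (ok ? "OK" : "MISMATCH")
            }
        }
    '
}
```

Usage sketch: qmgr -c "p s" | check_queue_acls. For the output quoted in this thread, every queue would be reported as OK, which suggests the queue ACLs are not what blocks the jobs.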