JISCMail - LCG-ROLLOUT Archives

Subject: Re: Job Submission Failure
From: Adeel-ur-Rehman <[log in to unmask]>
Reply-To: LHC Computer Grid - Rollout <[log in to unmask]>
Date: Tue, 28 Aug 2007 18:39:14 +0500

Dear Yiannis,

I have followed your steps and here is the output:

[root@wn01 tmp]# touch /tmp/deleteme

[root@wn01 tmp]# globus-url-copy -dbg -vb file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: starting to put gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: connecting to gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: response from gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease:
220 pcncp24.ncp.edu.pk GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2
(gcc32dbg, 1109600000-42) ready.

debug: authenticating with gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: response from gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease:
530 No local mapping for Globus ID

debug: fault on connection to gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease
debug: data callback, error the server sent an error response: 530 530 No
local mapping for Globus ID
, buffer 0xb74cc008, length 0, offset=0, eof=true
debug: operation complete

error: the server sent an error response: 530 530 No local mapping for
Globus ID


Thanks,


-- Best Regards --
Adeel
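
Three different failure messages appear in this thread: "530 No local mapping for Globus ID", "globus_libc_gethostbyaddr_r failed", and "Connection refused". As a rough aid for sorting them, here is a minimal sketch in POSIX shell. The function name `classify_gridftp_error` is hypothetical, and the hints it prints are assumptions about the usual cause of each message, not a confirmed diagnosis for this site.

```shell
#!/bin/sh
# Hypothetical helper: read globus-url-copy debug output on stdin and
# print a rough hint for the failure messages quoted in this thread.
# The hints are assumptions about likely causes, not a diagnosis.
classify_gridftp_error() {
    out=$(cat)
    case $out in
        *"No local mapping for Globus ID"*)
            # The server accepted the connection but could not map the
            # certificate subject to a local account.
            echo "mapping: server has no local account for the certificate subject" ;;
        *"globus_libc_gethostbyaddr_r failed"*)
            # Name resolution for the remote host failed.
            echo "dns: host name or reverse lookup failed for the remote host" ;;
        *"Connection refused"*)
            # Nothing accepted the TCP connection on the target port.
            echo "network: nothing accepted the connection on the target port" ;;
        *)
            echo "unknown" ;;
    esac
}
```

It could be used by piping the captured output into it, for example `globus-url-copy -dbg -vb SRC DST 2>&1 | classify_gridftp_error`, where SRC and DST stand for the URLs being tested.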


-----Original Message-----
From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of
Yiannis Ioannou
Sent: Tuesday, August 28, 2007 6:30 PM
To: Adeel-ur-Rehman
Cc: LHC Computer Grid - Rollout
Subject: Re: [LCG-ROLLOUT] Job Submission Failure

Hello,

Sorry, I was probably not very specific.

On your worker node:
- Copy a proxy certificate to one of your worker nodes.
- export X509_USER_PROXY=THISISTHEPATHTOYOURPROXYCERTIFICATE
- Copy a file as follows:
touch /tmp/deleteme
globus-url-copy -dbg -vb file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease

...post the results.

Yiannis
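
The steps above can be sketched as a small POSIX shell helper. This is only an illustration: the function name `print_copy_cmd` is hypothetical, and it deliberately prints the globus-url-copy command instead of running it, so it can be tried on a machine without the Globus tools installed. It only checks that X509_USER_PROXY is set and points at a readable file; it does not check that the proxy is valid.

```shell
#!/bin/sh
# Hypothetical helper: verify that X509_USER_PROXY points at a readable
# file, then print the globus-url-copy command from the steps above.
# It does not run the copy and does not validate the proxy contents.
print_copy_cmd() {
    src=$1
    dst=$2
    if [ -z "${X509_USER_PROXY:-}" ]; then
        echo "X509_USER_PROXY is not set" >&2
        return 1
    fi
    if [ ! -r "$X509_USER_PROXY" ]; then
        echo "proxy file not readable: $X509_USER_PROXY" >&2
        return 1
    fi
    echo "globus-url-copy -dbg -vb $src $dst"
}
```

With X509_USER_PROXY exported, calling `print_copy_cmd file:///tmp/deleteme gsiftp://pcncp24.ncp.edu.pk/tmp/deletemeplease` prints the command to run; if the variable is unset or the file is unreadable, it returns a non-zero status and reports why on stderr.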
On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
>
> Dear Yiannis,
>
> Here are the results:
>
> [root@pcncp24 root]#  globus-url-copy -dbg -vb
> gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql file:///tmp/WN-file
> debug: starting to get gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
> debug: connecting to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql
> debug: fault on connection to gsiftp://wn01.ncp.edu.pk/root/lcg-db.sql:
> globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
> error: globus_ftp_control_connect: globus_libc_gethostbyaddr_r failed
>
> We are using lcg-RB.
>
> I am not using any proxy on the worker node.
> So voms-proxy-info -all gives "Couldn't find a valid proxy"; rather, I am
> using it from the UI, which gives:
>
> [pcncp21] ~ > voms-proxy-info -all
> WARNING: Unable to verify signature! Server certificate possibly not
> installed.
> Error: VOMS extension not found!
> subject   : [log in to unmask]
> xy
> issuer    : [log in to unmask]
> identity  : [log in to unmask]
> type      : proxy
> strength  : 512 bits
> path      : /tmp/x509up_u503
> timeleft  : 9:49:05
>
> -- Best Regards --
> Adeel
>
>
> -- Best Regards --
> Adeel-ur-Rehman
> -----Original Message-----
> From: [log in to unmask] [mailto:[log in to unmask]] On Behalf Of
> Yiannis Ioannou
> Sent: Tuesday, August 28, 2007 5:50 PM
> To: Adeel-ur-Rehman
> Cc: LHC Computer Grid - Rollout
> Subject: Re: [LCG-ROLLOUT] Job Submission Failure
>
> Hello,
>
> The file pcncp24.ncp.edu.pk/root/hostcert.pem should not be accessible
> anyway. You should try to copy something from your worker node into
> the tmp directory of the resource broker.
>
> Are you using a WMS or an RB?
>
> On the worker node, what does voms-proxy-info -all give?
>
> Yiannis
>
> On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> >
> >
> > Dear Yiannis,
> > Here is the output for globus-url-copy between WN and RB.
> >
> > [root@wn01 mom_logs]# globus-url-copy -dbg -vb
> > gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem  file:`pwd`/RB_hostcert
> > debug: starting to get gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: connecting to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> > 220 pcncp24.ncp.edu.pk GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2
> > (gcc32dbg, 1109600000-42) ready.
> >
> > debug: authenticating with gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: response from gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem:
> > 530 No local mapping for Globus ID
> >
> > debug: fault on connection to gsiftp://pcncp24.ncp.edu.pk/root/hostcert.pem
> > debug: data callback, error the server sent an error response: 530 530 No
> > local mapping for Globus ID
> > , buffer 0xb74c8008, length 0, offset=0, eof=true
> > debug: operation complete
> >
> > error: the server sent an error response: 530 530 No local mapping for
> > Globus ID
> >
> >
> > Thanks,
> >
> >
> > -- Best Regards --
> > Adeel
> >
> > -----Original Message-----
> > From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> > On Behalf Of Yiannis Ioannou
> > Sent: Tuesday, August 28, 2007 4:38 PM
> > To: [log in to unmask]
> > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> >
> > Hello,
> >
> > > WN to RB is not practically possible as our RB is on a live IP whereas
> > > the WN is on a private IP.
> >
> > I don't believe that this should be a problem. Try to copy a file with
> > globus-url-copy and enable the dbg option (i.e. globus-url-copy -dbg
> > -vb). Post the results here.
> >
> >
> > regards,
> > Yiannis
> >
> >
> > On 8/28/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > >
> > > Dear Yiannis,
> > >
> > > 1) I have checked all the disk sizes. They are all fine.
> > > 2) I have tried to copy a file from a worker node back to the CE, but it
> > > couldn't be done; it gives the error: a system call failed (Connection refused).
> > > WN to RB is not practically possible as our RB is on a live IP whereas
> > > the WN is on a private IP.
> > > 3) I am only using 2 WNs during my investigation period.
> > > 4) qmgr -c "p s" |grep acl returns:
> > > set queue atlas acl_group_enable = True
> > > set queue atlas acl_groups = atlas
> > > set queue alice acl_group_enable = True
> > > set queue alice acl_groups = alice
> > > set queue lhcb acl_group_enable = True
> > > set queue lhcb acl_groups = lhcb
> > > set queue cms acl_group_enable = True
> > > set queue cms acl_groups = cms
> > > set queue dteam acl_group_enable = True
> > > set queue dteam acl_groups = dteam
> > > set queue ops acl_group_enable = True
> > > set queue ops acl_groups = ops
> > > set server acl_host_enable = False
> > >
> > > Thanks for your reply,
> > >
> > > -- Best Regards --
> > > Adeel
> > >
> > > -----Original Message-----
> > > From: LHC Computer Grid - Rollout [mailto:[log in to unmask]]
> > > On Behalf Of Yiannis Ioannou
> > > Sent: Monday, August 27, 2007 4:13 PM
> > > To: [log in to unmask]
> > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > >
> > > Hello there,
> > >
> > > Please do the following checks:
> > > - Check the available disk space on all the machines.
> > > - Try to copy a file from a worker node back to the CE and RB with
> > > globus-url-copy.
> > > - Locate the worker node on which the job fails.
> > > - What does
> > >  qmgr -c "p s" |grep acl
> > > give?
> > >
> > > regards,
> > > Yiannis
> > >
> > >
> > >
> > > On 8/27/07, Adeel-ur-Rehman <[log in to unmask]> wrote:
> > > >
> > > > Dear Maarten,
> > > >
> > > > Sorry for the mistake.
> > > >
> > > > I am now getting the same error, i.e., Unspecified_gridmanager_error.
> > > > I am also getting the same old behaviour from globus-job-run, i.e.:
> > > >
> > > > *************************************************************
> > > > BOOKKEEPING INFORMATION:
> > > >
> > > > Status info for the Job :
> > > > https://pcncp24.ncp.edu.pk:9000/V_vK6voweHl3stwItI9gbw
> > > > Current Status:     Aborted
> > > > Status Reason:      Job RetryCount (3) hit
> > > > Destination:        pcncp04.ncp.edu.pk:2119/jobmanager-lcgpbs-dteam
> > > > reached on:         Mon Aug 27 10:25:28 2007
> > > >
> > > > In fact, I was able to complete globus-job-run without specifying our
> > > > own CE.
> > > >
> > > > -- Best Regards --
> > > > Adeel-ur-Rehman
> > > >
> > > >  ________________________________
> > > >
> > > > From: Adeel-ur-Rehman [mailto:[log in to unmask]]
> > > > Sent: Monday, August 27, 2007 2:44 PM
> > > > To: 'Maarten Litmaath'
> > > > Cc: [log in to unmask]
> > > > Subject: RE: [LCG-ROLLOUT] Job Submission Failure
> > > >
> > > > Dear Maarten,
> > > >
> > > > > I tried to submit the job using an ordinary user account (i.e. adeel)
> > > > > from UI which is only a member of dteam VO.
> > > >
> > > > >>On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > > >
> > > > Yes I have tried that successfully.
> > > >
> > > > > I have tested the PBS stagein functionality by running the script attached
> > > > > under a grid user account by specifying its corresponding queue name as an
> > > > > argument, I got "test successful" message.
> > > >
> > > > >>I suppose it was the grid account for an ordinary user, e.g. ops001?
> > > > >>Try with an "sgm" account instead.
> > > >
> > > > That's also working fine with me.
> > > >
> > > > But still I am getting the same Unspecified_gridmanager_error although now I
> > > > can successfully complete the globus-job-run procedure with no errors.
> > > >
> > > > -- Best Regards --
> > > > Adeel
> > > >
> > > > -----Original Message-----
> > > > From: Maarten Litmaath [mailto:[log in to unmask]]
> > > > Sent: Monday, August 27, 2007 2:19 PM
> > > > To: Adeel-ur-Rehman
> > > > Cc: [log in to unmask]
> > > > Subject: Re: [LCG-ROLLOUT] Job Submission Failure
> > > >
> > > > Hi Adeel,
> > > >
> > > > > I tried to submit the job using an ordinary user account (i.e. adeel)
> > > > > from UI which is only a member of dteam VO.
> > > >
> > > > On the CE you can "su" to an "sgm" account and try a qsub: does it work?
> > > >
> > > > > Regarding the reconfiguration of the CE, I only upgraded it to the latest
> > > > > available update of glite-3.1.
> > > > >
> > > > > Yes I checked the suggestions on the page
> > > > > http://goc.grid.sinica.edu.tw/gocwiki/Unspecified_gridmanager_error
> > > > >
> > > > > /var/spool/pbs/mom_logs on the WN don't state anything, so it seems that
> > > > > the jobs are not actually executing.
> > > > >
> > > > > I have tested the PBS stagein functionality by running the script attached
> > > > > under a grid user account by specifying its corresponding queue name as an
> > > > > argument, I got "test successful" message.
> > > >
> > > > I suppose it was the grid account for an ordinary user, e.g. ops001?
> > > > Try with an "sgm" account instead.
> > > >
> > >
> > >
> >
> >
>
>