Hmmmm... I just tried and for every single WMS (Imperial, RAL, Spain) I get:


Logged Reason(s):
    - Transfer to CREAM failed due to exception: Failed to create a delegation id for job https://wmslb02.grid.hep.ph.ic.ac.uk:9000/W5nVc7c1_9FkkXQHRpqV6g: reason is Received NULL fault; the error is due to another cause: FaultString=[storeLimitedDelegationProxy error [id='13560033922E387736wms022Egrid2Ehep2Eph2Eic2Eac2Euk'; rfc=false; dn='CN_daniela_bauer_L_Physics_OU_Imperial_O_eScience_C_UK'; localUser='t2k004'; vo='t2k.org'; startTime='12/20/12 11:31 AM (GMT)'; expirationTime='12/21/12 11:32 AM (GMT)'];: sudo: sorry, you must have a tty to run sudo


t2k004 is my t2k ID at Lancaster.

Could you have words with your sudoers file?
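
For anyone hunting the same thing: sudo usually prints "sorry, you must have a tty to run sudo" when the `requiretty` Defaults option is switched on. Below is a minimal sketch for spotting the offending lines in a copy of a sudoers file. The function name, the sample text and the `tomcat` account are illustrative assumptions, not taken from Lancaster's configuration.

```python
import re


def requiretty_lines(sudoers_text: str) -> list:
    """Return Defaults lines that switch requiretty on (not '!requiretty')."""
    hits = []
    for raw in sudoers_text.splitlines():
        # Drop trailing comments; fully commented-out lines become empty.
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("Defaults"):
            continue
        # Match 'requiretty' only when it is not negated with a leading '!'.
        if re.search(r"(?<!!)\brequiretty\b", line):
            hits.append(line)
    return hits


# Hypothetical sudoers content, for illustration only.
sample = """\
Defaults    requiretty
Defaults:tomcat !requiretty
# Defaults requiretty (commented out)
"""

print(requiretty_lines(sample))
```

A global `Defaults requiretty` line reported here would be the first thing to look at; a per-user `!requiretty` override, as in the second sample line, is one common way to exempt a service account.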

Cheers,




On 20 December 2012 11:32, Matt Doidge wrote:
Hello again,


I'll get back to Jon P to see if he sees any changes, along with Ewan's
suggestion of trying direct job submission.

Jon did a lot of investigating last night into the t2k problems at Lancaster, and found that direct job submission to our CE worked, as did WMS submission from some WMSes. Others failed with the same failure mode (https://ggus.eu/ws/ticket_info.php?ticket=88628).

The successful WMSi were:
lcgwms02 & lcgwms03.gridpp.ac.uk at RAL

The unsuccessful WMS jobs went through:
wms01.grid.hep.ph.ic.ac.uk & wms02.grid.hep.ph.ic.ac.uk (and lcgwms04 at RAL).

The failed jobs all seemed to abort with the same long, authenticationy looking error message detailed previously: "error: globus_ftp_client: the server responded with an error500 500 .... Unable to open file ...Cannot move ISB".

I don't know if there's anything significant that these WMS have in common? It could be that the gubbins that control the interaction between abaddon.hec.lancs.ac.uk and these WMS has got into a bad state.

Any ideas appreciated!
Thanks,
Matt



Cheers,
Matt
P.S. Chris will be pleased that I had a bash at fixing ngs.ac.uk access
on our CE :-)

On 12/19/2012 11:07 AM, Christopher J. Walker wrote:
On 19/12/12 10:56, Daniela Bauer wrote:
Hi Matt,

check your CE:

lx05:~ :~] voms-proxy-init -valid 24:00 --voms t2k.org
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=daniela bauer
Creating temporary proxy
.................................................................. Done
Contacting voms.gridpp.ac.uk:15003
[/C=UK/O=eScience/OU=Manchester/L=HEP/CN=voms.gridpp.ac.uk] "t2k.org" Done
Creating proxy ........................................................................................................................ Done
Your proxy is valid until Thu Dec 20 10:51:18 2012

lx05:~ :~] uberftp abaddon.hec.lancs.ac.uk
220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83) [Globus Toolkit 5.2.0] ready.
530-Login incorrect. : globus_gss_assist: Error invoking callout
530-globus_callout_module: The callout returned an error
530-an unknown error occurred
530 End.

But:
lx05:~ :~] voms-proxy-init -valid 24:00 --voms dteam
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=daniela bauer
Creating temporary proxy ....................................... Done
Contacting voms2.hellasgrid.gr:15004
[/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr] "dteam" Failed

Error: Error during SSL handshake:

Trying next server for dteam.
Creating temporary proxy ....................... Done
Contacting voms.hellasgrid.gr:15004
[/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr] "dteam" Done
Creating proxy .......................... Done
Your proxy is valid until Thu Dec 20 10:52:02 2012


lx05:~ :~] uberftp abaddon.hec.lancs.ac.uk
220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83) [Globus Toolkit 5.2.0] ready.
230 User dteam167 logged in.

The content of vomsdir looks fine to me, but obviously I can't see any
of your more subtle configuration issues (are you using Argus?)


Trying as ngs.ac.uk, that doesn't work either (and you might as well fix
it while you are fiddling)

walker@heppc300:~/grid/ngs$ uberftp abaddon.hec.lancs.ac.uk
220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83)
[Globus Toolkit 5.2.0] ready.
530-Login incorrect. : globus_gss_assist: Error invoking callout
530-globus_callout_module: The callout returned an error
530-an unknown error occurred
530 End.
walker@heppc300:~/grid/ngs$ voms-proxy-info --all
subject : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher walker/CN=proxy
issuer : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher walker
identity : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher walker
type : proxy
strength : 1024 bits
path : /tmp/x509up_u32184
timeleft : 11:31:39
=== VO ngs.ac.uk extension information ===
VO : ngs.ac.uk
subject : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher walker
issuer : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=voms.gridpp.ac.uk
attribute : /ngs.ac.uk/Role=NULL/Capability=NULL
timeleft : 11:31:39
uri : voms.gridpp.ac.uk:15010


Looking at that machine:

walker@heppc300:~/grid/ngs$ uberftp abaddon.hec.lancs.ac.uk
220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83)
[Globus Toolkit 5.2.0] ready.
230 User dteam116 logged in.
uberftp> cd /etc/grid-security/vomsdir/ngs.ac.uk
uberftp> ls
-rw-r--r-- 1 root root 64 Sep 13 13:00 15010.lsc
-rw-r--r-- 1 root root 148 Sep 13 13:25 voms.ngs.ac.uk.lsc
uberftp> cat 15010.lsc
ngs.ac.uk
/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B


I suspect you are missing a field in your site-info.def (or have an
extra one) for the ngs VO.
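
The 15010.lsc listing above can be sanity-checked mechanically. This is a minimal sketch, assuming the usual .lsc layout of the VOMS server's certificate subject DN on the first line followed by the issuing CA's DN; the function name and messages are made up for illustration.

```python
def lsc_problems(text: str) -> list:
    """List lines of an .lsc file that do not look like an X.509 DN."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if len(lines) < 2:
        problems.append("expected at least two DN lines (subject, then issuer)")
    for number, line in enumerate(lines, start=1):
        # A DN in this notation starts with '/', e.g. /C=UK/O=...
        if not line.startswith("/"):
            problems.append(f"line {number} is not a DN: {line!r}")
    return problems


# Contents of 15010.lsc exactly as shown by 'cat' above.
shown = "ngs.ac.uk\n/C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B\n"
print(lsc_problems(shown))
```

Under that assumption the first line of the file is flagged, since it holds the VO name where the server DN would be expected.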

No, this doesn't help with t2k.org I'm afraid.

Chris

Cheers,
Daniela


On 19 December 2012 10:47, Matt Doidge wrote:

Hello all, I hope I caught some of you before you headed off for the
holidays!

Lancaster has been trying to get T2K working on our clusters, and on
our occasionally quirky shared cluster T2K are consistently failing
to successfully submit jobs via the WMS (well technically submission
works, the jobs get aborted), with an incredibly verbose error
message (replicated below, you can also see it in the ticket
https://ggus.eu/ws/ticket_info.php?ticket=88628).

The error message looks like either an authentication, permissions
or missing destination problem - but I've checked our CE and
everything seems okay. As a test I asked Jon to uberftp into our CE,
and he did so without problem as an sgmt2k user.

I'm a little stuck, and would appreciate someone who speaks
glite-wms-job-status error message to take a look and maybe pinpoint
where in the chain things are breaking. I've learnt the hard way
that the CREAM/WMS interaction is quite complex, and I'm wondering
if this is one of the cases where this has screwed up (the two have
become "out of sync" somehow).

Thanks in advance, and Merry Christmas!
Matt

======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:

Status info for the Job : https://lcglb04.gridpp.rl.ac.uk:9000/31MHBsdtFOj7AFEY7rs-lg
Current Status: Aborted
Logged Reason(s):
- Cannot move ISB (retry_copy ${globus_transfer_cmd}
gsiftp://lcgwms02.gridpp.rl.ac.uk:2811/var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py
file:///home/grid/sgmt2k005/home_cream_174147165/CREAM174147165/pexpect.py): error: globus_ftp_client: the server
responded with an error500 500-Command failed. :
globus_l_gfs_file_open failed.500-globus_xio: Unable to open file
/var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py500-globus_xio:
System error in open: No such file or directory500-globus_xio: A
system call failed: No such file or directory500 End.; reason=1;
open /home/grid/sgmt2k005/home_cream_174147165/.ssh/id_rsa failed:
No such file or directory.
/usr/shared_apps/admin/etc/profile.d/keygen2: line 17: /home/grid/sgmt2k005/home_cream_174147165/.ssh/authorized_keys: No such file or directory
chmod: cannot access `/home/grid/sgmt2k005/home_cream_174147165/.ssh/authorized_keys': No such file or directory
/opt/glite/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB
server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent():
LB server (bkserver,lbproxy) store protocol error;; Logging library
ERROR: LB server (bkserver,lbproxy) store protocol error;;
edg_wll_DoLogEvent(): edg_wll_log_connect error DNS resolver error;;
edg_wll_gss_connect();; GSS Error: Unknown host)
/opt/glite/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB
server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent():
LB server (bkserver,lbproxy) store protocol error;; Logging library
ERROR: LB server (bkserver,lbproxy) store protocol error;;
edg_wll_DoLogEvent(): edg_wll_log_connect error DNS resolver error;;
edg_wll_gss_connect();; GSS Error: Unknown host) Cannot move ISB
(retry_copy ${globus_transfer_cmd}
gsiftp://lcgwms02.gridpp.rl.ac.uk:2811/var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py
file:///home/grid/sgmt2k005/home_cream_174147165/CREAM174147165/pexpect.py): error: globus_ftp_client: the server
responded with an error 500 500-Command failed. :
globus_l_gfs_file_open failed. 500-globus_xio: Unable to open file
/var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py 500-globus_xio:
System error in open: No such file or directory 500-globus_xio: A
system call failed: No such file or directory 500 End.
- Transfer to CREAM failed due to exception: CREAM Register raised
std::exception N5glite2ce16cream_client_api16cream_exceptions30JobSubmissionDisabledExceptionE
Status Reason: hit job shallow retry count (10)
Destination: abaddon.hec.lancs.ac.uk:8443/cream-lsf-hex
Submitted: Wed Dec 5 15:36:25 2012 GMT
==========================================================================
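
One aid for reading the output above: the sandbox directory name in the gsiftp path looks like the job ID with `:` and `/` written as `_3a` and `_2f`. The sketch below decodes it on that assumption; the `_xx` hex-escape rule is inferred from this one example, not from WMS documentation, and the function name is made up.

```python
import re


def decode_sandbox_name(name: str) -> str:
    """Turn '_xx' hex escapes back into characters (assumed encoding)."""
    return re.sub(r"_([0-9a-f]{2})", lambda m: chr(int(m.group(1), 16)), name)


# Directory name taken from the SandboxDir path in the job status output.
encoded = "https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg"
print(decode_sandbox_name(encoded))
```

The decoded string matches the job URL in the "Status info for the Job" line, which ties the missing sandbox file on the WMS to this particular job.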




--
Sent from the pit of despair

-----------------------------------------------------------
[log in to unmask] <mailto:daniela.bauer@imperial.ac.uk>
HEP Group/Physics Dep
Imperial College
Tel: +44-(0)20-75947810
http://www.hep.ph.ic.ac.uk/~dbauer/
