Hello again,
> I'll get back to Jon P to see if he sees any changes, along with Ewan's
> suggestion of trying direct job submission.
Jon did a lot of investigating last night into the t2k problems at
Lancaster, and found that direct job submission to our CE worked, as
well as WMS submission from some WMSes. Others failed with the same
failure mode (https://ggus.eu/ws/ticket_info.php?ticket=88628).
The successful WMSes were:
lcgwms02 & lcgwms03.gridpp.rl.ac.uk at RAL
The unsuccessful WMS jobs went through:
wms01.grid.hep.ph.ic.ac.uk & wms02.grid.hep.ph.ic.ac.uk (and lcgwms04
at RAL).
The failed jobs all seemed to abort with the same long,
authentication-looking error message detailed previously: "error: globus_ftp_client:
the server responded with an error500 500 .... Unable to open file
...Cannot move ISB".
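
For anyone picking through that error: the GridFTP reply lines seem to have
lost their line breaks, so each "500-" continuation runs straight into the
previous one. Below is a minimal Python sketch for splitting them back out.
The function name split_ftp_reply and the sample string (with a shortened,
made-up path) are illustrative assumptions, not part of any gLite or Globus
tool, and the split naively assumes "500-" never occurs inside a path.

```python
import re
from typing import List

# Zero-width lookahead: split *before* each "500-" continuation marker
# and before the final "500 End" terminator, keeping the markers.
# Needs Python 3.7+, where re.split() accepts patterns that match empty strings.
_REPLY_BOUNDARY = re.compile(r"(?=500-|500 End)")


def split_ftp_reply(message: str) -> List[str]:
    """Split a run-together 500 reply into one string per reply line."""
    parts = _REPLY_BOUNDARY.split(message)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    # Hypothetical sample modelled on the error text; the path is shortened.
    sample = (
        "error: globus_ftp_client: the server responded with an error500 "
        "500-Command failed. : globus_l_gfs_file_open failed."
        "500-globus_xio: Unable to open file /var/SandboxDir/x/pexpect.py"
        "500-globus_xio: System error in open: No such file or directory"
        "500-globus_xio: A system call failed: No such file or directory"
        "500 End."
    )
    for line in split_ftp_reply(sample):
        print(line)
```

Read this way, the message looks less like an authentication failure and more
like the source file being absent from the WMS sandbox ("No such file or
directory"), though that is only one reading of it.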
I don't know whether these WMSes have anything significant in common.
It could be that the gubbins that controls the interaction between
abaddon.hec.lancs.ac.uk and these WMSes has got into a bad state.
Any ideas appreciated!
Thanks,
Matt
> Cheers,
> Matt
> P.S. Chris will be pleased that I had a bash at fixing ngs.ac.uk access
> on our CE :-)
>
> On 12/19/2012 11:07 AM, Christopher J. Walker wrote:
>> On 19/12/12 10:56, Daniela Bauer wrote:
>>> Hi Matt,
>>>
>>> check your CE:
>>>
>>> lx05:~ :~] voms-proxy-init -valid 24:00 --voms t2k.org
>>> Enter GRID pass phrase:
>>> Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=daniela bauer
>>> Creating temporary proxy
>>> .................................................................. Done
>>> Contacting voms.gridpp.ac.uk:15003
>>> [/C=UK/O=eScience/OU=Manchester/L=HEP/CN=voms.gridpp.ac.uk] "t2k.org" Done
>>> Creating proxy
>>> ........................................................................................................................
>>>
>>>
>>> Done
>>> Your proxy is valid until Thu Dec 20 10:51:18 2012
>>>
>>> lx05:~ :~] uberftp abaddon.hec.lancs.ac.uk
>>> 220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83)
>>> [Globus Toolkit 5.2.0] ready.
>>> 530-Login incorrect. : globus_gss_assist: Error invoking callout
>>> 530-globus_callout_module: The callout returned an error
>>> 530-an unknown error occurred
>>> 530 End.
>>>
>>> But:
>>> lx05:~ :~] voms-proxy-init -valid 24:00 --voms dteam
>>> Enter GRID pass phrase:
>>> Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=daniela bauer
>>> Creating temporary proxy ....................................... Done
>>> Contacting voms2.hellasgrid.gr:15004
>>> [/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr] "dteam" Failed
>>>
>>> Error: Error during SSL handshake:
>>>
>>> Trying next server for dteam.
>>> Creating temporary proxy ....................... Done
>>> Contacting voms.hellasgrid.gr:15004
>>> [/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr] "dteam" Done
>>> Creating proxy .......................... Done
>>> Your proxy is valid until Thu Dec 20 10:52:02 2012
>>>
>>>
>>> lx05:~ :~] uberftp abaddon.hec.lancs.ac.uk
>>> 220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83)
>>> [Globus Toolkit 5.2.0] ready.
>>> 230 User dteam167 logged in.
>>>
>>> The content of vomsdir looks fine to me, but obviously I can't see any
>>> of your more subtle configuration issues (are you using Argus?)
>>>
>>
>> Trying as ngs.ac.uk, that doesn't work either (and you might as well fix
>> it while you are fiddling)
>>
>> walker@heppc300:~/grid/ngs$ uberftp abaddon.hec.lancs.ac.uk
>> 220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83)
>> [Globus Toolkit 5.2.0] ready.
>> 530-Login incorrect. : globus_gss_assist: Error invoking callout
>> 530-globus_callout_module: The callout returned an error
>> 530-an unknown error occurred
>> 530 End.
>> walker@heppc300:~/grid/ngs$ voms-proxy-info --all
>> subject : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher
>> walker/CN=proxy
>> issuer : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher
>> walker
>> identity : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher
>> walker
>> type : proxy
>> strength : 1024 bits
>> path : /tmp/x509up_u32184
>> timeleft : 11:31:39
>> === VO ngs.ac.uk extension information ===
>> VO : ngs.ac.uk
>> subject : /C=UK/O=eScience/OU=QueenMaryLondon/L=Physics/CN=christopher
>> walker
>> issuer : /C=UK/O=eScience/OU=Manchester/L=HEP/CN=voms.gridpp.ac.uk
>> attribute : /ngs.ac.uk/Role=NULL/Capability=NULL
>> timeleft : 11:31:39
>> uri : voms.gridpp.ac.uk:15010
>>
>>
>> Looking at that machine of yours:
>>
>> walker@heppc300:~/grid/ngs$ uberftp abaddon.hec.lancs.ac.uk
>> 220 abaddon.hec.lancs.ac.uk GridFTP Server 6.10 (gcc64, 1334324800-83)
>> [Globus Toolkit 5.2.0] ready.
>> 230 User dteam116 logged in.
>> uberftp> cd /etc/grid-security/vomsdir/ngs.ac.uk
>> uberftp> ls
>> -rw-r--r-- 1 root root 64 Sep 13 13:00 15010.lsc
>> -rw-r--r-- 1 root root 148 Sep 13 13:25 voms.ngs.ac.uk.lsc
>> uberftp> cat 15010.lsc
>> ngs.ac.uk
>> /C=UK/O=eScienceCA/OU=Authority/CN=UK e-Science CA 2B
>>
>>
>> I suspect you are missing a field in your site-info.def (or have an
>> extra one) for the ngs VO.
>>
>> No, this doesn't help with t2k.org I'm afraid.
>>
>> Chris
>>
>>> Cheers,
>>> Daniela
>>>
>>>
>>> On 19 December 2012 10:47, Matt Doidge wrote:
>>>
>>> Hello all, I hope I caught some of you before you headed off for the
>>> holidays!
>>>
>>> Lancaster has been trying to get T2K working on our clusters, and on
>>> our occasionally quirky shared cluster T2K are consistently failing
>>> to successfully submit jobs via the WMS (well technically submission
>>> works, the jobs get aborted), with an incredibly verbose error
>>> message (replicated below, you can also see it in the ticket
>>> https://ggus.eu/ws/ticket_info.php?ticket=88628).
>>>
>>> The error message looks like either an authentication, permissions
>>> or missing destination problem - but I've checked our CE and
>>> everything seems okay. As a test I asked Jon to uberftp into our CE,
>>> and he did so without problem as an sgmt2k user.
>>>
>>> I'm a little stuck, and would appreciate someone who speaks
>>> glite-wms-job-status error message to take a look and maybe pinpoint
>>> where in the chain things are breaking. I've learnt the hard way
>>> that the CREAM/WMS interaction is quite complex, and I'm wondering
>>> if this is one of the cases where this has screwed up (the two have
>>> become "out of sync" somehow).
>>>
>>> Thanks in advance, and Merry Christmas!
>>> Matt
>>>
>>> ======================= glite-wms-job-status Success =====================
>>> BOOKKEEPING INFORMATION:
>>>
>>> Status info for the Job :
>>> https://lcglb04.gridpp.rl.ac.uk:9000/31MHBsdtFOj7AFEY7rs-lg
>>> Current Status: Aborted
>>> Logged Reason(s):
>>> - Cannot move ISB (retry_copy ${globus_transfer_cmd}
>>> gsiftp://lcgwms02.gridpp.rl.ac.uk:2811/var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py
>>> file:///home/grid/sgmt2k005/home_cream_174147165/CREAM174147165/pexpect.py):
>>> error: globus_ftp_client: the server responded with an error500
>>> 500-Command failed. : globus_l_gfs_file_open failed.500-globus_xio:
>>> Unable to open file
>>> /var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py500-globus_xio:
>>> System error in open: No such file or directory500-globus_xio: A
>>> system call failed: No such file or directory500 End.; reason=1;
>>> open /home/grid/sgmt2k005/home_cream_174147165/.ssh/id_rsa failed:
>>> No such file or directory.
>>> /usr/shared_apps/admin/etc/profile.d/keygen2: line 17:
>>> /home/grid/sgmt2k005/home_cream_174147165/.ssh/authorized_keys: No
>>> such file or directory
>>> chmod: cannot access
>>> `/home/grid/sgmt2k005/home_cream_174147165/.ssh/authorized_keys': No
>>> such file or directory
>>> /opt/glite/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB
>>> server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent():
>>> LB server (bkserver,lbproxy) store protocol error;; Logging library
>>> ERROR: LB server (bkserver,lbproxy) store protocol error;;
>>> edg_wll_DoLogEvent(): edg_wll_log_connect error DNS resolver error;;
>>> edg_wll_gss_connect();; GSS Error: Unknown host)
>>> /opt/glite/glite/bin/glite-lb-logevent: edg_wll_LogEvent*(): LB
>>> server (bkserver,lbproxy) store protocol error (edg_wll_LogEvent():
>>> LB server (bkserver,lbproxy) store protocol error;; Logging library
>>> ERROR: LB server (bkserver,lbproxy) store protocol error;;
>>> edg_wll_DoLogEvent(): edg_wll_log_connect error DNS resolver error;;
>>> edg_wll_gss_connect();; GSS Error: Unknown host) Cannot move ISB
>>> (retry_copy ${globus_transfer_cmd}
>>> gsiftp://lcgwms02.gridpp.rl.ac.uk:2811/var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py
>>> file:///home/grid/sgmt2k005/home_cream_174147165/CREAM174147165/pexpect.py):
>>> error: globus_ftp_client: the server responded with an error 500
>>> 500-Command failed. : globus_l_gfs_file_open failed. 500-globus_xio:
>>> Unable to open file
>>> /var/SandboxDir/31/https_3a_2f_2flcglb04.gridpp.rl.ac.uk_3a9000_2f31MHBsdtFOj7AFEY7rs-lg/input/pexpect.py
>>> 500-globus_xio: System error in open: No such file or directory
>>> 500-globus_xio: A system call failed: No such file or directory
>>> 500 End.
>>> - Transfer to CREAM failed due to exception: CREAM Register raised
>>> std::exception
>>> N5glite2ce16cream_client_api16cream_exceptions30JobSubmissionDisabledExceptionE
>>> Status Reason: hit job shallow retry count (10)
>>> Destination: abaddon.hec.lancs.ac.uk:8443/cream-lsf-hex
>>> Submitted: Wed Dec 5 15:36:25 2012 GMT
>>> ==========================================================================
>>>
>>>
>>>
>>>
>>> --
>>> Sent from the pit of despair
>>>
>>> -----------------------------------------------------------
>>> HEP Group/Physics Dep
>>> Imperial College
>>> Tel: +44-(0)20-75947810
>>> http://www.hep.ph.ic.ac.uk/~dbauer/