Hi All,
I'm having a strange problem with my LCG-CE, which I believe is caused
by a gridftp problem. Could you take a look?
When I run:
$ globus-job-run axon-g01.ieeta.pt:2119/jobmanager-lcgpbs -q dteam /bin/hostname
the job reaches the Running stage without problems, but then gets stuck
in that stage.
$ tail /var/log/messages
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]: Got connection
193.136.171.210 at Mon May 4 15:34:29 2009
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]: Authenticated globus
user: /C=PT/O=LIPCA/O=IEETA/CN=Luis Filipe Sequeira Alves
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]: Requested service:
jobmanager-lcgpbs
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]: Authorized as local
user: dteam044
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]: Authorized as local
uid: 18759
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]: and local
gid: 2688
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]:
"/C=PT/O=LIPCA/O=IEETA/CN=Luis Filipe Sequeira Alves" mapped to dteam044
(18759/2688)
May 4 15:34:29 axon-g01 GRAM gatekeeper[12482]: JMA 2009/05/04 15:34:29
GATEKEEPER_JM_ID 2009-05-04.15:34:29.0000012482.0000000000 has
EDG_WL_JOBID ''
May 4 15:34:29 axon-g01 gridinfo[12484]: JMA 2009/05/04 15:34:29
GATEKEEPER_JM_ID 2009-05-04.15:34:29.0000012482.0000000000 for
/C=PT/O=LIPCA/O=IEETA/CN=Luis Filipe Sequeira Alves on 193.136.171.210
May 4 15:34:29 axon-g01 gridinfo[12484]: JMA 2009/05/04 15:34:29
GATEKEEPER_JM_ID 2009-05-04.15:34:29.0000012482.0000000000 mapped to
dteam044 (18759, 2688)
May 4 15:34:29 axon-g01 gridinfo[12484]: JMA 2009/05/04 15:34:29
GATEKEEPER_JM_ID 2009-05-04.15:34:29.0000012482.0000000000 has
GRAM_SCRIPT_JOB_ID 1241447669:lcgpbs:internal_1330801252:12484.1241447669
manager type lcgpbs
May 4 15:35:01 axon-g01 gridinfo: [1825-12587] Submitted job
1241447669:lcgpbs:internal_1330801252:12484.1241447669 to batch system
lcgpbs with ID 147.axon-g01.ieeta.pt
May 4 15:35:03 axon-g01 sshd[12645]: Accepted hostbased for dteam044
from 193.136.171.215 port 32864 ssh2
May 4 14:35:03 axon-g01 sshd[12646]: Accepted hostbased for dteam044
from 193.136.171.215 port 32864 ssh2
May 4 15:35:03 axon-g01 sshd(pam_unix)[12647]: session opened for user
dteam044 by (uid=0)
May 4 15:35:03 axon-g01 sshd[12647]: User dteam044 attempting to
execute command 'scp -r -p -f
/home/dteam044/.lcgjm/globus-cache-export.M12603/globus-cache-export.M12603.gpg'
on command line
May 4 15:35:03 axon-g01 sshd(pam_unix)[12647]: session closed for user
dteam044
$ tail /var/log/globus-gridftp.log
DATE=20090504123802.460569 HOST=axon-g01.ieeta.pt
PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20090504123702.554094
USER=ops042 FILE=/tmp BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1
STRIPES=1 DEST=[0.0.0.0] TYPE=LIST CODE=226
DATE=20090504133802.985333 HOST=axon-g01.ieeta.pt
PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20090504133703.79214
USER=ops042 FILE=/tmp BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1
STRIPES=1 DEST=[0.0.0.0] TYPE=LIST CODE=226
DATE=20090504140639.647177 HOST=axon-g01.ieeta.pt
PROG=globus-gridftp-server NL.EVNT=FTP_INFO START=20090504140508.626367
USER=dteam044
FILE=/home/dteam044/.lcgjm/globus-cache-export.Om5706/cache_export_dir.tar
BUFFER=0 BLOCK=262144 NBYTES=0 VOLUME=/ STREAMS=1 STRIPES=1
DEST=[0.0.0.0] TYPE=RETR CODE=226
# cat /var/log/maui.log | grep 15001
05/04 10:39:24 WARNING: cannot set job '138.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 10:51:37 WARNING: cannot set job '139.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 10:58:17 WARNING: cannot set job '140.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 11:25:03 WARNING: cannot set job '141.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 11:48:41 WARNING: cannot set job '142.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 12:13:36 WARNING: cannot set job '143.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 12:16:06 WARNING: cannot set job '144.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 13:51:09 WARNING: cannot set job '145.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 15:05:07 WARNING: cannot set job '146.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
05/04 15:35:03 WARNING: cannot set job '147.axon-g01.ieeta.pt' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
Can anybody give me a hint?
TIA.
Best Regards,
Luís