Quoting Maarten Litmaath:
> Ciao Alessandro,
>
>> Since our CE was migrated to SL4 and gLite 3.1 (the WNs were already
>> migrated to gLite 3.1 some months ago) there is a job submission problem:
>> it seems that no submitted job ever arrives on the CE (the SAM tests
>> are failing with "proxy expired"). The command
>> $ globus-job-run gridit-ce-001.cnaf.infn.it/jobmanager-lcgpbs -queue cert /bin/pwd
>>
>> returns nothing, but ssh from the WNs to the CE and qsub are working.
>>
>> Do you have any hints? I'm also attaching the file gram_job_mgr_24157
>> related to my last "globus-job-run" attempt
>
> In other gram_job_mgr_*.log files there are complaints like this:
>
> -------------------------------------------------------------------------
> Thu May 1 04:31:25 2008 JM_SCRIPT: executable staging failed with Cannot connect socket to /opt/globus/var/globus-gass-cache-marshal.sock: No such file or directory
> -------------------------------------------------------------------------
>
> Indeed:
>
> -------------------------------------------------------------------------
> $ globus-job-run gridit-ce-001.cnaf.infn.it /bin/ps auxwww | grep 'marshal: accepting'
> root 3369 0.0 0.0 12920 8240 ? Ss Apr29 0:06 globus-job-manager-marshal: accepting connections
> -------------------------------------------------------------------------
>
> Compare:
>
> -------------------------------------------------------------------------
> $ globus-job-run ce101.cern.ch /bin/ps auxwww | grep 'marshal: accepting'
> root 13249 0.4 0.5 80500 12204 ? Ss Apr29 11:33 globus-job-manager-marshal: accepting connections
> root 13251 0.0 0.2 66920 4360 ? Ss Apr29 1:08 globus-gass-cache-marshal: accepting connections
> -------------------------------------------------------------------------
>
> Try this:
>
> /etc/init.d/globus-gass-cache-marshal restart
>
Ciao Maarten, thank you very much!
You've hit the nail on the head!
Indeed, on the CE:
------------------------------------------------
[root@gridit-ce-001 ~]# /etc/init.d/globus-gass-cache-marshal restart
Stopping globus-gass-cache-marshal: [FAILED]
Starting globus-gass-cache-marshal: [ OK ]
[root@gridit-ce-001 ~]#
[root@gridit-ce-001 ~]#
[root@gridit-ce-001 ~]# ps auxwww | grep 'marshal: accepting'
root 3369 0.0 0.0 12920 8240 ? Ss Apr29 0:14 globus-job-manager-marshal: accepting connections
root 20236 0.0 0.0 8108 3012 ? Ss 15:21 0:00 globus-gass-cache-marshal: accepting connections
root 20296 0.0 0.0 5524 664 pts/0 S+ 15:21 0:00 grep marshal: accepting
---------------------------------------------------
and finally, from the UI:
----------------------------------------------------
[paolini@lcg-ui paolini]$ globus-job-run gridit-ce-001.cnaf.infn.it/jobmanager-lcgpbs -queue cert /bin/hostname
gridit-wn-004.cnaf.infn.it
----------------------------------------------------
Let me add that, for some reason, just after the first CE configuration
the "globus-job-manager-marshal" service was dead, so not even
authorization worked, and I had to remove its lock file and restart
the service. The "globus-gass-cache-marshal" service, on the other hand,
was not started at all, so I never noticed that this second service existed.
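
For anyone who runs into the same thing, a check along these lines might
catch a missing marshal daemon earlier. It is only a sketch: the function
name missing_marshals is my own invention, and it assumes both daemons set
the "<name>: accepting connections" process title seen in the ps output
above.

```shell
#!/bin/sh
# Sketch only: list which of the two marshal daemons are absent from a
# process listing. Reads `ps auxwww`-style text on stdin and prints the
# name of each daemon that has no "accepting connections" entry.
# (missing_marshals is a hypothetical helper, not part of gLite or Globus.)
missing_marshals() {
    listing=$(cat)
    for svc in globus-job-manager-marshal globus-gass-cache-marshal; do
        # Match the process title exactly as it appears in the thread.
        if ! printf '%s\n' "$listing" | grep -q "$svc: accepting connections"; then
            echo "$svc"
        fi
    done
}
```

On the CE it would be used as `ps auxwww | missing_marshals`; empty output
means both daemons are accepting connections, and any name printed is a
candidate for `/etc/init.d/<name> restart`.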
Best wishes,
Alessandro