Hi Maarten,
It was indeed my umask! I fell for the cfengine umask trap :(
I'm re-installing the CE.
Thanks and apologies,
Yves
...
runYaim.grid_prod_wn|runYaim.grid_pps_wn::
"/opt/glite/yaim/bin/yaim -c -s /root/yaim-conf/site-info.def -n glite-WN -n glite-TORQUE_client"
define=yaimFlag
umask=022
runYaim.grid_prod_ce::
"/opt/glite/yaim/bin/yaim -c -s /root/yaim-conf/site-info.def -n lcg-CE TORQUE_utils"
define=yaimFlag
!!!!! missing umask
...
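For anyone else hitting this, here is a minimal sketch of why the missing umask= line matters. It assumes the command ends up running under a restrictive umask such as 077 (Maarten's guess below). run_with_umask is a hypothetical helper, not part of YAIM or cfengine, and everything is created in a throwaway temp directory.

```shell
#!/bin/sh
# Sketch: the inherited umask decides the mode of newly created files and
# directories. run_with_umask is a hypothetical helper for illustration.

run_with_umask() {
    # Run the remaining arguments in a subshell so the caller's umask
    # is left untouched.
    mask=$1
    shift
    ( umask "$mask" && "$@" )
}

workdir=$(mktemp -d)

# Restrictive umask (assumed 077): only the owner gets access.
run_with_umask 077 mkdir "$workdir/restrictive"
run_with_umask 077 touch "$workdir/restrictive.conf"

# umask 022, as in the WN stanza: world readable entries.
run_with_umask 022 mkdir "$workdir/readable"
run_with_umask 022 touch "$workdir/readable.conf"

ls -ld "$workdir"/*
```

The same wrapper could be put in front of the yaim command line, e.g. run_with_umask 022 /opt/glite/yaim/bin/yaim ..., although adding umask=022 to the cfengine stanza is the simpler fix here.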
On Sat, 3 May 2008, Maarten Litmaath wrote:
> Hi Yves,
>
>> I've had problems too with my SL4 LCG CE: the bdii keeps dying every hour
>> after the gatekeeper receives a kill signal (15) and restarts itself.
>> That will be the subject of another message once I've collected straces, ...
>> Independently of this, you may wish to look at the read permissions on
>> several files and directories in /opt/globus. I did a fresh install of a
>> CE yesterday. You can find my tests and changes below.
>>
>> Yves
>>
>> The first problem was with:
>>
>> [root@epgce2 ~]# ll -d /opt/globus/etc/grid-services
>> drwx------ 2 root root 4096 May 2 16:37 /opt/globus/etc/grid-services
>>
>> which had to be world readable. But, there was more:
>>
>> $ globus-job-run epgce2.ph.bham.ac.uk /bin/hostname
>> GRAM Job submission failed because data transfer to the server failed
>> (error code 10)
>> $ globus-job-run epgce2.ph.bham.ac.uk /bin/hostname
>> GRAM Job submission failed because the job manager is misconfigured, a
>> scheduler script is missing (error code 105)
>>
>> were caused by the wrong permissions on:
>>
>> [root@epgce2 ~]# ll /opt/globus/lib/perl/Globus/GRAM/JobManager
>> total 44
>> -rw------- 1 root root 14671 May 2 19:07 fork.pm
>> -rw-r--r-- 1 root root 20100 May 2 19:07 lcgpbs.p
>>
>> [root@epgce2 ~]# ll -d /opt/globus/lib/perl/Globus/GRAM/JobManager
>> drwxr----- 2 root root 4096 May 2 19:07 /opt/globus/lib/perl/Globus/GRAM/JobManager
>>
>> which again had to be world readable.
>>
>> Then, it looked better:
>>
>> $ globus-job-run epgce2.ph.bham.ac.uk /bin/hostname
>> epgce2.ph.bham.ac.uk
>>
>> but the following test failed :(
>>
>> $ globus-job-run epgce2.ph.bham.ac.uk:2119/jobmanager-lcgpbs /bin/hostname
>> GRAM Job submission failed because data transfer to the server failed
>> (error code 10)
>>
>> The read permission on lcgpbs.rvf was again too restrictive:
>>
>> [root@epgce2 ~]# ls -l /opt/globus/share/globus_gram_job_manager
>> total 28
>> -rw-r--r-- 1 root root 12938 Dec 8 2006 globus-gram-job-manager.rvf
>> -rw------- 1 root root 989 May 2 19:07 lcgpbs.rvf
>>
>> Finally, it worked.
>>
>> $ globus-job-run epgce2.ph.bham.ac.uk:2119/jobmanager-lcgpbs /bin/hostname
>> s25.esc.bham.ac.uk
>
> None of those files and directories is owned by any rpm. They are created
> on the fly by rpms or by the Globus configuration procedures invoked by YAIM.
> If those entries end up with too restrictive permissions, my guess is that
> your umask was too restrictive, e.g. 077.
> Your root umask should be set to 022 for the configuration to work.
> You may want to open a bug about the matter (category Configuration):
> YAIM should explicitly set the umask it needs.
>
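To spot entries like the ones listed above without checking each directory by hand, something along these lines might help. It is only a sketch: list_unreadable is a hypothetical helper name, and the demo tree is made up in a temp directory rather than taken from a real /opt/globus.

```shell
#!/bin/sh
# Sketch: list entries under a tree that are not world readable, plus
# directories that are not world searchable. list_unreadable is a
# hypothetical helper name.

list_unreadable() {
    tree=$1
    # ! -perm -004 : the "other read" bit is missing
    # -type d ! -perm -001 : directory without the "other execute" bit
    find "$tree" ! -type l \
        \( ! -perm -004 -o \( -type d ! -perm -001 \) \) -print
}

# Demo on a throwaway tree mimicking the modes seen on the CE.
demo=$(mktemp -d)
chmod 755 "$demo"
mkdir "$demo/grid-services" && chmod 700 "$demo/grid-services"
touch "$demo/ok.rvf" && chmod 644 "$demo/ok.rvf"
touch "$demo/bad.rvf" && chmod 600 "$demo/bad.rvf"

list_unreadable "$demo"
```

On the real machine this would be run as root, e.g. list_unreadable /opt/globus. Some files may be private on purpose, so the output is a list to review rather than something to chmod blindly.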