Anja Vest wrote:
> Hi Maarten,
> last week our job submission problem was gone after we set in /etc/hosts :
>
> -----------------------------------------------------------------------------
>
> 192.168.101.231 ekp-lcg-ce.physik.uni-karlsruhe.de ekp-lcg-ce.ekpplus.cluster ekp-lcg-ce
> -----------------------------------------------------------------------------
>
>
> as you proposed.
> According to your reply to GGUS ticket#2492, the WN's now resolve the SE
> ekp-lcg-se.physik.uni-karlsruhe.de
> to its public address 129.13.133.13
> The entries in /etc/hosts now look like this:
>
> # LCG stuff
> 192.168.101.220 ekp-lcg-ui.physik.uni-karlsruhe.de ekp-lcg-ui ekp-lcg-ui.ekpplus.cluster
> 192.168.101.231 ekp-lcg-ce.physik.uni-karlsruhe.de ekp-lcg-ce ekp-lcg-ce.ekpplus.cluster
> 192.168.101.232 ekp-lcg-se.ekpplus.cluster
> 129.13.133.13 ekp-lcg-se.physik.uni-karlsruhe.de ekp-lcg-se
> 129.13.133.14 ekp-lcg-mon.physik.uni-karlsruhe.de ekp-lcg-mon
>
> This had been working until Sunday morning (or Saturday afternoon).
> Since then, a grid job shows the same symptoms as before.
> Trying to do the test in
> http://goc.grid.sinica.edu.tw/gocwiki/submit-helper_script_%2e%2e%2e_gave_error%3a_cache_export_dir_%2e%2e%2e
>
> and following all the diagnosis steps, I just got:
>
> ekpplus021:~>globus-url-copy file:/etc/group gsiftp://ekp-lcg-ce.physik.uni-karlsruhe.de/tmp/test.$$
> error: globus_l_ftp_control_send_cmd_cb: gss_init_sec_context failed
>
> GSS failure:
> GSS Major Status: Authentication Failed
> GSS Minor Status Error Chain:
> ...
Please rerun that command with the "-dbg" option: maybe this time the problem
is due to something else. Did you change more things on the WN? For example,
what does this command report:
grep hosts /etc/nsswitch.conf
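
What matters in that output is whether "files" comes before "dns" on the
hosts line; if it does not, the /etc/hosts entries may never be consulted for
names that DNS can already answer. A minimal Python sketch of that check,
assuming the usual nsswitch.conf syntax. The function names and the sample text
are only illustrative and are not taken from your WN:

```python
import re

def hosts_lookup_order(nsswitch_text):
    """Return the sources named on the 'hosts:' line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            # Remove action items such as [NOTFOUND=return]
            sources = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
            return sources.split()
    return []

def files_before_dns(nsswitch_text):
    """True if /etc/hosts ('files') is consulted before DNS, or DNS is unused."""
    order = hosts_lookup_order(nsswitch_text)
    if "files" not in order:
        return False
    return "dns" not in order or order.index("files") < order.index("dns")

# Hypothetical sample content, for illustration only
SAMPLE = "passwd: files\nhosts:      files dns\n"
print(hosts_lookup_order(SAMPLE), files_before_dns(SAMPLE))
```

On a WN you could feed it the real file with open("/etc/nsswitch.conf").read().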
> Did you do the test via a grid job ? (I think I have to do so as well
> since I have no root access on our WN's)
I submitted a job to the jobmanager-fork on your CE, that submitted a test job
to your batch system with qsub.
> Could it be possible, that also the CE needs to be resolved to its
> public address on the WN's?
If the original fix worked last week, we should be able to get it to work again
without resorting to using the public address, which was in fact proposed as an
alternative solution.
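
To make the idea behind the original fix concrete: the client appears to take
the first name listed for the CE's address in /etc/hosts as the name it expects
in the host certificate, so the public name has to come first. The Python
sketch below illustrates only that comparison. It is a simplification written
for this thread, not the Globus code itself; the function names are made up,
and the rule "subject ends with /CN=host/<name>" is an assumption based on the
earlier error message:

```python
def canonical_name(hosts_text, address):
    """Return the first name listed for `address` in hosts-file text, or None."""
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore comments and blank lines
        if len(fields) >= 2 and fields[0] == address:
            return fields[1]
    return None

def matches_certificate(hosts_text, address, cert_subject):
    """Assumed rule: the subject must end in /CN=host/<first name for address>."""
    name = canonical_name(hosts_text, address)
    return name is not None and cert_subject.endswith("/CN=host/" + name)

# Values copied from the messages in this thread
SUBJECT = "/O=GermanGrid/OU=EKP/CN=host/ekp-lcg-ce.physik.uni-karlsruhe.de"
BROKEN = "192.168.101.231 ekp-lcg-ce.ekpplus.cluster ekp-lcg-ce"
FIXED = ("192.168.101.231 ekp-lcg-ce.physik.uni-karlsruhe.de "
         "ekp-lcg-ce.ekpplus.cluster ekp-lcg-ce")

for label, text in (("broken", BROKEN), ("fixed", FIXED)):
    print(label, canonical_name(text, "192.168.101.231"),
          matches_certificate(text, "192.168.101.231", SUBJECT))
```

If the entry on the WN still passes this kind of check, the new
"Authentication Failed" error most likely has a different cause, which is why
the "-dbg" output is needed.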
> cheers,
> Anja
>
> [log in to unmask] wrote:
>
>> On Fri, 6 May 2005, Anja Vest wrote:
>>
>>
>>
>>> [...]
>>> submit-helper script running on host ekp-lcg-ce gave error:
>>> cache_export_dir (/usr/users/dcms002/.lcgjm/globus-cache-export.XgmvIH)
>>> on gatekeeper did not contain a cache_export_dir.tar archive
>>>
>>
>>
>> This is answered in the job submission section of the Wiki FAQ:
>>
>> http://goc.grid.sinica.edu.tw/gocwiki/SiteProblemsFollowUpFaq
>>
>> In particular:
>>
>> http://goc.grid.sinica.edu.tw/gocwiki/submit-helper_script_%2e%2e%2e_gave_error%3a_cache_export_dir_%2e%2e%2e
>>
>>
>> Indeed, a test shows your WNs cannot globus-url-copy to/from your CE:
>>
>> -----------------------------------------------------------------------------
>>
>> error: globus_l_ftp_control_send_cmd_cb: gss_init_sec_context failed
>>
>> GSS failure:
>> GSS Major Status: Unexpected Gatekeeper or Service Name
>> GSS Minor Status Error Chain:
>>
>> init_sec_context.c:251: gss_init_sec_context: Mutual authentication failed:
>> The target name (/O=GermanGrid/OU=EKP/CN=host/ekp-lcg-ce.physik.uni-karlsruhe.de)
>> in the context, and the target name (/CN=host/ekp-lcg-ce.ekpplus.cluster)
>> passed to the function do not match
>> -----------------------------------------------------------------------------
>>
>>
>> The problem is due to this entry in /etc/hosts on your WN (e.g.
>> ekpplus020):
>>
>> -----------------------------------------------------------------------------
>>
>> 192.168.101.231 ekp-lcg-ce.ekpplus.cluster ekp-lcg-ce
>> -----------------------------------------------------------------------------
>>
>>
>> You could change that line as follows (all on a single line):
>>
>> -----------------------------------------------------------------------------
>>
>> 192.168.101.231 ekp-lcg-ce.physik.uni-karlsruhe.de ekp-lcg-ce.ekpplus.cluster ekp-lcg-ce
>> -----------------------------------------------------------------------------
>>
>>
>> That is, fake that the private interface also has the external name.
>>
>> A simpler/cleaner solution is not to use the private interface at all:
>>
>> -----------------------------------------------------------------------------
>>
>> 129.13.133.12 ekp-lcg-ce.physik.uni-karlsruhe.de ekp-lcg-ce
>> -----------------------------------------------------------------------------
>>
>>
>>
>>