Hi,
There are also things that could happen, for example some process
deleting the active GASS cache files for a running job. That tends to
wake up job managers and make them very unhappy.
JT
Maarten Litmaath wrote:
> Hi Jason,
>
>> No, we have not touched the configuration of w-ce01 for a long time.
>> It is also an old CE box (I plan to replace it with the slc4 lcgCE
>> version but have not yet had
>
> Might there be a hardware problem?
>
>> time to proceed further). The only error I can find in the gatekeeper
>> log is 'Generic verification error for VOMS (failure)!', which should be
>> ignored anyway and might be irrelevant to this issue as well.
>
> Indeed, but check that /etc/grid-security/vomsdir is up to date,
> e.g. with the latest lcg-vomscerts rpm and/or correct *.lsc contents:
>
> http://goc.grid.sinica.edu.tw/gocwiki/Generic_verification_error_for_VOMS_%28failure%29%21
>
>
>> The other error relates to the invalid proxy; that should also have
>> limited impact on the stability of the CE box, though there are more
>> than 16k entries referring to the same error:
>>
>> --
>> JMA 2008/06/01 08:41:57 GATEKEEPER_JM_ID
>> 2008-06-01.08:41:46.0000016353.0000086583 for
>> /DC=org/DC=doegrids/OU=People/CN=Nurcan Ozturk 18551 on 130.199.54.53
>> JMA 2008/06/01 08:41:57 GATEKEEPER_JM_ID
>> 2008-06-01.08:41:46.0000016353.0000086583 mapped to atlasprd (41000,
>> 1307)
>> JMA 2008/06/01 08:41:57 GATEKEEPER_JM_ID
>> 2008-06-01.08:41:46.0000016353.0000086583 has GRAM_SCRIPT_JOB_ID
>> 1212309717:lcgpbs:internal_45700877:23072.1212309712 manager type lcgpbs
>> JMA 2008/06/01 08:42:01 GATEKEEPER_JM_ID
>> 2008-06-01.08:41:46.0000016353.0000086583 JM exiting
>>
>> ERROR: Couldn't find a valid proxy.
>> Use -debug for further information.
>> --
>
> Did you configure your CE _only_ as a CE, or also as a UI or so?