Frederic Schaer wrote:
> I called people this morning so that they would take a look at the
> server load, and find a workaround...
> lxn1188.cern.ch seems OK now with a Datagrid-fr certificate
> (job-list-match succeeds), but errors will still occur on all other
> machines (CE and so on) that could not download the files...
>
> There was a similar problem a few months ago with the French CA server,
> because the Apache process was allowed to have (only? what is the
> configuration of the other CAs' servers?) 500 child processes: it could
> not answer all the HTTP requests generated at the "CRL download time".
> How many workers/machines are there on the grid? 10,000? If they are all
> downloading the same file at the same time from the same place, I can
> understand that the server fails if it is not "properly" configured (or,
> may I say, strong enough?)...
While it would be good to improve the web server in whatever way possible,
there clearly is a bug in the way the cron job is generated: each machine
must use a random minute in a random hour (modulo 6), so that the downloads
are spread out instead of all arriving at once. I will open a bug...
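
To illustrate what "a random minute in a random hour (modulo 6)" could look
like, here is a minimal sketch in Python. It is not the actual code that
generates the cron job: the function name crl_cron_line, the 6-hour period
parameter and the command path in the usage line are all assumptions made
for the example.

```python
import random


def crl_cron_line(command, rng=None, period_hours=6):
    """Build one crontab line that runs `command` every `period_hours` hours.

    The minute and the starting hour are picked at random, so machines that
    generate their entry independently do not all hit the CA server at the
    same moment. Hypothetical helper, for illustration only.
    """
    rng = rng or random.Random()
    minute = rng.randrange(60)            # random minute: 0..59
    offset = rng.randrange(period_hours)  # random first hour: 0..period_hours-1
    # Hours of the day sharing the same remainder modulo period_hours,
    # e.g. offset 2 and period 6 gives 2,8,14,20.
    hours = ",".join(str(h) for h in range(offset, 24, period_hours))
    return f"{minute} {hours} * * * {command}"


if __name__ == "__main__":
    # Placeholder path: substitute whatever the site really runs.
    print(crl_cron_line("/path/to/fetch-crl-command"))
```

Each machine would generate its line once, at install time, and keep it, so
the load on the CA server is spread over the whole 6-hour window instead of
being concentrated in a single minute.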
> Regards,
> Frederic Schaer
>
> Maarten Litmaath wrote:
>
>>
>> /var/log/edg-fetch-crl-cron.log contains many errors like these:
>>
>> -------------------------------------------------------------------------
>> edg-fetch-crl: [2005/01/05-10:24:42] could not download a valid file from
>> 'http://igc.services.cnrs.fr/cgi-bin/loadcrl?CA=CNRS-Projets&format=PEM'
>> Time limit exceeded.
>> [...]
>> edg-fetch-crl: [2005/01/05-10:31:29] could not download a valid file from
>> 'http://igc.services.cnrs.fr/cgi-bin/loadcrl?CA=CNRS&format=PEM'
>> -------------------------------------------------------------------------
>>
>> I ran the cron job manually around 12:00 and this time it worked.
>>
>> Could the admin of igc.services.cnrs.fr have a look at that machine
>> (load, syslog errors, memory, disk space, ...) and/or its network
>> connectivity?
>>
>> Emanouil, please give it another try.
>>
>>
>
>