Hi Maarten, *,
Maarten Litmaath wrote:
>> There was a similar problem a few months ago with the French CA server,
>> and this was because the Apache process was allowed to have (only ?
>> what is the other CA servers' config ?) 500 child processes : it could
>> not answer all http requests generated at the "CRL download time" (how
>> many workers/machines are there on the grid ? 10 000 ? If they are all
>> downloading the same file at the same time in the same place, I can
>> understand the server fails if it's not "properly" configured (or may I
>> say strong enough ?)...)
>
> While it would be good to try and improve the web server in whichever way,
> there clearly is a bug in the way the cron job is generated: it must use
> a random minute in a random hour (modulo 6). I will open a bug...
From the statistics on the CA web site here, it seems most sites
actually use LCFGng and the AUTO minutes specification.
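
For illustration, the randomisation Maarten asks for (a random minute in a
random hour, modulo 6) could be sketched as below. This is only a sketch:
the function name and the idea of deriving the offsets from a hash of the
host name are my assumptions, and it is not how LCFGng or the AUTO
specification actually work. It returns only the five cron time fields and
leaves the command out.

```python
import hashlib


def crl_cron_time_fields(hostname: str) -> str:
    """Return cron time fields for a fetch every 6 hours at a per-host offset.

    Hypothetical helper: the offset is derived from a hash of the host name,
    so each host keeps the same slot while different hosts are spread over
    the 6-hour window.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use separate parts of the digest for the minute and the hour offset.
    minute = int.from_bytes(digest[:4], "big") % 60
    first_hour = int.from_bytes(digest[4:8], "big") % 6
    # Same offset within every 6-hour block: h, h+6, h+12, h+18.
    hours = ",".join(str(first_hour + 6 * k) for k in range(4))
    return f"{minute} {hours} * * *"


if __name__ == "__main__":
    # Example host names are placeholders.
    for host in ("wn001.example.org", "wn002.example.org"):
        print(host, crl_cron_time_fields(host))
```

Hashing the host name rather than calling a random number generator keeps
the schedule stable when the cron file is regenerated, which is a design
choice of this sketch, not a requirement.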
In a single day, the DutchGrid CA gets between 12000 and 18000 requests
for the CRL, from between 3600 and 4000 unique hosts.
The rate is about 30-40/minute maximum.
That suggests one or more of the following:
* there are not that many active hosts (at least not the ~9000)
* many worker nodes are using a web proxy or NAT (the first one
definitely helps in reducing the load on the CA servers)
* many nodes fail to retrieve the CRLs at all.
What actually happens I don't know... I hope it's all web proxies, of
course :-)
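
As a rough cross-check of the numbers above, here is the arithmetic in a
few lines of Python. The only assumption I add is that a host fetches the
CRL once every 6 hours, i.e. 4 times a day; everything else is taken from
the figures quoted in this mail.

```python
MINUTES_PER_DAY = 24 * 60

# Figures quoted above for the DutchGrid CA (per day).
REQUESTS_PER_DAY = (12000, 18000)
UNIQUE_HOSTS = (3600, 4000)

# Assumption: one CRL fetch every 6 hours per host.
ASSUMED_FETCHES_PER_HOST = 24 // 6


def average_rate_per_minute(requests_per_day: int) -> float:
    """Average request rate if the load were spread evenly over the day."""
    return requests_per_day / MINUTES_PER_DAY


# Lowest and highest plausible number of requests per unique host.
fetches_per_host_low = REQUESTS_PER_DAY[0] / UNIQUE_HOSTS[1]
fetches_per_host_high = REQUESTS_PER_DAY[1] / UNIQUE_HOSTS[0]

if __name__ == "__main__":
    for requests in REQUESTS_PER_DAY:
        print(f"{requests} requests/day = "
              f"{average_rate_per_minute(requests):.1f}/minute on average")
    print(f"requests per unique host: {fetches_per_host_low:.1f} "
          f"to {fetches_per_host_high:.1f} "
          f"(assumed schedule: {ASSUMED_FETCHES_PER_HOST})")
```

This gives an average of roughly 8 to 12.5 requests per minute, against
the observed maximum of 30-40 per minute, and 3 to 5 requests per unique
host per day, which is in line with the assumed 4 fetches per day.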
Cheers,
DavidG.
--
David Groep
** National Institute for Nuclear and High Energy Physics, PDP/Grid group **
** Room: H1.56 Phone: +31 20 5922179, PObox 41882, NL-1009DB Amsterdam NL **