Hi,
we ran into the same problem. Nice to see we are not alone in the universe.
But what did you do to solve it? We have stopped the cron job for now, but that cannot be the real solution.
Regards
Klaere
Jeff Templon wrote:
> Hi *,
>
> We had a problem here with the new package cleanup-jobdirs on the
> CE. It probably doesn't affect most of you, or perhaps we are the only
> ones who were affected by it. But we found it by chance, so it may be
> that some sites have the problem but have not yet noticed it.
> So, read on.
>
> What does cleanup-jobdirs do : this is explained well in the release
> notes. It "looked harmless enough" :-) Basically it looks in your
> gridmapdir for all your pool accounts, then one by one it cd's into the
> home of the pool account, and cleans up one specific subdir where a lot
> of temp files could accumulate. This is done via a cron job, every six
> hours. It is useful because so many temp files accumulate there that
> they sometimes cause resource exhaustion on the CE.
>
> However, at our site, our pool account homes are automounted. Every
> time somebody does a "cd ~atlb021", for example, a new NFS mount is
> created (unless one already existed for that particular account's home
> directory). The cleanup script cleans all our pool homes in a short
> amount of time ... there are 2300 of these pool accounts ... so on each
> CE (we have three of them), at exactly the same time, there are 2300
> separate NFS mounts attempted, in a short period of time. This exhausts
> the number of allowed mounts, and what happens is that other mounts
> start failing during the time that the script is run. We did not
> anticipate this consequence of the cleanup-jobdirs script.
>
> We found this here, because I had some private cron jobs that failed
> every six hours .. the symptom is that they could not find files located
> in my (automounted) home directory on the CE machine.
>
> You can check if you have the problem : look in /var/log/messages on
> your CE machines and see whether, between 06:47 and 06:50, there are
> messages like:
>
> Oct 18 06:47:12 gazon kernel: RPC: Can't bind to reserved port (98).
> Oct 18 06:47:12 gazon kernel: RPC: can't bind to reserved port.
> Oct 18 06:47:12 gazon kernel: RPC: error 5 connecting to server schuur.nikhef.nl
> Oct 18 06:47:12 gazon kernel: RPC: Can't bind to reserved port (98).
> Oct 18 06:47:12 gazon kernel: RPC: can't bind to reserved port.
> Oct 18 06:47:12 gazon kernel: RPC: error 5 connecting to server schuur.nikhef.nl
> Oct 18 06:47:12 gazon automount[17574]: >> mount: schuur.nikhef.nl:/project/share/pool/atlas/atlas156: can't read superblock
> Oct 18 06:47:12 gazon automount[17574]: mount(nfs): nfs: mount failure schuur.nikhef.nl:/project/share/pool/atlas/atlas156 on /home/atlas156
> Oct 18 06:47:12 gazon automount[17574]: failed to mount /home/atlas156
>
> I am assuming here (by saying 06:47) that the cronjob runtime is
> hard-coded and not randomly generated by YAIM ... you can find when
> yours is set to run by looking in /etc/cron.d on the CE.
>
> You probably will not have this problem unless your setup is like ours,
> where each pool account home is a separate mount via, e.g., an automount map.
>
> Hope this helps somebody!
>
> J "/home/templon/.signature : file not found" T
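
For sites that want to run the log check described above without reading /var/log/messages by hand, here is a minimal sketch in Python. It is not part of cleanup-jobdirs or YAIM. The function name find_mount_failures and the default time window are assumptions made for illustration (the window simply mirrors the 06:47 to 06:50 guess above, so adjust it to whatever /etc/cron.d says on your CE), and the message patterns are taken only from the log excerpt quoted above.

```python
import re

# Message fragments copied from the quoted log excerpt; other kernel or
# automount versions may word these messages differently (assumption).
FAILURE_PATTERNS = re.compile(
    r"can't bind to reserved port|nfs: mount failure|failed to mount",
    re.IGNORECASE,
)

TIME_FIELD = re.compile(r"\d{2}:\d{2}:\d{2}")


def find_mount_failures(lines, start="06:47:00", end="06:50:59"):
    """Return syslog lines inside [start, end] that look like mount failures.

    `lines` is any iterable of strings in the classic syslog layout
    "Mon DD HH:MM:SS host program: message". The default window is a
    hypothetical value matching the cron time guessed in the mail.
    """
    hits = []
    for line in lines:
        # Split into month, day, time, and the rest of the line.
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # wrapped continuation or malformed line
        timestamp = fields[2]
        if not TIME_FIELD.fullmatch(timestamp):
            continue
        # Zero-padded HH:MM:SS strings compare correctly as plain text.
        if start <= timestamp <= end and FAILURE_PATTERNS.search(fields[3]):
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open /var/log/messages (as root, if needed), pass the file object to find_mount_failures, and print the returned lines. A non-empty result during the cron window suggests the CE is hitting the same mount exhaustion.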
--
Klaere Cassirer
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Department of Simulation Engineering
Schloss Birlinghoven
D-53754 Sankt Augustin
Tel: +49 - 2241 - 14 - 2758
Fax: +49 - 2241 - 14 - 42758
Internet: http://www.scai.fraunhofer.de