Hi Robert,
Thanks. Ouch...
Feb 6 14:50:17 wn149 kernel: attr: page allocation failure. order:4, mode:0xd0
Feb 6 14:50:17 wn149 kernel: Pid: 17353, comm: attr Not tainted 2.6.32-358.23.2.el6.x86_64 #1
Feb 6 14:50:17 wn149 kernel: Call Trace:
Feb 6 14:50:17 wn149 kernel: [<ffffffff8112c287>] ? __alloc_pages_nodemask+0x757/0x8d0
Indeed, memory is getting low. But swap isn't.
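For reference, here is a rough Python sketch for pulling these failures out of syslog and sizing them. As far as I understand, order:N means the kernel asked for 2^N physically contiguous pages, so order:4 would be 16 pages, or 64 KiB assuming the usual 4 KiB page size on x86_64. A request like that can presumably fail because of fragmentation even while swap is still free. The function name and the PAGE_SIZE constant are my own choices for illustration, and the regex only covers the line format shown above.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages, the usual x86_64 default

# Matches e.g. "kernel: attr: page allocation failure. order:4, mode:0xd0"
FAILURE_RE = re.compile(
    r"kernel: (?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def parse_alloc_failure(line):
    """Return details of a page allocation failure line, or None if no match."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    pages = 1 << order  # an order-N request asks for 2**N contiguous pages
    return {
        "comm": match.group("comm"),        # process that triggered the request
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE,         # size of the contiguous block
        "mode": int(match.group("mode"), 16),  # raw allocation flags, not decoded
    }
```

Feeding it /var/log/messages line by line and counting the non-None results per host would show which worker nodes are affected most often.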
I'm not sure this is so harmless, though:
Feb 6 15:18:16 wn149 cvmfs2: (.modulerc) switching proxy from http://node25.datagrid.cea.fr:3128 to http://node14.datagrid.cea.fr:3128
Feb 6 15:18:16 wn149 cvmfs2: (.modulerc) switching proxy from http://node14.datagrid.cea.fr:3128 to http://squid-atlas.grif.fr:3128
Feb 6 15:18:16 wn149 cvmfs2: (.modulerc) failed to download repository manifest (9)
Feb 6 15:18:16 wn149 cvmfs2: (.modulerc) Failed to initialize root file catalog (16)
Feb 6 15:19:56 wn149 cvmfs2: (.modulerc) switching proxy from http://node25.datagrid.cea.fr:3128 to http://node14.datagrid.cea.fr:3128
Feb 6 15:19:56 wn149 cvmfs2: (.modulerc) switching proxy from http://node14.datagrid.cea.fr:3128 to http://squid-atlas.grif.fr:3128
Feb 6 15:19:56 wn149 cvmfs2: (.modulerc) failed to download repository manifest (9)
Feb 6 15:19:56 wn149 cvmfs2: (.modulerc) Failed to initialize root file catalog (16)
Feb 6 15:22:15 wn149 cvmfs2: (atlas.cern.ch) switched to catalog revision 4388
Feb 6 15:35:16 wn149 kernel: attr: page allocation failure. order:4, mode:0xd0
It's the ATLAS jobs eating all the memory :'(
Frederic
-----Original Message-----
From: LHC Computer Grid - Rollout [mailto:[log in to unmask]] On Behalf Of Robert Frank
Sent: Thursday, 6 February 2014 16:14
To: [log in to unmask]
Subject: Re: [LCG-ROLLOUT] cvmfs issues for atlas ?
Hi Frederic,
it's a problem with the cvmfs nagios probe. Those errors show up on our machines
occasionally, but always on the ones that have a high memory usage. What I found
was that the nagios probe uses /usr/bin/attr to get information about the state
of the cvmfs filesystem, e.g. "/usr/bin/attr -q -g version /cvmfs/atlas.cern.ch"
to get the cvmfs version. If most of the memory is in use, attr may be unable
to allocate the kernel memory it needs and then fails with the error message
you've posted. I think you can ignore the error from the cvmfs probe; it should
go away with the next check. If it doesn't, you should check the memory usage
of the jobs running on that node.
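
To illustrate the idea, here is a minimal Python sketch of a check that retries the attr call a few times before reporting a failure, so that a transient allocation error does not raise an alarm straight away. This is not the actual nagios probe: the function name, the retry count, the delay and the injectable run/sleep parameters are all assumptions for illustration. It also assumes attr prints "Cannot allocate memory" on stderr, as in the message you posted.

```python
import subprocess
import time


def read_cvmfs_version(mountpoint, run=subprocess.run, retries=3,
                       delay=5.0, sleep=time.sleep):
    """Read the cvmfs "version" attribute, retrying on allocation errors.

    Hypothetical helper, not part of cvmfs or of the nagios probe.
    """
    cmd = ["/usr/bin/attr", "-q", "-g", "version", mountpoint]
    last_error = ""
    for attempt in range(retries):
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)
        if result.returncode == 0:
            return result.stdout.strip()
        last_error = result.stderr.strip()
        # Only the memory allocation error is treated as transient;
        # any other failure is reported immediately.
        if "Cannot allocate memory" not in last_error:
            break
        if attempt < retries - 1:
            sleep(delay)
    raise RuntimeError("could not read version for %s: %s"
                       % (mountpoint, last_error))
```

Calling read_cvmfs_version("/cvmfs/atlas.cern.ch") would then only fail if all attempts fail. If it still fails after the retries, the node is probably short of memory for real and the jobs on it are worth a look.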
Cheers,
Robert
On 06/02/14 13:54, SCHAER Frederic wrote:
> Hi
>
> Are there known issues with the atlas cvmfs repositories?
> Since this morning, we have been getting monitoring errors like the following, which come and go on random hosts:
>
> atlas.cern.ch ... attr_get: Cannot allocate memory
> Could not get "version" for .
> SERVICE STATUS: failed to read version attribute - test took 1 s
>
> We see no errors for the other repos...?
>
> Our version:
> # rpm -qa|grep cvmfs
> cvmfs-init-scripts-1.0.20-1.noarch
> cvmfs-keys-1.4-1.noarch
> cvmfs-2.1.15-1.el6.x86_64
>
> Thanks
>