Dear All,
I am trying to get the site back to running ATLAS production
jobs following the SL6 switchover. There are a few issues to sort out,
but I think I have them in hand. I was looking over the nodes last night
and noticed that one of them was quite sluggish. I ran top, which took
about a minute to come up. Once it was up, it showed that there were 1900
processes running, which is way above normal.
This is a snippet of the top output...
25255 prdatl07 20 0 4124 536 432 S 88.0 0.0 0:10.75 cut
25254 prdatl07 20 0 332 176 104 R 87.2 0.0 0:10.65 tail
25256 prdatl07 20 0 380 176 104 R 87.0 0.0 0:10.63 sed
24787 root 20 0 16372 2680 944 R 81.6 0.0 1:02.78 top
25246 prdatl07 20 0 4136 556 444 S 75.3 0.0 0:09.20 tail
25288 prdatl07 20 0 105m 956 784 R 70.3 0.0 0:08.59 ps
25290 prdatl07 20 0 105m 956 784 R 70.3 0.0 0:08.59 ps
25268 prdatl07 20 0 388 172 104 R 68.1 0.0 0:08.32 grep
25286 prdatl07 20 0 388 176 104 R 67.1 0.0 0:08.19 egrep
25240 prdatl07 20 0 332 176 104 R 66.9 0.0 0:08.17 tail
25270 prdatl07 20 0 316 168 100 D 66.0 0.0 0:08.06 cut
25269 prdatl07 20 0 328 172 104 R 65.6 0.0 0:08.01 tail
25271 prdatl07 20 0 376 172 104 R 64.8 0.0 0:07.92 sed
25285 prdatl07 20 0 680 180 104 R 63.5 0.0 0:07.75 awk
25272 prdatl07 20 0 388 172 104 R 59.4 0.0 0:07.25 ls
25249 prdatl07 20 0 332 176 104 R 57.6 0.0 0:07.04 tr
25248 prdatl07 20 0 384 176 104 D 54.3 0.0 0:06.63 sed
25197 prdatl07 20 0 396 176 104 D 52.9 0.0 0:07.74 grep
25283 prdatl07 20 0 332 176 104 R 50.0 0.0 0:06.11 tail
25200 prdatl07 20 0 8432 720 584 S 49.6 0.0 0:09.21 sed
25242 prdatl07 20 0 380 176 104 D 48.4 0.0 0:05.91 sed
25368 prdatl07 20 0 332 176 104 R 46.8 0.0 0:05.72 tail
25276 prdatl07 20 0 324 172 100 D 46.3 0.0 0:05.66 cut
It seems as if a job was spawning off loads of sed / grep / ps / tail /
tr etc. processes. This isn't normal behaviour, is it? Do you think
this may be a rogue job?
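One way to check whether all of these belong to a single job would be to
tally the processes by user, command and parent PID. A rough sketch,
assuming Python 3 is available on the node and that the procps ps accepts
"-eo user,ppid,comm" (the function names here are made up for
illustration, not part of any ATLAS tooling):

```python
import subprocess
from collections import Counter


def snapshot():
    """Return the text of `ps -eo user,ppid,comm` (assumes procps ps)."""
    return subprocess.check_output(
        ["ps", "-eo", "user,ppid,comm"], universal_newlines=True
    )


def tally_processes(ps_output):
    """Count processes per (user, command) and per (user, parent PID).

    Expects the text produced by snapshot(), including its header line.
    """
    by_command = Counter()
    by_parent = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)  # the command name may contain spaces
        if len(fields) < 3:
            continue  # ignore blank or truncated lines
        user, ppid, comm = fields
        by_command[(user, comm.strip())] += 1
        by_parent[(user, int(ppid))] += 1
    return by_command, by_parent
```

Calling tally_processes(snapshot()) and printing by_parent.most_common(5)
would show which parent PIDs own the most children. If one parent accounts
for most of them, that would point at a single job's wrapper script rather
than at the node as a whole.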
Regards,
Emyr