Dear all,
I'm still supporting an lcg-CE dedicated to the Auger VO.
The machine performs very badly and, as far as I can see, this is
because there are a lot of processes like:
├─globus-job-mana,30974 -conf /opt/globus/etc/globus-job-manager.conf -type lcgsge -rdn jobmanager-lcgsge -machine-type unknown -publish-jobs
│ └─globus-job-mana,11832 -m lcgsge -f /tmp/gram_cache_cleanupHlsNAB -c cache_cleanup
# ps xuawww | grep "globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type lcgsge -rdn jobmanager-lcgsge -machine-type unknown -publish-jobs" | wc -l
1371
Those processes are reading from and/or writing to /tmp, and this
causes a very high I/O wait because of the huge number of files
there:
# ls /tmp/ | wc -l
219334
# time ls /tmp
[...]
real 1m16.023s
user 0m4.510s
sys 0m2.650s
Most of the files there are proxy files like:
-rw------- 1 augerprd029 augerprd 9666 May 8 13:33 x509up_p29181.fileX0mXTi.1
-rw------- 1 augerprd029 augerprd 9666 May 8 13:34 x509up_p31464.fileOhpaer.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:35 x509up_p1119.filerILwwE.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:36 x509up_p3812.filekRSuZw.1
-rw------- 1 augerprd029 augerprd 9666 May 8 13:37 x509up_p5592.fileLNp4Ca.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:38 x509up_p8286.filenyK0BN.1
-rw------- 1 augerprd029 augerprd 9666 May 8 13:39 x509up_p10240.filelvkesl.1
-rw------- 1 augerprd029 augerprd 9666 May 8 13:42 x509up_p14560.fileDFg26K.1
-rw------- 1 augerprd029 augerprd 9666 May 8 13:43 x509up_p15658.filegSSzAz.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:44 x509up_p19384.file8honnB.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:45 x509up_p20809.file6uANvq.1
-rw------- 1 augerprd029 augerprd 9666 May 8 13:46 x509up_p23031.fileFAfCd9.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:47 x509up_p24797.filesZ0FdJ.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:48 x509up_p26241.file7kkaNq.1
-rw------- 1 augerprd029 augerprd 9670 May 8 13:49 x509up_p27685.fileBWeeQf.1
-rw------- 1 augerprd029 augerprd 9666 May 8 13:52 x509up_p31281.filetMYU9p.1
# grep x509 lala | wc -l
191695
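For anyone who wants to reproduce the count on their own CE without
going through a sorted ls, here is a rough sketch. It assumes GNU find
(for -maxdepth and -printf) and that the proxies all match the
x509up_p* pattern seen above; the function names count_proxies and
proxies_by_owner are made up for illustration and are not part of any
middleware.

```shell
# Hypothetical helpers, assuming GNU find and the x509up_p* naming seen above.

# count_proxies DIR: number of proxy files directly under DIR.
# Printing one dot per file and counting bytes avoids sorting the listing.
count_proxies() {
    find "$1" -maxdepth 1 -type f -name 'x509up_p*' -printf '.' | wc -c
}

# proxies_by_owner DIR: proxy files per owning user, largest count first.
proxies_by_owner() {
    find "$1" -maxdepth 1 -type f -name 'x509up_p*' -printf '%u\n' \
        | sort | uniq -c | sort -rn
}
```

These would be called as "count_proxies /tmp" and "proxies_by_owner /tmp";
the second one should show whether a single pool account accounts for
most of the files.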
The problem is that these are not old files. The oldest ones are from
May 8th:
# openssl x509 -text -noout -in /tmp/x509up_p29181.fileX0mXTi.1
[...]
Not Before: May 8 12:28:40 2012 GMT
Not After : May 8 22:00:23 2012 GMT
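The age distribution of the files could be checked along these lines.
Again this is only a sketch, assuming GNU find; it groups by file
modification time (local time), not by the certificate validity dates,
and proxies_per_day is a hypothetical name.

```shell
# Hypothetical helper, assuming GNU find and the x509up_p* naming seen above.

# proxies_per_day DIR: number of proxy files directly under DIR,
# grouped by modification date (YYYY-MM-DD), oldest day first.
proxies_per_day() {
    find "$1" -maxdepth 1 -type f -name 'x509up_p*' -printf '%TY-%Tm-%Td\n' \
        | sort | uniq -c
}
```

Running "proxies_per_day /tmp" prints one line per day with the file
count in the first column, so the first line gives the oldest day
present.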
I do not understand why the middleware has not deleted them yet, or
whether this is a problem in the Auger submission chain.
Any help is appreciated.
Cheers
Goncalo