Is it in down time???
dteam001 17291 0.1 0.7 5456 3560 ? S 14:36 0:00
globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type fork
-rdn
dteam001 17310 0.0 0.5 4292 2712 ? S 14:36 0:00 perl
/home/dteam001/.globus/.gass_cache/local/md5/5e/5b8edfc10870103c817548b2594
dteam001 17311 0.2 1.3 7888 6376 ? S 14:36 0:00 perl
/tmp/grid_manager_monitor_agent.dteam001.17310.1000 --delete-self --maxtime
If a good reason is not given within one hour, these processes will be killed.
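For reference, a quick way to spot such leftover job-manager processes before the one-hour kill window is something like the sketch below (the account name dteam001 is taken from the listing above; adjust for your site):

```shell
# Look for lingering globus-job-manager / grid_manager_monitor_agent
# processes owned by a given pool account.  "dteam001" is the account
# from the listing above, not a universal name.
user=dteam001
pattern='globus-job-manager|grid_manager_monitor_agent'
out=$(ps -u "$user" -o pid=,etime=,args= 2>/dev/null \
      | grep -E "$pattern" || echo "no matching processes for $user")
echo "$out"
```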
The site has already been upgraded to LCG2_4_0 and, of course, it is giving problems.
Yes, I will put it in GGUS in a while...
So I am trying to figure out why my CE publishes a lot of 0s:
GlueCEInfoHostName: ce01.lip.pt
GlueCEInfoLRMSType: torque
GlueCEInfoLRMSVersion: not defined
GlueCEInfoTotalCPUs: 0
GlueCEStateEstimatedResponseTime: 0
GlueCEStateFreeCPUs: 0
GlueCEStateRunningJobs: 0
GlueCEStateStatus: Production
GlueCEStateTotalJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateWorstResponseTime: 0
GlueCEPolicyMaxCPUTime: 0
GlueCEPolicyMaxRunningJobs: 0
GlueCEPolicyMaxTotalJobs: 0
GlueCEPolicyMaxWallClockTime: 0
although when I run
/opt/lcg/libexec/lcg-info-dynamic-ce
on the CE itself, it correctly gives:
GlueCEInfoLRMSVersion: torque_1.0.1p5
GlueCEInfoTotalCPUs: 8
GlueCEStateFreeCPUs: 8
GlueCEPolicyMaxRunningJobs: 8
GlueCEPolicyMaxCPUTime: 4320
GlueCEPolicyMaxWallClockTime: 9000
GlueCEStateStatus: Production
etc.
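To compare the published values with what the plugin prints, one can query the CE's local GRIS directly. A hedged sketch: the host ce01.lip.pt is from the LDIF above, while the MDS port 2135 and the base DN are the usual LCG defaults, so adapt them for your site:

```shell
# Query the CE's local GRIS for the Glue CE attributes.  Host name is
# from the LDIF in this thread; port 2135 and the base DN are assumed
# LCG/MDS defaults.  The query is skipped if ldapsearch is missing.
host=ce01.lip.pt
base='mds-vo-name=local,o=grid'
if command -v ldapsearch >/dev/null 2>&1; then
    ldapsearch -x -h "$host" -p 2135 -o nettimeout=5 -b "$base" \
        '(objectClass=GlueCE)' GlueCEInfoTotalCPUs GlueCEStateFreeCPUs \
        || echo "query failed; run this on the CE or a UI node"
else
    echo "ldapsearch not installed; run this on the CE or a UI node"
fi
```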
By the way,
/opt/lcg/libexec/lcg-info-dynamic-ce
does not have execute permission initially (check config_gip?).
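The fix is presumably just a chmod, since a plugin the info provider cannot execute would leave only the static (zero) values in the LDIF. A minimal sketch of the check, using a temporary stand-in file so it can run anywhere; on the real CE the path would be /opt/lcg/libexec/lcg-info-dynamic-ce:

```shell
# Check whether the dynamic info plugin is executable and fix it if
# not.  A mktemp stand-in is used here instead of the real
# /opt/lcg/libexec/lcg-info-dynamic-ce so the sketch is self-contained.
plugin=$(mktemp)
chmod -x "$plugin"
if [ -x "$plugin" ]; then
    status=ok
else
    chmod +x "$plugin"   # the one-line fix applied on the CE
    status=fixed
fi
echo "plugin $status"
```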
After I figure this one out, I will start to look at the next little problem:
[root@ce01 tmp]# /etc/init.d/edg-rgma-gin status
edg-rgma-gin is stopped
[root@ce01 tmp]# /etc/init.d/edg-rgma-gin stop
Stopping edg-rgma-gin: [ OK ]
[root@ce01 tmp]#
[root@ce01 tmp]# /etc/init.d/edg-rgma-gin start
Starting edg-rgma-gin: [FAILED]
For more details check /opt/edg/var/log/rgma-gin.log
cron will attempt a restart within the next hour
to disable restart run edg-rgma-gin stop
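To dig into the [FAILED] start, the init script already points at the log; a small sketch of that next step (log path taken verbatim from the message above):

```shell
# Show the tail of the R-GMA gin log the init script mentions; the
# path is the one printed by /etc/init.d/edg-rgma-gin above.
log=/opt/edg/var/log/rgma-gin.log
if [ -f "$log" ]; then
    tail -n 20 "$log"
else
    echo "log not found: $log"
fi
```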
cheers
Mario