Hello everybody,
I've installed a CREAM CE as a grid frontend to a (shared) LSF cluster,
where the LSF master is on another host. However, I can't get the
dynamic information publishing to produce anything but zeros for the
number of free and available job slots/CPUs. I can manually run the
dynamic plugin without error (and the fact that our WaitingJobs are 0
and not 444444 leads me to believe that something's working):
# /opt/lcg/libexec/lcg-info-dynamic-scheduler -c /opt/glite/etc/lcg-info-dynamic-scheduler.conf
and looking at the contents of lcg-info-dynamic-scheduler.conf I have:
# cat /opt/glite/etc/lcg-info-dynamic-scheduler.conf
[Main]
static_ldif_file: /opt/glite/etc/gip/ldif/static-file-CE.ldif
vomap :
ops:ops
atlas:atlas
dteam:dteam
sgmdteam:dteam
sgmatlas:atlas
prdatlas:atlas
pltatlas:atlas
pltatlas:atlas
sgmops:ops
module_search_path : ../lrms:../ett
[LRMS]
lrms_backend_cmd: /opt/glite/libexec/lrmsinfo-lsf
[Scheduler]
cycle_time : 0
#
Continuing to follow this trail I tried running lrmsinfo-lsf and got:
# /opt/glite/libexec/lrmsinfo-lsf
nactive 0
nfree 0
now 1291295415
schedCycle 120
#
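For anyone wanting to check the same thing on their own CE, the summary lines above can be tested mechanically. This is only a minimal sketch, assuming the summary is always plain `key value` lines as in the output shown; the function name parse_lrmsinfo_summary is hypothetical and not part of the gLite tools, and the sample text is simply the output pasted above.

```python
def parse_lrmsinfo_summary(text):
    """Parse 'key value' summary lines (e.g. 'nfree 0') into a dict of ints."""
    summary = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines with exactly two fields and an integer value;
        # anything else (shell prompts, blank lines, other records) is skipped.
        if len(parts) == 2:
            try:
                summary[parts[0]] = int(parts[1])
            except ValueError:
                continue
    return summary


# Sample input: the lrmsinfo-lsf output quoted in this post.
sample = """nactive 0
nfree 0
now 1291295415
schedCycle 120
"""

summary = parse_lrmsinfo_summary(sample)
# Both counters at zero on a cluster that accepts jobs suggests the
# backend is not seeing the batch system's hosts or slots.
suspicious = summary.get("nactive") == 0 and summary.get("nfree") == 0
print(summary, suspicious)
```

Feeding it the real command output (for example via subprocess) instead of the sample string would be the obvious next step, but that part is left out here as it depends on the local install.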
Job submission to this new CE works, as do all the LSF tools for
querying the batch system. The endpoint for the CREAM CE's BDII is:
ldapsearch -x -H ldap://abaddon.hec.lancs.ac.uk:2170 -b mds-vo-name=resource,o=grid
I've pasted my static CE ldif below as the error could be in there; it
was largely created by YAIM. Any help would be very much appreciated.
Thanks in advance,
Matt
Lancaster Grid Admin
# cat /opt/glite/etc/gip/ldif/static-file-CE.ldif
dn: GlueCEUniqueID=abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal,mds-vo-name=resource,o=grid
objectClass: GlueCETop
objectClass: GlueCE
objectClass: GlueCEAccessControlBase
objectClass: GlueCEInfo
objectClass: GlueCEPolicy
objectClass: GlueCEState
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueCEHostingCluster: abaddon.hec.lancs.ac.uk
GlueCEName: normal
GlueCEUniqueID: abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal
GlueCEImplementationName: CREAM
GlueCEImplementationVersion: 32
GlueCECapability: CPUScalingReferenceSI00=1
GlueCEInfoGatekeeperPort: 8443
GlueCEInfoHostName: abaddon.hec.lancs.ac.uk
GlueCEInfoLRMSType: LSF
GlueCEInfoLRMSVersion: not defined
GlueCEInfoJobManager: lsf
GlueCEInfoContactString: https://abaddon.hec.lancs.ac.uk:8443/ce-cream/services
GlueCEInfoApplicationDir: /opt/voapps
GlueCEInfoDataDir: unset
GlueCEInfoDefaultSE: fal-pygrid-30.lancs.ac.uk
GlueCEInfoTotalCPUs: 256
GlueCEStateEstimatedResponseTime: 2146660842
GlueCEStateRunningJobs: 0
GlueCEStateStatus: Production
GlueCEStateTotalJobs: 0
GlueCEStateWaitingJobs: 4446444
GlueCEStateWorstResponseTime: 2146660842
GlueCEStateFreeJobSlots: 0
GlueCEStateFreeCPUs: 0
GlueCEPolicyMaxCPUTime: 999999999
GlueCEPolicyMaxObtainableCPUTime: 999999999
GlueCEPolicyMaxRunningJobs: 999999999
GlueCEPolicyMaxWaitingJobs: 999999999
GlueCEPolicyMaxTotalJobs: 999999999
GlueCEPolicyMaxWallClockTime: 999999999
GlueCEPolicyMaxObtainableWallClockTime: 999999999
GlueCEPolicyPriority: 1
GlueCEPolicyAssignedJobSlots: 10
GlueCEPolicyMaxSlotsPerJob: 999999999
GlueCEPolicyPreemption: 0
GlueCEAccessControlBaseRule: VO:atlas
GlueCEAccessControlBaseRule: VO:dteam
GlueCEAccessControlBaseRule: VO:ops
GlueForeignKey: GlueClusterUniqueID=abaddon.hec.lancs.ac.uk
GlueInformationServiceURL: ldap://abaddon.hec.lancs.ac.uk:2170/mds-vo-name=resource,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
dn: GlueVOViewLocalID=atlas,GlueCEUniqueID=abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal,mds-vo-name=resource,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueVOViewLocalID: atlas
GlueCEAccessControlBaseRule: VO:atlas
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 444444
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 0
GlueCEStateEstimatedResponseTime: 2146660842
GlueCEStateWorstResponseTime: 2146660842
GlueCEInfoDefaultSE: fal-pygrid-30.lancs.ac.uk
GlueCEInfoApplicationDir: /opt/voapps/atlas
GlueCEInfoDataDir: unset
GlueChunkKey: GlueCEUniqueID=abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
dn: GlueVOViewLocalID=dteam,GlueCEUniqueID=abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal,mds-vo-name=resource,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueVOViewLocalID: dteam
GlueCEAccessControlBaseRule: VO:dteam
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 444444
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 0
GlueCEStateEstimatedResponseTime: 2146660842
GlueCEStateWorstResponseTime: 2146660842
GlueCEInfoDefaultSE: fal-pygrid-30.lancs.ac.uk
GlueCEInfoApplicationDir: /opt/voapps/dteam
GlueCEInfoDataDir: unset
GlueChunkKey: GlueCEUniqueID=abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
dn: GlueVOViewLocalID=ops,GlueCEUniqueID=abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal,mds-vo-name=resource,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueVOViewLocalID: ops
GlueCEAccessControlBaseRule: VO:ops
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 444444
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 0
GlueCEStateEstimatedResponseTime: 2146660842
GlueCEStateWorstResponseTime: 2146660842
GlueCEInfoDefaultSE: fal-pygrid-30.lancs.ac.uk
GlueCEInfoApplicationDir: /opt/voapps/ops
GlueCEInfoDataDir: unset
GlueChunkKey: GlueCEUniqueID=abaddon.hec.lancs.ac.uk:8443/cream-lsf-normal
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3