Hello Doctor, thanks for helping!
> At a quick look the difference is a missing subcluster:
Yes. But only on lcgbdii - from the old SL4 site-bdii lcgce01, all 3 are
found.
So could it be that the old SL4 site-bdii has, say, cached this info
(assuming that lcgce02 has a problem publishing it), whereas the new SL5
non-prod site-bdii can't get the info from lcgce02?
> Do you have any error messages in the bdii logs?
Yes. As Daniela very kindly pointed out,
> On your new bdii lcgce02 seems to have lost its information about the
> installed software releases.
> Pick a random one (I tried CMSSW_3_3_6_patch4) and search for it.
and lcgbdii's /var/log/bdii-update.log shows something similar:
2010-03-11 11:38:21,364: [WARNING] dn: gluelocationlocalid=vo-cms-cmssw_3_3_6_patch2,gluesubclusteruniqueid=lcgce02.phy.bris.ac.uk,glueclusteruniqueid=lcgce02.phy.bris.ac.uk,mds-vo-name=uki-southgrid-bris-hep,o=grid
2010-03-11 11:38:21,364: [WARNING] ldapadd: No such object (32)
2010-03-11 11:38:21,365: [WARNING] matched DN: GlueClusterUniqueID=lcgce02.phy.bris.ac.uk,Mds-Vo-name=UKI-SOUTHGRID-BRIS-HEP,o=grid
Googling for bdii & "No such object" points to a possibly missing
glite-info-provider-service (lcgce02 is... behind on updates).
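
For what it's worth, the matched DN in an LDAP "No such object (32)" reply
is the deepest ancestor the server could find, so comparing it with the DN
being added shows which level is missing. A rough Python sketch of that
comparison (the names split_dn and missing_ancestors are made up for
illustration, and it splits on plain commas, so it assumes no escaped
commas in the RDNs):

```python
def split_dn(dn):
    """Split a DN into lower-cased RDNs (assumes no escaped commas)."""
    return [rdn.strip().lower() for rdn in dn.split(",") if rdn.strip()]


def missing_ancestors(entry_dn, matched_dn):
    """Return the ancestor RDNs of entry_dn that lie below matched_dn.

    These are the levels the server could not find when ldapadd failed.
    """
    entry = split_dn(entry_dn)
    matched = split_dn(matched_dn)
    # The matched DN must be a suffix of the entry DN for this to make sense.
    if matched and entry[-len(matched):] != matched:
        raise ValueError("matched DN is not an ancestor of the entry DN")
    # Drop the entry's own RDN (index 0) and the part that was matched.
    return entry[1:len(entry) - len(matched)]


# DNs copied from the bdii-update.log warning quoted above.
entry_dn = ("gluelocationlocalid=vo-cms-cmssw_3_3_6_patch2,"
            "gluesubclusteruniqueid=lcgce02.phy.bris.ac.uk,"
            "glueclusteruniqueid=lcgce02.phy.bris.ac.uk,"
            "mds-vo-name=uki-southgrid-bris-hep,o=grid")
matched_dn = ("GlueClusterUniqueID=lcgce02.phy.bris.ac.uk,"
              "Mds-Vo-name=UKI-SOUTHGRID-BRIS-HEP,o=grid")

print(missing_ancestors(entry_dn, matched_dn))
```

For the warning above this leaves only the gluesubclusteruniqueid level,
which fits the missing subcluster.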
> without the subcluster info job submission will fail.
Strangely enough, lcgce02 passes most OPS SAM tests and, until yesterday,
all CMS SAM tests (our prod site-bdii is still the old SL4 lcgce01). LHCb
tests, however, started failing intermittently with LISTMATCHFAILED in
Dec'09 and then went solidly LISTMATCHFAILED in January. LHCb user jobs do
still arrive & run; as far as I know it's just the LHCb SAM jobs that hit
LISTMATCHFAILED.
So that sounds like the problem is lcgce02, not lcgbdii.
This ce02 has carefully hand-made .ldif files, since what yaim writes
doesn't work for a CE sending to remote torque/maui - or so Yves Coppens,
Jon Wakelin & I found out when deploying it.
Don't other sites with a CE feeding to remote torque/maui (HPC) have to
correct the yaim-written .ldif files by hand after running yaim?
It looks like the right info is there:
root@lcgce02> grep GlueSubClusterUniqueID *ldif
static-file-Cluster.ldif:dn: GlueSubClusterUniqueID=lcgce02.phy.bris.ac.uk,GlueClusterUniqueID=lcgce02.phy.bris.ac.uk,mds-vo-name=resource,o=grid
static-file-Cluster.ldif:GlueSubClusterUniqueID: lcgce02.phy.bris.ac.uk
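
The same check can be done a bit more carefully than with grep, since LDIF
may fold a long dn over several lines (a continuation line starts with a
single space). A small Python sketch, assuming plain (not base64 "dn::")
entries; the helper names and the sample text are made up for
illustration and only modelled on the grep output above, not copied from
the real file:

```python
def ldif_dns(text):
    """Return the dn values found in LDIF text, unfolding continuation lines."""
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            # A line starting with one space continues the previous line.
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)
    return [l[3:].strip() for l in unfolded if l.lower().startswith("dn:")]


def has_subcluster(text, unique_id):
    """True if some entry's DN starts with GlueSubClusterUniqueID=<unique_id>."""
    prefix = "gluesubclusteruniqueid=" + unique_id.lower() + ","
    return any(dn.lower().startswith(prefix) for dn in ldif_dns(text))


# Hypothetical sample, folded the way an LDIF file might be.
sample = (
    "dn: GlueSubClusterUniqueID=lcgce02.phy.bris.ac.uk,\n"
    " GlueClusterUniqueID=lcgce02.phy.bris.ac.uk,mds-vo-name=resource,o=grid\n"
    "GlueSubClusterUniqueID: lcgce02.phy.bris.ac.uk\n"
)

print(has_subcluster(sample, "lcgce02.phy.bris.ac.uk"))
```

This only shows that the entry is in the static file; whether it actually
gets published to the site-bdii is a separate question.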
It seems like a lot of aggro for a CE with SL4 HPC WNs, but LHCb still
want to use SL4. :-/