Hi John,
> It is because one experiment doesn't want the latest version
> that the others are held back and have to distribute middleware with
> their software. That makes debugging a pain, since you cannot tell which
> version any job is using.
From what I could gather, only SA3 wanted multiple versions;
experiments never asked for it. That was exactly my point about experiments
dealing with their users: it shouldn't be difficult for experiments to
know which client they are using.
> I think we should be recommending upgrades at a GridPP level and not
> just leaving it to each site who happens to get a request from a random
> user.
I didn't talk about random users. They are normally official requests or
come from well-known people in the experiments. I agree with you that we
could collect these requests and organise an upgrade schedule to a
minimum common version, to be reviewed every 6 months at the Deployment
Board. I would be happy with that.
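As a rough illustration of how a site could check its worker nodes against such a minimum common version, here is a minimal Python sketch. It parses output in the format of the "lcg-cp --version" listing from RAL quoted further down in this thread and reports components that are missing or older than a baseline. The baseline numbers (MINIMUM) and the helper names are hypothetical placeholders, not the actual WLCG minimum release. The parser assumes each component is printed as "name-X.Y.Z" on its own line.

```python
import re

# HYPOTHETICAL baseline: placeholder numbers, NOT the real WLCG minimum
# release; replace them with whatever version the Deployment Board agrees.
MINIMUM = {"lcg_util": (1, 7, 0), "GFAL-client": (1, 11, 0)}

# Matches lines such as "lcg_util-1.6.11" or "GFAL-client-1.10.11".
VERSION_LINE = re.compile(r"\s*([A-Za-z_-]+?)-(\d+(?:\.\d+)*)\s*")


def parse_versions(output):
    """Map component name -> version tuple from 'lcg-cp --version' style output."""
    versions = {}
    for line in output.splitlines():
        match = VERSION_LINE.fullmatch(line)
        if match:
            name, number = match.groups()
            versions[name] = tuple(int(part) for part in number.split("."))
    return versions


def below_minimum(versions, minimum=MINIMUM):
    """Return sorted names of components that are missing or older than the baseline."""
    # A missing component maps to (), which compares as older than any version.
    return sorted(name for name, floor in minimum.items()
                  if versions.get(name, ()) < floor)


if __name__ == "__main__":
    # Output as quoted from a RAL worker node later in this thread.
    ral_output = "+ lcg-cp --version\nlcg_util-1.6.11\nGFAL-client-1.10.11\n"
    print(below_minimum(parse_versions(ral_output)))  # ['GFAL-client', 'lcg_util']
```

Comparing versions as integer tuples rather than strings avoids the usual trap where "1.10" sorts before "1.6".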
cheers
alessandra
Gordon, JC (John) wrote:
> Alessandra,
>
>
> John
>
>
>> -----Original Message-----
>> From: Testbed Support for GridPP member institutes [mailto:TB-[log in to unmask]] On Behalf Of Alessandra Forti
>> Sent: 22 July 2009 09:18
>> To: [log in to unmask]
>> Subject: Re: New DPM for SAM causes failures at many UKI sites
>>
>> Hi John,
>>
>> I don't see the situation as so dramatic: the sites in question have
>> been working up to now, at least as far as lcg_utils is concerned. I
>> haven't received any complaints from the experiments; if I had, I would
>> have done something earlier. We will do something now.
>>
>> The support of multiple copies is a separate question, independent of
>> the upgrade. If you look at it, it could be the opposite: considering
>> how poorly backward compatible the software often is (as in this case),
>> the experiments might not want the latest version. The main reason
>> multiple versions are not a good solution is that experiments can copy
>> clients into their own areas; that way they have finer control and can
>> deal with their users much better without involving sites.
>>
>> cheers
>> alessandra
>>
>> PS: From Maarten's email, CERN itself is being "forced" to upgrade the
>> SAM machines from SLC3/gLite 3.0.
>>
>> Gordon, JC (John) wrote:
>>
>>> Jeremy, the issue is not how old the release these sites are running
>>> is, but when it was superseded in the WLCG minimum release definition.
>>> Was that a year ago? If so, then some GridPP sites have questions to
>>> answer. If not, then it is a different discussion.
>>>
>>> Some experiments have been asking for more recent gfal and lcg-utils
>>> for a long time. There have been various suggestions at the GDB about
>>> how sites could support multiple releases. UK sites have been negative
>>> about them all, and it looks like we are getting bitten now. If sites
>>> are against multiple release support, then surely there is a
>>> responsibility to keep reasonably up to date?
>>>
>>> John
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Testbed Support for GridPP member institutes [mailto:TB-[log in to unmask]] On Behalf Of Coles, J (Jeremy)
>>>> Sent: 21 July 2009 13:55
>>>> To: [log in to unmask]
>>>> Subject: New DPM for SAM causes failures at many UKI sites
>>>>
>>>> Dear All
>>>>
>>>> I've been made aware of an issue that affects many UKI sites running
>>>> older versions of GFAL and lcg_utils. Specific sites in UKI that should
>>>> read the following carefully are:
>>>>
>>>> RAL-LCG2               lcgce02.gridpp.rl.ac.uk;
>>>>                        lcgce03.gridpp.rl.ac.uk;
>>>>                        lcgce04.gridpp.rl.ac.uk;
>>>>                        lcgce05.gridpp.rl.ac.uk
>>>> UKI-LT2-IC-HEP         ce00.hep.ph.ic.ac.uk
>>>> UKI-LT2-RHUL           ce1.pp.rhul.ac.uk
>>>> UKI-LT2-UCL-HEP        lcg-ce01.hep.ucl.ac.uk
>>>> UKI-NORTHGRID-MAN-HEP  ce01.tier2.hep.manchester.ac.uk
>>>> UKI-NORTHGRID-MAN-HEP  ce02.tier2.hep.manchester.ac.uk
>>>> UKI-SCOTGRID-ECDF      ce.glite.ecdf.ed.ac.uk
>>>> UKI-SCOTGRID-ECDF      mw05.ecdf.ed.ac.uk
>>>>
>>>> There is likely to be an urgent request to upgrade in the coming days.
>>>> We can review the situation at the UKI meeting on Thursday
>>>> (http://indico.cern.ch/conferenceDisplay.py?confId=64531).
>>>>
>>>> If you are aware of reasons an upgrade is not possible for GFAL and
>>>> lcg_utils then please reply to the list.
>>>>
>>>> Many thanks,
>>>> Jeremy
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Maarten Litmaath <[log in to unmask]>
>>>> Date: Mon, Jul 20, 2009 at 9:24 PM
>>>> Subject: new DPM for SAM causes failures at 10% of the sites !!
>>>>
>>>>
>>>> Hi all,
>>>> the SAM CE tests include the replication of a file from the site's
>>>> default SE for "ops" to a central SE, for which lxdpm104.cern.ch
>>>> is the default choice and thereby almost always used.
>>>>
>>>> The problem with lxdpm104 is its OS: it still runs SLC3, which is
>>>> no longer supported. Tony Cass is not happy with this situation.
>>>>
>>>> We already upgraded the spare nodes lxdpm101 and lxdpm103 to SLC4
>>>> and the latest DPM version for gLite 3.1.
>>>>
>>>> lxdpm101 is used as the central SE for the SAM validation instance,
>>>> where we see about 10% more of the sites failing, compared to the
>>>> production instance. There are some big sites included.
>>>>
>>>> Production:
>>>>
>>>> https://lcg-sam.cern.ch:8443/sam/sam.py?sensors=CE&regions=AsiaPacific&regions=CERN&regions=CentralEurope&regions=France&regions=GermanySwitzerland&regions=Italy&regions=NorthernEurope&regions=Russia&regions=SouthEasternEurope&regions=SouthWesternEurope&regions=UKI&vo=ops&order=RegionName&funct=ShowSensorTests
>>>>
>>>> Validation:
>>>>
>>>> https://sam-val.cern.ch:8443/sam/sam.py?sensors=CE&regions=AsiaPacific&regions=CERN&regions=CentralEurope&regions=France&regions=GermanySwitzerland&regions=Italy&regions=NorthernEurope&regions=Russia&regions=SouthEasternEurope&regions=SouthWesternEurope&regions=UKI&vo=ops&order=RegionName&funct=ShowSensorTests
>>>>
>>>> As I am writing this, production has 342 green CEs, validation 307.
>>>>
>>>> There are many failures, in particular in Italy and UKI.
>>>> The error usually is as follows:
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>> Both SAPath and SARoot are not set about ops VO and SE :
>>>> lxdpm101.cern.ch
>>>> lcg_rep: Invalid argument
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> The cause of that error becomes clear when we look at the versions of
>>>> GFAL and lcg_utils that are present on the WN. For example, at RAL:
>>>> ------------------------------------------------------------------------
>>>> Using lcg-utils version:
>>>>
>>>> + lcg-cp --version
>>>> lcg_util-1.6.11
>>>> GFAL-client-1.10.11
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> That version is more than a year old and cannot handle the way
>>>> lxdpm101.cern.ch is now published in the info system (I verified that).
>>>> Conclusion: I think all sites that fail in the SAM validation instance
>>>> need to be told to upgrade their WNs to the latest version _URGENTLY_.
>>>> We should give them a deadline after which we switch the production
>>>> instance to lxdpm101.cern.ch.
>>>> Thanks,
>>>> Maarten
>>>>
>>>>
>>>>
>>>> --
>>>> Steve Traylen
>>>>
>>>>
>> --
>> No man ever steps in the same river twice, for it's not the same river
>> and he's not the same man. (Heraclitus)
>>
>> Northgrid Tier2 Technical Coordinator
>> http://www.hep.manchester.ac.uk/computing/tier2
>>
--
No man ever steps in the same river twice, for it's not the same river and he's not the same man. (Heraclitus)
Northgrid Tier2 Technical Coordinator
http://www.hep.manchester.ac.uk/computing/tier2