Hi,
I've just built three new systems. For an hour, "service cvmfs probe"
hung when querying the mice repo:
--------------------------------------------------------
grep mice /var/log/messages
Jul 2 15:11:26 r16-n06 cvmfs2: (mice.gridpp.ac.uk) switch proxy / retry on http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk/data/3e/d3129b84409db0346dcf899b95271287cf581dC
Jul 2 15:11:48 r16-n06 cvmfs2: (mice.gridpp.ac.uk) switch proxy / retry on http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk/data/3e/d3129b84409db0346dcf899b95271287cf581dC
Jul 2 15:12:17 r16-n06 cvmfs2: (mice.gridpp.ac.uk) unable to load catalog from /data/3e/d3129b84409db0346dcf899b95271287cf581dC, going to offline mode
Jul 2 15:12:17 r16-n06 cvmfs2: (mice.gridpp.ac.uk) possible data corruption while trying to retrieve catalog from http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk, trying with no-cache
--------------------------------------------------------
Then cvmfs on all three new machines sprang into life:
--------------------------------------------------------
Jul 2 16:28:55 r16-n06 cvmfs2: (mice.gridpp.ac.uk) Signed catalog loaded from http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk;http://cvmfs-stratum-one.cern.ch:8000/opt/mice.gridpp.ac.uk, signed by Publisher: /CN=mice.gridpp.ac.uk CernVM-FS Release Managers#012Certificate issued by: /CN=mice.gridpp.ac.uk CernVM-FS Release Managers
Jul 2 16:28:55 r16-n06 cvmfs2: (mice.gridpp.ac.uk) CernVM-FS: linking /cvmfs/mice.gridpp.ac.uk to remote directory http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk;http://cvmfs-stratum-one.cern.ch:8000/opt/mice.gridpp.ac.uk
--------------------------------------------------------
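For what it's worth, the two URLs separated by a semicolon in that "Signed catalog loaded" line are the client's failover chain, which comes from CVMFS_SERVER_URL. I haven't double-checked which file carries it on these nodes, but the per-repository config should amount to roughly this (path assumed, URLs copied from the log above):
--------------------------------------------------------
# /etc/cvmfs/config.d/mice.gridpp.ac.uk.local  -- path assumed, adjust to your layout
# Semicolon-separated list; the client works through it left to right on failure.
CVMFS_SERVER_URL="http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk;http://cvmfs-stratum-one.cern.ch:8000/opt/mice.gridpp.ac.uk"
--------------------------------------------------------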
I don't know why it recovered - I did nothing to fix it. "service cvmfs
probe" on the existing systems worked fine right through. Weird.
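If it happens again I'll check basic connectivity before anything else; something along these lines (server URL taken from the log above, proxy hostname just a placeholder for whatever CVMFS_HTTP_PROXY points at) should show whether the node can reach the Stratum 1 at all:
--------------------------------------------------------
# Which proxy / server list is the client actually configured with?
grep -r 'CVMFS_HTTP_PROXY\|CVMFS_SERVER_URL' /etc/cvmfs/

# Fetch the repository manifest straight from the Stratum 1
curl -f http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk/.cvmfspublished

# Same fetch, forced through the site squid (hostname made up - substitute your own)
http_proxy=http://squid.example.ac.uk:3128 curl -f http://cernvmfs.gridpp.rl.ac.uk:8000/opt/mice.gridpp.ac.uk/.cvmfspublished
--------------------------------------------------------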
CVMFS client: cvmfs-2.0.18-1.el5.x86_64
Steve
On 07/02/2013 10:33 AM, Ian Collier wrote:
> Hi,
>
> As you may have noticed, our Stratum 1 (cernvmfs.gridpp.rl.ac.uk) is back on line.
>
> We replaced the back end storage and everything is up to date.
>
> It seems that failover was not completely transparent at all sites. What /should/ have happened is that as soon as anything was unavailable from the RAL Stratum 1, the client would simply move on to the next one. We know that did not happen everywhere, so it would be great to gather a bit more data.
>
> So, if you did see problems, could you get in touch with details (and ideally a bug report tarball from an affected node)?
>
> Thanks,
>
> --Ian
>
> On 26 Jun 2013, at 16:42, Gareth Smith <[log in to unmask]> wrote:
>
>> Hi,
>>
>> The problem looks like a hardware fault on the backend storage for our
>> Stratum 1.
>>
>> We have turned off http on the Stratum 1 machine. This should cause a
>> failover to the alternatives, except for those small VOs for which there
>> isn't one.
>>
>> More details to follow.
>>
>> Gareth
>>
--
Steve Jones [log in to unmask]
System Administrator office: 220
High Energy Physics Division tel (int): 42334
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 2334
University of Liverpool http://www.liv.ac.uk/physics/hep/