Hi Alessandra,
thanks, good idea. It looks ok to me:
Analysis jobs are still failing at too high a rate, but I am not sure why.
The particular file which was breaking the HC tests is now accessible again and the logs no longer show errors.
I can fetch it using gfal-utils.
what about the host certificate? Does that correspond to storage045? Have you tried also other tools like gfal-copy/gfal-ls in verbose mode?

181231 03:30:16 16324 platl017.32294:25@node134 ofs_open: 0-600 fn=/dpm/ppgrid1.rhul.ac.uk/home/atlas/atlasdatadisk/rucio/mc15_13TeV/ed/68/AOD.05536542._000001.pool.root.1
181231 03:30:16 16324 dpmdiskacc_Access: Disk server hostname storage045.ppgrid1.rhul.ac.uk not matched to this host.
181231 03:30:16 16324 ofs_open: platl017.32294:25@node134 Unable to open /dpm/ppgrid1.rhul.ac.uk/home/atlas/atlasdatadisk/rucio/mc15_13TeV/ed/68/AOD.05536542._000001.pool.root.1; permission denied
181231 03:30:16 16324 platl017.32294:25@node134 ofs_close: use=0 fn=dummy
181231 03:30:17 16324 XrootdXeq: platl017.32294:25@node134 disc 0:00:01
(my highlighting)
But it is storage045!
[root@storage045(ppgrid1) ~]# hostname
storage045.ppgrid1.rhul.ac.uk
[root@storage045(ppgrid1) ~]# grep $(hostname) /etc/hosts
134.219.225.232 storage045.ppgrid1.rhul.ac.uk storage045.ppgrid1
[root@storage045(ppgrid1) ~]# host $(hostname)
storage045.ppgrid1.rhul.ac.uk has address 134.219.225.232
[root@storage045(ppgrid1) ~]# host 134.219.225.232
232.225.219.134.in-addr.arpa domain name pointer storage045.ppgrid1.rhul.ac.uk.
[root@storage045(ppgrid1) ~]# ifconfig p2p1.2
p2p1.2 Link encap:Ethernet HWaddr A0:36:9F:2C:8F:7C
  inet addr:134.219.225.232 Bcast:134.219.225.255 Mask:255.255.255.0
  inet6 addr: fe80::a236:9fff:fe2c:8f7c/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
  RX packets:380221630 errors:0 dropped:0 overruns:0 frame:0
  TX packets:266028510 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:3500942446699 (3.1 TiB) TX bytes:399922990727 (372.4 GiB)
Anyone know what this means?
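
For anyone wanting to repeat the comparison above across a whole log, here is a rough Python sketch. It only pulls the hostname out of the dpmdiskacc_Access message and compares it with names supplied by the caller. The function names (mismatched_hostnames, local_names, report) are made up for illustration, and the assumption that the names from socket.gethostname() and socket.getfqdn() are the relevant ones is mine; how the DPM xrootd plugin actually does its own matching is not something this thread establishes.

```python
import re
import socket

# Pattern for the dpmdiskacc_Access message quoted in the log above.
MISMATCH_RE = re.compile(
    r"dpmdiskacc_Access: Disk server hostname (\S+) not matched to this host"
)


def mismatched_hostnames(log_text):
    """Return hostnames named in 'not matched' messages, in order, without duplicates."""
    seen = []
    for name in MISMATCH_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def local_names():
    """Names this machine reports for itself (assumption: these are the ones that matter)."""
    return {socket.gethostname(), socket.getfqdn()}


def report(log_text, names):
    """Map each hostname from the log to True if it equals one of `names` (case-insensitive)."""
    lowered = {n.lower() for n in names}
    return {host: host.lower() in lowered for host in mismatched_hostnames(log_text)}
```

Run on the disk server itself, report(log_text, local_names()) with log_text read from the xrootd log would show True for a name the host also reports for itself, which is the puzzling case described above, and False for a name it does not recognise.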
From: Testbed Support for GridPP member institutes <[log in to unmask]> on behalf of Elena Korolkova <[log in to unmask]>
Sent: 30 December 2018 22:23
To: [log in to unmask]
Subject: Re: HC failures, DPM problem?

Hi Simon,
there are 68 failed and 25 finished jobs over the last 12 h.
There is a problem opening a file:
https://aipanda167.cern.ch/media/filebrowser/a84d70c5-5eb3-4dc4-a879-545f6a656734/user.gangarbt/tarball_PandaJob_4196806388_ANALY_RHUL_SL6/athena_stdout.txt
ERROR [ERROR] Server responded with an error: [3010] Unable to open /dpm/ppgrid1.rhul.ac.uk/home/atlas/atlasdatadisk/rucio/mc15_13TeV/ed/68/AOD.05536542._000001.pool.root.1; permission denied.
Could you please check permissions for other files as well.
Elena
On 30 Dec 2018, at 18:54, Jeremy Coles <[log in to unmask]> wrote:
> Hi Simon,
>
> No immediate technical suggestions from me, but (since limited numbers of people are online presently and it could be a HC issue) you could also seek help from:
>
>> ATLAS UK Cloud Support <[log in to unmask]>
>
> or
>
>> "atlas-adc-hammercloud-support (ATLAS ADC HammerCloud Support)" <[log in to unmask]>
>
> Best regards,
> Jeremy
>
>
>
>
>> On 30 Dec 2018, at 16:53, George, Simon <[log in to unmask]> wrote:
>>
>> If anyone is working at the moment, I'd appreciate some help.
>> I found that ANALY_RHUL_SL6 is blacklisted due to HC test failures:
>> http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=UKI-LT2-RHUL&startTime=2018-12-17&endTime=2018-12-31&templateType=isGolden
>> The site was down for a few days before xmas due to a PDU problem taking down a critical network switch which put several key servers offline (various lessons being learned...)
>> Since coming back up everything looks to be up from my side, but HC says otherwise. It looks like a problem with DPM, but I cannot find what is wrong: all the services are running and only some actions fail. When I tried by hand to retrieve, via WebDAV, a file that an HC test had failed to access, I could download it without any problem.
>> Any suggestions please, what would be worth checking?
>> Thanks,
>> Simon
>>
>> To unsubscribe from the TB-SUPPORT list, click the following link:
>> https://www.jiscmail.ac.uk/cgi-bin/webadmin?SUBED1=TB-SUPPORT&A=1
-- Respect is a rational process. \\// For Ur-Fascism, disagreement is treason. (U. Eco)