Hello,
(sending to the right list this time)
Having a go at some manual gfal copies before I forget:
gfal-copy -vvv matttest.txt https://t2se01.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/dteam/matttest.txt
Failed with authentication issues for my dteam proxy! It seems to do
this with all the protocols I tried. Switching to a gridpp proxy works
in that it reproduces a similar error to the tests (HTTP 405 : Method
Not Allowed, Permission refused). Interestingly a namespace entry is
created, but it looks like no corresponding disk replica is written.
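
A minimal sketch of repeating that copy against several destination URLs, so the HTTP result can be compared with the other protocols. The helper name copy_test and the DRY_RUN switch are made up for illustration; only "gfal-copy -vvv" is taken from the test above.

```shell
#!/bin/sh
# Sketch only: copy_test and DRY_RUN are hypothetical names, not part of gfal.
# DRY_RUN=1 (the default here) just prints the commands that would be run.
DRY_RUN="${DRY_RUN:-1}"

copy_test() {
    src="$1"; shift
    # Every remaining argument is a full destination URL, one per protocol.
    for dest in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "gfal-copy -vvv $src $dest"
        elif gfal-copy -vvv "$src" "$dest"; then
            echo "OK   $dest"
        else
            echo "FAIL $dest"
        fi
    done
}
```

For example, "copy_test matttest.txt https://t2se01.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/gridpp/matttest.txt" prints the command, and setting DRY_RUN=0 runs it. URLs for other protocols would need to be written out in full, since their path layout can differ.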
I noticed a few other behaviours. Clicking around the SE in a browser
(https://t2se01.physics.ox.ac.uk/dpm/physics.ox.ac.uk/home/gridpp/) I
can't "drill down" to any file, even ones I just created using other
protocols.
HTTP deletions do not seem to work either - they always give a "MISSING"
error, as if the file didn't exist.
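
To see the raw HTTP status behind an error like that, something along these lines might help. It is a sketch under assumptions: the proxy is taken from $X509_USER_PROXY, the CA directory is the usual /etc/grid-security/certificates, and the helper names and the wording of each interpretation are mine rather than anything DPM reports.

```shell
#!/bin/sh
# Sketch only: explain_status and http_delete are hypothetical helpers.

# Turn an HTTP status code into a rough hint (the hints are guesses, not DPM output).
explain_status() {
    case "$1" in
        2??) echo "success" ;;
        403) echo "forbidden: check proxy VO and ACLs" ;;
        404) echo "not found: no such namespace entry" ;;
        405) echo "method not allowed: server is refusing this method" ;;
        *)   echo "unexpected status $1" ;;
    esac
}

# Send a raw DELETE using the grid proxy as client certificate and key.
# Assumes X509_USER_PROXY is set and the CA directory below exists.
http_delete() {
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        --cert "$X509_USER_PROXY" --key "$X509_USER_PROXY" \
        --capath /etc/grid-security/certificates \
        -X DELETE "$1")
    echo "$code $(explain_status "$code")"
}
```

Calling http_delete with the full https:// URL of a test file prints the status code followed by the hint.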
Each time I spotted something weird I double checked that I could get
the correct behaviour from other protocols.
I'm afraid all I've managed to do is come up with a list of symptoms and
no solutions. Perhaps if no one on this list has any insight it's time
to contact the dpm users forum? I am very surprised that atlas haven't
had issues with deletions at Oxford yet if other users are getting the
same behaviour I saw.
Sorry I couldn't be more help,
Matt
On 07/08/17 16:43, Matt Doidge wrote:
> Hello,
> I second doing some manual gfal copies to give a better picture of
> what's going on.
>
> This may be clutching at straws, but I remember seeing a similar problem
> at Lancaster a while back. IIRC the fix was to add the "Write" flag to
> the NSFlags and DiskFlags lines in /etc/httpd/conf.d/zlcgdm-dav.conf
>
> NSFlags Write RemoteCopy
>
> DiskFlags Write RemoteCopy
>
>
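
A minimal sketch for checking whether those two lines carry the Write flag. It assumes each directive sits on a single line as quoted above; the function name check_dav_flags is made up, and the default path is the one given in the message.

```shell
#!/bin/sh
# Sketch only: check_dav_flags is a hypothetical helper.
# Reports whether the NSFlags and DiskFlags lines in the config include "Write".
check_dav_flags() {
    conf="${1:-/etc/httpd/conf.d/zlcgdm-dav.conf}"
    for flag in NSFlags DiskFlags; do
        # Match only uncommented directive lines, then look for the word Write.
        if grep -E "^[[:space:]]*$flag[[:space:]]" "$conf" | grep -qw Write; then
            echo "$flag: Write present"
        else
            # Also reached when the directive is absent from the file.
            echo "$flag: Write missing"
        fi
    done
}
```

Run without arguments it reads the default path; a different file can be passed as the first argument. httpd would still need restarting after any change.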
> Cheers,
> Matt
>
> On 07/08/17 15:40, RAUL H C LOPES wrote:
>> A gfal-copy test from your UI might show if the error is in the head
>> node or some disk server. Also I would check if httpd is running in
>> all disk nodes.
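
A rough sketch of the "is httpd answering on every disk node" check. It assumes plain HTTP on port 80 (as in the t2se24 error further down) and a 10 second timeout; probe_node and check_nodes are hypothetical names, and any HTTP status at all is treated as "httpd is up".

```shell
#!/bin/sh
# Sketch only: probe_node and check_nodes are hypothetical helpers.

# Print the HTTP status code for a node; curl prints 000 if nothing answers.
probe_node() {
    curl -s -o /dev/null -m 10 -w '%{http_code}' "http://$1/"
}

# Probe each node given as an argument and report one line per node.
check_nodes() {
    for node in "$@"; do
        code=$(probe_node "$node") || code="000"
        if [ "$code" = "000" ]; then
            echo "$node: no HTTP response"
        else
            echo "$node: httpd answering (HTTP $code)"
        fi
    done
}
```

For example, "check_nodes t2se24.physics.ox.ac.uk" followed by the other pool node names. An error status still shows the daemon is running; it says nothing about whether writes are allowed.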
>>
>> raul
>>
>> On 07/08/17 15:30, Peter Gronbech wrote:
>>>
>>> I’m still getting errors,
>>>
>>> In particular it’s failing to do the PUT file.
>>>
>>> It talks to the head node t2se01, then in the case of the test at 14:04
>>>
>>> (http://wlcg-sam-atlas.cern.ch/dashboard/request.py/metricOutput?host=t2se01.physics.ox.ac.uk&time=2017-08-07T14:04:18Z&metricfqan=webdav.HTTP-All%20(_atlas_Role_production))
>>>
>>> Fails to put the file on to the pool node t2se24:
>>>
>>> <= <title>405 Method Not Allowed</title>
>>>
>>> <= </head><body>
>>>
>>> <= <h1>Method Not Allowed</h1>
>>>
>>> <= <p>The requested method PUT is not allowed for the URL
>>> /dpm/pool3/atlas/2017-08-07/test_webdav_access_60.179555325.0.</p>
>>>
>>> <= <hr>
>>>
>>> <= <address>Apache/2.2.15 (Scientific Linux) Server at
>>> t2se24.physics.ox.ac.uk Port 80</address>
>>>
>>> What is the next thing to try?
>>>
>>> Thanks Pete
>>>
>>> --
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Peter Gronbech, GridPP Project Manager, Tel No.: 01865 273389
>>>
>>> Department of Particle Physics,
>>>
>>> University of Oxford,
>>>
>>> Keble Road, Oxford OX1 3RH, UK, E-mail: [log in to unmask]
>>>
>>> ----------------------------------------------------------------------
>>>
>>> *From:* GRIDPP2: Deployment and support of SRM and local storage
>>> management [mailto:[log in to unmask]] *On Behalf Of* Sam Skipsey
>>> *Sent:* 07 August 2017 13:30
>>> *To:* [log in to unmask]
>>> *Subject:* Re: DPM Help Webdav
>>>
>>> hi Pete:
>>>
>>> If you're running a pre 1.9.x DPM, then that's quite possibly one of
>>> the httpd bugs which were fixed in 1.9.x (they mostly related to
>>> issues with the http daemon locking things up, or locking itself up,
>>> due to file leaks - inability to restart httpd can be a symptom).
>>>
>>> Sam
>>>
>>> On Mon, 7 Aug 2017 at 10:53 John Hill <[log in to unmask]> wrote:
>>>
>>> Hi Pete,
>>> We've seen these processes from time to time (both on the SE and
>>> pool nodes). Stopping them manually and restarting httpd usually sorts
>>> things out. I don't think a new kernel should of itself cause any
>>> problems, but rebooting the SE after 9 months uptime is always
>>> potentially risky (for example: are there any manual changes which
>>> won't be picked up at boot time?).
>>>
>>> Cheers,
>>> John
>>>
>>> On 07/08/2017 10:12, Peter Gronbech wrote:
>>> > Hi All,
>>> >
>>> > Oxford is failing the
>>> > http://wlcg-sam-atlas.cern.ch/templates/ember/#/plot?group=ATLAS_Cloud_UK&profile=ATLAS_HTTP
>>> >
>>> > The (DPM) se head node has not run out of space and the load is
>>> > not high.
>>> > As I'm pretty rusty I'm inclined to just give it a reboot as
>>> > it's been up 283 days.
>>> >
>>> > This may bring it up with a new Kernel and could cause new
>>> > problems of course, any suggestions?
>>> >
>>> > I can see 4 of the following processes
>>> > /usr/sbin/httpd.event -k graceful
>>> >
>>> > Which looks like it's failing to restart httpd. (There is a cron
>>> > job that looks like it restarts it every 6 hours.)
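
A small sketch for counting leftover graceful-restart processes like the ones listed above. The helper name count_graceful is made up; it reads process listings on standard input so it can be fed from ps.

```shell
#!/bin/sh
# Sketch only: count_graceful is a hypothetical helper.
# Reads a process listing on stdin and prints how many lines are
# "httpd.event -k graceful" invocations.
count_graceful() {
    # grep -c exits non-zero when the count is 0, so mask that exit status.
    grep -c 'httpd\.event -k graceful' || true
}

# Example (pid, elapsed time and full command line for every process):
#   ps -eo pid,etime,args | count_graceful
```

The etime column in the example listing shows how long each process has been around, which helps tell a restart in progress from one that has hung.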
>>> >
>>> > Thanks Pete
>>> >
>>>
>>