Hello,
> Is there anything in the log?
Nothing exciting to my eyes. After the restarts things seem cleaner; before
them there are just the occasional N2N errors, which I believe are red
herrings.
> You could put the xrootd.log from fedredir somewhere I can see it.
http://www.hep.lancs.ac.uk/~msd/lancs_18nov_fedredir_atlas_xrootd.log
I restarted the xrootd daemon at about 16.30 to try to capture the
difference in behaviour before and after a restart.
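If it helps anyone reading the log, below is a rough Python sketch that counts the XRD-LFC (N2N) messages on either side of the restart. It assumes the line layout matches the excerpt quoted further down this thread (yymmdd HH:MM:SS, thread id, message, one message per line). The function name is made up for illustration and is not part of xrootd:

```python
import re
from datetime import datetime

# ASSUMED line layout, based on the excerpt quoted further down this thread:
#   131115 10:20:20 0x50c45700 XRD-LFC No such file or directory ...
# i.e. yymmdd HH:MM:SS <thread id> <message>, one message per line.
LINE_RE = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) \S+ (.*)$")


def count_lfc_errors(lines, restart):
    """Count XRD-LFC (N2N) messages logged before and after `restart`.

    Lines without a leading timestamp are skipped.
    Returns a (before, after) tuple.
    """
    before = after = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match or "XRD-LFC" not in match.group(2):
            continue
        stamp = datetime.strptime(match.group(1), "%y%m%d %H:%M:%S")
        if stamp < restart:
            before += 1
        else:
            after += 1
    return before, after
```

Call it with the open log file and the approximate restart time, e.g. `count_lfc_errors(open("lancs_18nov_fedredir_atlas_xrootd.log"), datetime(2013, 11, 18, 16, 30))`.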
>
> Also, is there anything in /var/log/messages, like saying it segfaulted or something?
> Are you using the rucioPrefix argument or sitename for the N2N plugin library?
>
The syslog and dmesg are all thankfully(?) devoid of segfaults or
similar errors.
I believe we're using the rucioPrefix argument (sitename is set as well):
dpm.namelib XrdOucName2NameLFC.so root=/dpm/lancs.ac.uk/home/atlas
match=fal-pygrid-30.lancs.ac.uk pssorigin=localhost
sitename=UKI-NORTHGRID-LANCS-HEP
rucioprefix=/dpm/lancs.ac.uk/home/atlas/atlasgroupdisk/phys-beauty,/dpm/lancs.ac.uk/home/atlas/atlaslocalgroupdisk,/dpm/lancs.ac.uk/home/atlas/atlasgroupdisk/soft-test,/dpm/lancs.ac.uk/home/atlas/atlasscratchdisk,/dpm/lancs.ac.uk/home/atlas/atlasproddisk,/dpm/lancs.ac.uk/home/atlas/atlasgroupdisk/phys-top,/dpm/lancs.ac.uk/home/atlas/atlashotdisk,/dpm/lancs.ac.uk/home/atlas/atlasdatadisk
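For checking by hand, here is a minimal Python sketch that turns the FAX test name into one candidate DPM path per rucioprefix entry. It assumes, based on Sam's Glasgow example, that whatever follows /atlas/dq2 is appended unchanged to a space-token prefix. That is only an assumption for a manual cross-check. It is not how the N2N plugin actually does its lookup, and `candidate_paths` is a hypothetical name:

```python
# Hypothetical helper: list the DPM paths where a FAX test file might live.

# The rucioprefix= value from our dpm.namelib line, copied verbatim.
RUCIO_PREFIX = (
    "/dpm/lancs.ac.uk/home/atlas/atlasgroupdisk/phys-beauty,"
    "/dpm/lancs.ac.uk/home/atlas/atlaslocalgroupdisk,"
    "/dpm/lancs.ac.uk/home/atlas/atlasgroupdisk/soft-test,"
    "/dpm/lancs.ac.uk/home/atlas/atlasscratchdisk,"
    "/dpm/lancs.ac.uk/home/atlas/atlasproddisk,"
    "/dpm/lancs.ac.uk/home/atlas/atlasgroupdisk/phys-top,"
    "/dpm/lancs.ac.uk/home/atlas/atlashotdisk,"
    "/dpm/lancs.ac.uk/home/atlas/atlasdatadisk"
)

# Leading component of the global (FAX) names seen in this thread.
GLOBAL_ROOT = "/atlas/dq2"


def candidate_paths(global_name, rucioprefix=RUCIO_PREFIX):
    """Map a /atlas/dq2/... name onto one candidate DPM path per prefix.

    ASSUMPTION: whatever follows /atlas/dq2 is appended unchanged to each
    space-token prefix, as the Glasgow example suggests. This is a manual
    cross-check, not the N2N plugin's real lookup.
    """
    if not global_name.startswith(GLOBAL_ROOT + "/"):
        raise ValueError("not a /atlas/dq2 name: " + global_name)
    rest = global_name[len(GLOBAL_ROOT):]  # keeps the leading "/"
    prefixes = [p.rstrip("/") for p in rucioprefix.split(",") if p]
    return [prefix + rest for prefix in prefixes]


TEST_FILE = (
    "/atlas/dq2/user/HironoriIto/"
    "user.HironoriIto.xrootd.uki-northgrid-lancs-hep/"
    "user.HironoriIto.xrootd.uki-northgrid-lancs-hep-1M"
)

for path in candidate_paths(TEST_FILE):
    print(path)
```

Each printed path can then be checked on the head node with `dpns-ls -l`. Going by the Glasgow example, the atlasdatadisk one is the likely hit.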
Thanks again for the help,
Matt
> cheers
>
> Wahid
>
> On 18 Nov 2013, at 16:10, Matt Doidge <[log in to unmask]> wrote:
>
>> Hello,
>>> So now (or indeed maybe since last week) Lancaster is failing the SSB test, now even the direct test. Is there something wrong on the Lancaster side?
>>>
>>> Sheffield is also failing there.
>>> In both cases the local transfers are working fine, so it is just fedredir.
>>>
>>> Let me know if you need any assistance.
>>
>> Adding the missing parameter as Sam suggested on Friday does indeed fix things for Lancaster...
>> ...for a little while after restarting the xrootd service. And then it stops working (it works for at least a few minutes afterwards; we don't have statistics on how long it takes to break).
>>
>> Another restart of the service brings things back to life for a bit. But just for a bit. Maybe these are the xrootd stability problems Raul mentioned punching Lancaster (and Sheffield) in the redirector?
>>
>> Any assistance would be appreciated!
>>
>> Cheers,
>> Matt
>>
>>> Cheers.
>>>
>>> Wahid
>>>
>>> On 15 Nov 2013, at 11:52, Matt Doidge <[log in to unmask]> wrote:
>>>
>>>> Thanks Sam,
>>>>> My equivalent file is in
>>>>> /dpm/gla.scotgrid.ac.uk/home/atlas/atlasdatadisk/user/HironoriIto/user.HironoriIto.xrootd.uki-scotgrid-glasgow/user.HironoriIto.xrootd.uki-scotgrid-glasgow-1M
>>>>>
>>>>> so a quick search and replace should get yours?
>>>>>
>>>>
>>>> It did indeed. And through some chatting Sam managed to spot the problem. We were missing the line:
>>>> dpm.mmreqhost localhost
>>>>
>>>> from our federated atlas xrootd config. Manual copies work now, so hopefully tests will start passing soon.
>>>>
>>>> Thanks again Sam!
>>>>
>>>> Cheers,
>>>> Matt
>>>>
>>>>>
>>>>>
>>>>> On 15 November 2013 10:56, Matt Doidge <[log in to unmask]> wrote:
>>>>>> Hello,
>>>>>> Lancaster's been on the atlas FAX naughty step for a while now, and
>>>>>> Sheffield seems to have joined us, so I thought I'd send this to the
>>>>>> storage group in a new thread rather than just bother Wahid (who's already
>>>>>> been over a lot of this with me; sorry for repeating myself, squire!).
>>>>>>
>>>>>> For reference the tests we're failing are here (scroll down for UK sites):
>>>>>> http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview#currentView=FAX+endpoints&highlight=false
>>>>>>
>>>>>> It looks like the test file is inaccessible rather than a straight-up xrootd
>>>>>> problem, and indeed when I asked Robin to xrdcp the test file we reproduced
>>>>>> the FAX error.
>>>>>>
>>>>>> I have two theories: either the file itself is FUBAR or our xrootd path
>>>>>> magic is wrong.
>>>>>>
>>>>>> Trouble is I can't figure out what file in my dpm namespace the xrootd surl
>>>>>> corresponds to:
>>>>>>
>>>>>> root://fal-pygrid-30.lancs.ac.uk:1094//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.uki-northgrid-lancs-hep/user.HironoriIto.xrootd.uki-northgrid-lancs-hep-1M
>>>>>>
>>>>>> All I know is that it's not /dpm/lancs.ac.uk/home/atlas/dq2/... (or
>>>>>> atlas/atlas/dq2...), which is what I expected, seeing as in my xrootd
>>>>>> configs I have:
>>>>>>
>>>>>> dpm.namecheck /dpm/lancs.ac.uk/home/atlas
>>>>>> and my dpm.namelib explicitly lists all the rucio prefixes.
>>>>>>
>>>>>> (We don't have replacementprefix or defaultprefix set, but I don't think we
>>>>>> need them).
>>>>>>
>>>>>> The logs don't seem to provide much enlightenment (to my eyes), and could be
>>>>>> full of red herrings. From the fedredir_atlas xrootd.log:
>>>>>> 131115 10:20:20 0x50c45700 XRD-LFC No such file or directory
>>>>>> /grid/atlas/users/pathena/user/HironoriIto/user.HironoriIto.xrootd.uki-northgrid-lancs-hep/user.HironoriIto.xrootd.uki-northgrid-lancs-hep-1M
>>>>>> 131115 10:20:20 0x50c45700 XRD-LFC: no valid replica for
>>>>>> /atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.uki-northgrid-lancs-hep/user.HironoriIto.xrootd.uki-northgrid-lancs-hep-1M
>>>>>>
>>>>>> Could someone who paid attention in school please remind me of the xrootd ->
>>>>>> DPNS mapping so I can check the file, or share any other tips they might
>>>>>> have?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Matt
>>>>
>>>
>>>
>>
>
>