We just know it happens, and not only is this not the first time, but it
happens to other sites as well, often enough for a script to have been
written. The opposite also happens, i.e. the file is actually on disk but
the entry is not in pnfs.
However, it is not surprising, considering that copying something to and
from dCache involves database interactions. It probably depends on when the
rollback of the database changes gets done if something goes wrong.
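
To illustrate the two failure modes above (an entry in pnfs with no file on
disk, and a file on disk with no entry in pnfs), here is a minimal sketch. It
assumes you already have two plain lists of PNFS IDs, one dumped from the
namespace and one from the pools. How those lists are produced is
site-specific and not shown, and the function name and sample IDs are
made up for the example.

```python
def reconcile(namespace_ids, pool_ids):
    """Compare PNFS IDs known to the namespace with those found on pools.

    Both arguments are iterables of ID strings. How they are obtained is an
    assumption left to the site; this only does the set comparison.
    """
    namespace = set(namespace_ids)
    pools = set(pool_ids)
    return {
        # Entry exists in pnfs but no pool holds the file (the case in this thread).
        "in_pnfs_not_on_disk": sorted(namespace - pools),
        # File sits on a pool but the namespace has no entry for it.
        "on_disk_not_in_pnfs": sorted(pools - namespace),
    }


# Hypothetical sample IDs, for illustration only.
namespace_dump = ["0001", "0002", "0003"]
pool_dump = ["0002", "0003", "0004"]

result = reconcile(namespace_dump, pool_dump)
for category, ids in result.items():
    print(category, ids)
```

Either list being non-empty means the namespace and the pools disagree, and
the experiment's catalogues may need cleaning as well.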
cheers
alessandra
Graeme Stewart wrote:
> At the moment the UK has a 4 day timeout on production, so after 4
> days the subscription will be cancelled and these jobs rerun.
>
> Any idea why the file went missing? The only way that the file copy
> back to RAL can be triggered is if the job ran successfully and lcg-cr
> was also successful.
>
> Cheers
>
> Graeme
>
> On Wed, Jun 18, 2008 at 9:13 AM, brian davies <[log in to unmask]> wrote:
>
>> Adding in atlas-uk this time.
>> Also, I have done an lcg-lr on this file. The only copy was at MAN, so it
>> is gone (though I have only checked one LFC, but since this kind of
>> smells like MAN created the file and is uploading to RAL, it is
>> not surprising this was the only copy). Interested to see if panda does
>> stop trying to copy this file eventually (assuming it is panda
>> controlling it).
>> The guid and alias are as follows:
>> guid:B2B15287-5E39-DD11-96C3-001422096940
>> lfn:/grid/atlas/dq2/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>
>>
>> 2008/6/18 brian davies <[log in to unmask]>:
>>
>>> Ok, so I think I understand this. Originally you did an srmls and found
>>> the file (since it was in pnfs), but it did not exist on disk. So you
>>> have now removed the pnfs entry (hence why the srmls now fails). How
>>> did you determine the file was not physically in a pool?
>>> Are there any more files in this state?
>>>
>>>
>>> So ATLAS should now deal with this lost file.
>>> So ATLAS, the file
>>> srm://dcache01.tier2.hep.manchester.ac.uk:8443/pnfs/tier2.hep.manchester.ac.uk/data/atlas/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>> has been lost from the Manchester dCache. Please clean your
>>> LFC/DQ2/DDM accordingly.
>>> Regards
>>> Brian
>>> 2008/6/17 Sergey <[log in to unmask]>:
>>>
>>>> Brian,
>>>>
>>>> The file, it appears, does not exist on the pool, although it did
>>>> exist in our pnfs system. This happens sometimes (e.g. see
>>>> http://www.gridpp.ac.uk/wiki/DCache_Administration_Scripts "Finding
>>>> list of PNFSids with no corresponding file"), so we have had to delete
>>>> this file name from the system completely, to avoid confusion.
>>>>
>>>> sergey@niels005:~$/opt/d-cache/srm/bin/srmls -l
>>>> srm://dcache01.tier2.hep.manchester.ac.uk:8443/pnfs/tier2.hep.manchester.ac.uk/data/atlas/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>>> WARNING: SRM_PATH is defined, which might cause a wrong version of srm
>>>> client to be executed
>>>> WARNING: SRM_PATH=/opt/d-cache/srm
>>>> Response from call to srmls:
>>>> Return status:
>>>> - Status code: SRM_FAILURE
>>>> - Explanation: path does not exist for one or more files specified,
>>>> check individual statuses
>>>> File/directory 0
>>>> /pnfs/tier2.hep.manchester.ac.uk/data/atlas/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>>> does not exist.
>>>>
>>>> The only way out is to restore it from the tape.
>>>> At least there is no SURL now, but you still need to unregister it first.
>>>> Sorry for that
>>>>
>>>> Regards
>>>> Sergey
>>>>
>>>> 2008/6/17 brian davies <[log in to unmask]>:
>>>>
>>>>> Have you tried a head node restart? Perhaps it looks like a Java issue?
>>>>> Brian
>>>>>
>>>>> 2008/6/17 Sergey <[log in to unmask]>:
>>>>>
>>>>>> 2008/6/17 brian davies <[log in to unmask]>:
>>>>>>
>>>>>>> Currently getting errors between Manchester and RAL.
>>>>>>>
>>>>>>> Does Manchester have a working srmv2.2?
>>>>>>> Brian
>>>>>>>
>>>>>> Yes, the file exists and srm2.2 does work at Manchester, as you can see
>>>>>> from the following log:
>>>>>>
>>>>>> sergey@niels005:~$/opt/d-cache/srm/bin/srmls -l -debug=true
>>>>>> srm://dcache01.tier2.hep.manchester.ac.uk:8443/pnfs/tier2.hep.manchester.ac.uk/data/atlas/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>>>>> WARNING: SRM_PATH is defined, which might cause a wrong version of srm
>>>>>> client to be executed
>>>>>> WARNING: SRM_PATH=/opt/d-cache/srm
>>>>>> Storage Resource Manager (SRM) CP Client version 2.0
>>>>>> Copyright (c) 2002-2006 Fermi National Accelerator Laboratory
>>>>>>
>>>>>> SRM Configuration:
>>>>>> debug=true
>>>>>> gsissl=true
>>>>>> help=false
>>>>>> pushmode=false
>>>>>> userproxy=true
>>>>>> buffer_size=131072
>>>>>> tcp_buffer_size=0
>>>>>> streams_num=10
>>>>>> config_file=config.xml
>>>>>> glue_mapfile=conf/SRMServerV1.map
>>>>>> webservice_path=srm/managerv2
>>>>>> webservice_protocol=https
>>>>>> gsiftpclinet=globus-url-copy
>>>>>> protocols_list=http,gsiftp
>>>>>> save_config_file=null
>>>>>> srmcphome=..
>>>>>> urlcopy=sbin/urlcopy.sh
>>>>>> x509_user_cert=/home/timur/k5-ca-proxy.pem
>>>>>> x509_user_key=/home/timur/k5-ca-proxy.pem
>>>>>> x509_user_proxy=/tmp/x509up_u508
>>>>>> x509_user_trusted_certificates=/etc/grid-security/certificates
>>>>>> globus_tcp_port_range=null
>>>>>> gss_expected_name=null
>>>>>> storagetype=permanent
>>>>>> retry_num=20
>>>>>> retry_timeout=10000
>>>>>> wsdl_url=null
>>>>>> use_urlcopy_script=false
>>>>>> connect_to_wsdl=false
>>>>>> delegate=true
>>>>>> full_delegation=true
>>>>>> server_mode=passive
>>>>>> srm_protocol_version=2
>>>>>> request_lifetime=86400
>>>>>> access latency=null
>>>>>> overwrite mode=null
>>>>>> priority=0
>>>>>> action is ls
>>>>>> recursion depth=1
>>>>>> offset=0
>>>>>> count=0
>>>>>> is long listing mode=true
>>>>>> surl[0]=srm://dcache01.tier2.hep.manchester.ac.uk:8443/pnfs/tier2.hep.manchester.ac.uk/data/atlas/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>>>>> Tue Jun 17 12:35:35 BST 2008: In SRMClient ExpectedName: host
>>>>>> Tue Jun 17 12:35:35 BST 2008: SRMClient(https,srm/managerv2,true)
>>>>>> SRMClientV2 : user credentials are:
>>>>>> /C=UK/O=eScience/OU=Manchester/L=HEP/CN=sergey dolgobrodov
>>>>>> SRMClientV2 : WEBSERVICE_PATH srm/managerv2
>>>>>> SRMClientV2 : connecting to srm at
>>>>>> httpg://dcache01.tier2.hep.manchester.ac.uk:8443/srm/managerv2
>>>>>> SRMClientV2 : srmLs, contacting service
>>>>>> httpg://dcache01.tier2.hep.manchester.ac.uk:8443/srm/managerv2
>>>>>> 694682 /pnfs/tier2.hep.manchester.ac.uk/data/atlas/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>>>>> storage type:PERMANENT
>>>>>> retentionpolicyinfo : null
>>>>>> locality:NEARLINE
>>>>>> locality: null
>>>>>> UserPermission: uid=15001 PermissionsRW
>>>>>> GroupPermission: gid=1002 PermissionsR
>>>>>> WorldPermission: R
>>>>>> created at:2008/06/13 16:39:47
>>>>>> modified at:2008/06/13 16:39:47
>>>>>> - Assigned lifetime (in seconds): -1
>>>>>> - Lifetime left (in seconds): -1
>>>>>> - Original SURL:
>>>>>> /pnfs/tier2.hep.manchester.ac.uk/data/atlas/valid2/HITS/valid2.008801.Hijing_PbPb_5p5TeV_MinBias.simul.HITS.e113_s417_tid022721/HITS.022721._46413.pool.root.1
>>>>>> - Status: null
>>>>>> - Type: FILE
>>>>>>
>>>>>> However, I also failed to copy this file with the same error, so I
>>>>>> think it is a particular VO and links related issue. Trying to figure
>>>>>> out what is going on.
>>>>>>
>>>>>> Sergey
>>>>>>
>>>>>>
>
>
>
>
--
Well you'll still need a tray