Hi Elena,
Manchester has these versions of GFAL and lcg_util respectively.
GFAL-client-1.10.11-1.slc4
lcg_util-1.6.11-1.slc4
The error says "Connection reset by peer" which often indicates an
authorization problem. We should involve RAL and see if they have
anything in their log files.
cheers
alessandra
Elena Korolkova wrote:
> Hi Alessandra
>
> Thanks for the reply. We have the latest version of gLite on the WNs.
> We still need to upgrade DPM, but this is not related to LHCb as they
> do not use our storage.
>
> I don't know what is wrong.
> If you could tell me what you think could be obsolete, please let me
> know, and also which versions you are using at your sites that are
> fine for LHCb.
>
> I can't easily check how many LHCb jobs were successful, nor can I
> check job outputs as I do for ATLAS. This makes the task of finding
> the error more complicated.
>
> Cheers
> Elena
>
> On Tue, 23 Jun 2009, Alessandra Forti wrote:
>
>> Hi Elena,
>>
>> Raja also reported the problem at the dteam meeting and forwarded me
>> an email Vladimir wrote to you. I still have to look into it. Off the
>> top of my head, since you are the only site that fails, it might be
>> a software version problem at the site.
>>
>> cheers
>> alessandra
>>
>> Elena Korolkova wrote:
>>>
>>> Hello
>>>
>>> Sheffield was blacklisted by LHCb for production. I saw that all of
>>> you are green for LHCb in the new grid map.
>>>
>>> I attached the plot which was sent to us by an LHCb guy. The problem
>>> occurs at the final stage, when the job output should be copied from
>>> the worker node to RAL.
>>>
>>> As we are not failing the LHCb SAM tests and a small fraction of jobs
>>> finish successfully, I don't think it's a site configuration problem.
>>>
>>> The error message from pilot:
>>>
>>> 2009-06-20 11:11:42 UTC dirac-jobexec.py INFO: SRM2Storage.__putFile:
>>> Executing transfer of
>>> file:/home/prdlhb90/globus-tmp.wn074.487.0/https_3a_2f_2fwms203.cern.ch_3a9000_2fKN3KVWH941S4LscI-crZ-g/2858510/00004837_00279010_3.dst to
>>> srm://srm-lhcb.gridpp.rl.ac.uk:8443/srm/managerv2?SFN=/castor/ads.rl.ac.uk/prod/lhcb/MC/MC09/DST/00004837/0027/00004837_00279010_3.dst
>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR: SRM2Storage.__putFile:
>>> Failed to put file to storage. globus_xio: System error in writev:
>>> Connection reset by peer
>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR: globus_xio: A system
>>> call failed: Connection reset by peer
>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR:
>>> ReplicaManager.putAndRegister: Failed to put file to Storage Element.
>>> /home/prdlhb90/globus-tmp.wn074.487.0/https_3a_2f_2fwms203.cern.ch_3a9000_2fKN3KVWH941S4LscI-crZ-g/2858510/00004837_00279010_3.dst:
>>> SRM2Storage.__putFile: Failed to put file to storage.
>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py/UploadOutputData VERB:
>>> {'Message': 'ReplicaManager.putAndRegister: Failed to put file to
>>> Storage Element. SRM2Storage.__putFile: Failed to put file to
>>> storage.', 'OK': False}
>>>
>>> Our network is not overloaded.
>>>
>>> Any ideas about what could be wrong would be greatly appreciated.
>>>
>>> Cheers
>>> Elena
>>>
>>> ____________________________________________________________________________
>>> Dr Elena Korolkova
>>> Tel.: +44 (0)114 2223553
>>> Fax: +44 (0)114 2223555
>>> Department of Physics and Astronomy
>>> University of Sheffield
>>> Sheffield, S3 7RH, United Kingdom
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>
>
> ____________________________________________________________________________
>
> Dr Elena Korolkova
> Tel.: +44 (0)114 2223553
> Fax: +44 (0)114 2223555
> Department of Physics and Astronomy
> University of Sheffield
> Sheffield, S3 7RH, United Kingdom