Elena,
you also have problems with ATLAS. From an email Graeme just sent:
Sheffield: LFC lookup problems and stage-in/out problems (network issues?):
http://panda.cern.ch:25980/server/pandamon/query?job=1012762913
http://panda.cern.ch:25980/server/pandamon/query?job=1012750235
http://panda.cern.ch:25980/server/pandamon/query?job=1012742714
and there is "Connection reset by peer" again.
cheers
alessandra
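
To check how often this error shows up, a pilot log like the one quoted
further down in this thread can be scanned with a few lines of Python.
This is only a minimal sketch: the record layout is assumed from the quoted
excerpt alone, and the names parse_pilot_log and count_errors are
hypothetical helpers, not part of DIRAC or any grid tool.

```python
import re

# Header of one record as it appears in the quoted pilot log (assumed layout):
# "<date> <time> UTC <component> <LEVEL>: <message...>"
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC (\S+) ([A-Z]+):\s*(.*)$"
)


def parse_pilot_log(text):
    """Return a list of (timestamp, component, level, message) tuples.

    Leading mail-quote markers ('>') are stripped, and lines without a
    timestamp are appended to the previous record's message.
    """
    records = []
    for raw in text.splitlines():
        line = re.sub(r"^[>\s]+", "", raw).rstrip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            records.append(list(match.groups()))
        elif records:
            records[-1][3] = (records[-1][3] + " " + line).strip()
    return [tuple(record) for record in records]


def count_errors(records, needle="Connection reset by peer"):
    """Count ERROR records whose message contains the given text."""
    return sum(
        1 for _, _, level, message in records
        if level == "ERROR" and needle in message
    )


# Two ERROR records copied from the pilot log quoted in this thread.
SAMPLE = """\
>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR:
>>>>> SRM2Storage.__putFile:
>>>>> Failed to put file to storage. globus_xio: System error in writev:
>>>>> Connection reset by peer
>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR: globus_xio: A system
>>>>> call failed: Connection reset by peer
"""

records = parse_pilot_log(SAMPLE)
print(count_errors(records))
```

Continuation lines are joined with a single space, so a path that the mail
client split in the middle of a token will come out with a stray space in it.
That does not affect the error count, but such paths should not be reused
without checking them.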
Elena Korolkova wrote:
> Hi Alessandra
>
> we had
>
> GFAL-client-1.11.4-1.slc4.i386
> lcg_util-1.7.2-1.slc4.i386
>
> On Monday I decided that these could be causing the problem and updated them.
> Now we have
>
> GFAL-client-1.11.6-2.slc4.i386
> lcg_util-1.7.4-1.slc4.i386
>
> Thank you for your help
> Elena
>
>
>
> On Wed, 24 Jun 2009, Alessandra Forti wrote:
>
>> Hi Elena,
>>
>> Manchester has these versions of GFAL and lcg_util respectively.
>>
>> GFAL-client-1.10.11-1.slc4
>> lcg_util-1.6.11-1.slc4
>>
>> The error says "Connection reset by peer" which often indicates an
>> authorization problem. We should involve RAL and see if they have
>> anything in their log files.
>>
>> cheers
>> alessandra
>>
>>
>> Elena Korolkova wrote:
>>> Hi Alessandra
>>>
>>> thanks for the reply. We have the latest version of gLite on the WNs.
>>> We still need to upgrade DPM, but this is not related to LHCb as
>>> they do not use our storage.
>>>
>>> I don't know what is wrong.
>>> If you could tell me what you think could be obsolete, please let
>>> me know, and also which versions you are using at your sites that are
>>> fine for LHCb.
>>>
>>> I can't easily check how many LHCb jobs were successful, nor can I
>>> check the job outputs as I do for ATLAS. This makes the task
>>> of finding the error more complicated.
>>>
>>> Cheers
>>> Elena
>>>
>>> On Tue, 23 Jun 2009, Alessandra Forti wrote:
>>>
>>>> Hi Elena,
>>>>
>>>> Raja also reported the problem at the dteam meeting and forwarded
>>>> me an email Vladimir wrote to you. I still have to look into it. Off
>>>> the top of my head, since you are the only site that fails, it
>>>> might be a software version problem at the site.
>>>>
>>>> cheers
>>>> alessandra
>>>>
>>>> Elena Korolkova wrote:
>>>>>
>>>>> Hello
>>>>>
>>>>> Sheffield was blacklisted by LHCb for production. I saw that all
>>>>> of you are green for LHCb in the new grid map.
>>>>>
>>>>> I have attached the plot that was sent to us by LHCb. The problem
>>>>> occurs at the final stage when the job output should be copied
>>>>> from the worker node to RAL.
>>>>>
>>>>> As we are not failing the LHCb SAM tests and a small fraction of jobs
>>>>> finished successfully, I don't think it is a site configuration problem.
>>>>>
>>>>> The error message from the pilot:
>>>>>
>>>>> 2009-06-20 11:11:42 UTC dirac-jobexec.py INFO:
>>>>> SRM2Storage.__putFile:
>>>>> Executing transfer of
>>>>> file:/home/prdlhb90/globus-tmp.wn074.487.0/https_3a_2f_2fwms203.cern.ch_3a9000_2fKN3KVWH941S4LscI-crZ-g/2858510/00004837_00279010_3.dst to
>>>>> srm://srm-lhcb.gridpp.rl.ac.uk:8443/srm/managerv2?SFN=/castor/ads.rl.ac.uk/prod/lhcb/MC/MC09/DST/00004837/0027/00004837_00279010_3.dst
>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR:
>>>>> SRM2Storage.__putFile:
>>>>> Failed to put file to storage. globus_xio: System error in writev:
>>>>> Connection reset by peer
>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR: globus_xio: A system
>>>>> call failed: Connection reset by peer
>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR:
>>>>> ReplicaManager.putAndRegister: Failed to put file to Storage Element.
>>>>> /home/prdlhb90/globus-tmp.wn074.487.0/https_3a_2f_2fwms203.cern.ch_3a9000_2fKN3KVWH941S4LscI-crZ-g/2858510/00004837_00279010_3.dst:
>>>>> SRM2Storage.__putFile: Failed to put file to storage.
>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py/UploadOutputData VERB:
>>>>> {'Message': 'ReplicaManager.putAndRegister: Failed to put file to
>>>>> Storage Element. SRM2Storage.__putFile: Failed to put file to
>>>>> storage.', 'OK': False}
>>>>>
>>>>> Our network is not overloaded.
>>>>>
>>>>> Any ideas about what could be wrong would be greatly appreciated.
>>>>>
>>>>> Cheers
>>>>> Elena
>>>>>
>>>>> ____________________________________________________________________________
>>>>> Dr Elena Korolkova
>>>>> Email: [log in to unmask]
>>>>> Tel.: +44 (0)114 2223553
>>>>> Fax: +44 (0)114 2223555
>>>>> Department of Physics and Astronomy
>>>>> University of Sheffield
>>>>> Sheffield, S3 7RH, United Kingdom
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>
>>>
>>
>
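
Since the thread raises a possible software version problem, the package
versions quoted above can be compared numerically rather than by eye. The
sketch below assumes the strings follow the usual name-version-release
layout of RPM package names. The helper names parse_rpm_name and
versions_by_package are hypothetical, and the site labels are only
descriptions of where each string appears in the thread.

```python
def parse_rpm_name(package):
    """Split 'name-version-release' into (name, version tuple, release)."""
    name, version, release = package.rsplit("-", 2)
    return name, tuple(int(part) for part in version.split(".")), release


# Package strings copied from the messages in this thread.
sites = {
    "Sheffield (before update)": [
        "GFAL-client-1.11.4-1.slc4.i386",
        "lcg_util-1.7.2-1.slc4.i386",
    ],
    "Sheffield (after update)": [
        "GFAL-client-1.11.6-2.slc4.i386",
        "lcg_util-1.7.4-1.slc4.i386",
    ],
    "Manchester": [
        "GFAL-client-1.10.11-1.slc4",
        "lcg_util-1.6.11-1.slc4",
    ],
}


def versions_by_package(sites):
    """Return {package name: {site: version tuple}}."""
    table = {}
    for site, packages in sites.items():
        for package in packages:
            name, version, _release = parse_rpm_name(package)
            table.setdefault(name, {})[site] = version
    return table


table = versions_by_package(sites)
for name, per_site in table.items():
    # List the sites from oldest to newest version of each package.
    for site, version in sorted(per_site.items(), key=lambda item: item[1]):
        print(name, ".".join(str(part) for part in version), site)
```

Versions are compared as tuples of integers because a plain string
comparison would sort "1.10.11" before "1.6.11". On the strings quoted in
this thread, both Sheffield sets compare as newer than the Manchester ones,
which may be worth keeping in mind when judging the "obsolete software" idea.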