Hi Alessandra
169 jobs have finished at Sheffield, 35 failed (17%). That is short of
Graeme's golden target of 95%. Still, I can't believe this is a software
version problem. Moreover, we hit this golden target in ATLAS production
last week.
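As a quick check of the numbers above, here is a minimal sketch of the
arithmetic. It assumes that 169 is the count of successful jobs and 35 the
count of failures (204 in total), which is the reading that gives 17%; the
variable names are only for illustration.

```python
# Assumption: 169 jobs succeeded and 35 failed, so 204 jobs in total.
succeeded = 169
failed = 35
total = succeeded + failed

failure_rate = failed / total
success_rate = succeeded / total

# The 95% success target mentioned in the thread.
TARGET = 0.95

print(f"failure rate: {failure_rate:.0%}")
print(f"success rate: {success_rate:.0%}")
print("meets target" if success_rate >= TARGET else "below target")
```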
This error for analysis jobs is the result of network load (at least I
think so). I saw these errors during STEP09. If I repeat the failed
operations, they succeed. Most of the errors go away (or at least the
number of errors decreases) when the number of analysis jobs decreases.
I have also seen these types of errors even for production jobs during
STEP09. When STEP09 was over, the errors went away.
For LHCb jobs we haven't noticed any network load, and there were only 70
jobs.
Cheers
Elena
On Wed, 24 Jun 2009, Alessandra Forti wrote:
> Elena,
>
> you have problems also with atlas. From an email Graeme just sent:
>
> Sheffield: LFC lookup problems and stage-in/out problems (network issues?):
>
> http://panda.cern.ch:25980/server/pandamon/query?job=1012762913
> http://panda.cern.ch:25980/server/pandamon/query?job=1012750235
> http://panda.cern.ch:25980/server/pandamon/query?job=1012742714
>
>
> and there is "Connection reset by peer" again.
>
> cheers
> alessandra
>
> Elena Korolkova wrote:
>> Hi Alessandra
>>
>> we had
>>
>> GFAL-client-1.11.4-1.slc4.i386
>> lcg_util-1.7.2-1.slc4.i386.
>>
>> On Monday I decided that they could have caused the problem and updated
>> them. Now we have
>>
>> GFAL-client-1.11.6-2.slc4.i386
>> lcg_util-1.7.4-1.slc4.i386
>>
>> Thank you for your help
>> Elena
>>
>>
>>
>> On Wed, 24 Jun 2009, Alessandra Forti wrote:
>>
>>> Hi Elena,
>>>
>>> Manchester has these versions of GFAL and lcg_util respectively.
>>>
>>> GFAL-client-1.10.11-1.slc4
>>> lcg_util-1.6.11-1.slc4
>>>
>>> The error says "Connection reset by peer" which often indicates an
>>> authorization problem. We should involve RAL and see if they have anything
>>> in their log files.
>>>
>>> cheers
>>> alessandra
>>>
>>>
>>> Elena Korolkova wrote:
>>>> Hi Alessandra
>>>>
>>>> thanks for the reply. We have the latest version of gLite on the WNs.
>>>> We still need to upgrade DPM, but this is not related to LHCb as they
>>>> do not use our storage.
>>>>
>>>> I don't know what is wrong.
>>>> If you could tell me what you think could be obsolete, please let me
>>>> know, and also which versions you are using at your sites that are
>>>> fine for LHCb.
>>>>
>>>> I can't easily check how many LHCb jobs were successful, nor can I
>>>> check the job outputs as I do for ATLAS. This makes the task of
>>>> finding the error more complicated.
>>>>
>>>> Cheers
>>>> Elena
>>>>
>>>> On Tue, 23 Jun 2009, Alessandra Forti wrote:
>>>>
>>>>> Hi Elena,
>>>>>
>>>>> Raja also reported the problem at the dteam meeting and forwarded me
>>>>> an email Vladimir wrote to you. I still have to look into it. Off the
>>>>> top of my head, since you are the only site that fails, it might be
>>>>> some software version problem at the site.
>>>>>
>>>>> cheers
>>>>> alessandra
>>>>>
>>>>> Elena Korolkova wrote:
>>>>>>
>>>>>> Hello
>>>>>>
>>>>>> Sheffield was blacklisted by LHCb for production. I saw that all of
>>>>>> you are green for LHCb in the new grid map.
>>>>>>
>>>>>> I attached the plot which was sent to us by someone from LHCb. The
>>>>>> problem occurs at the final stage, when the job output should be
>>>>>> copied from the worker node to RAL.
>>>>>>
>>>>>> As we are not failing the LHCb SAM tests and a small part of the jobs
>>>>>> finished successfully, I don't think it's a site configuration problem.
>>>>>>
>>>>>> The error message from pilot:
>>>>>>
>>>>>> 2009-06-20 11:11:42 UTC dirac-jobexec.py INFO: SRM2Storage.__putFile:
>>>>>> Executing transfer of
>>>>>> file:/home/prdlhb90/globus-tmp.wn074.487.0/https_3a_2f_2fwms203.cern.ch_
>>>>>> 3a9000_2fKN3KVWH941S4LscI-crZ-g/2858510/00004837_00279010_3.dst to
>>>>>> srm://srm-lhcb.gridpp.rl.ac.uk:8443/srm/managerv2?SFN=/castor/ads.rl.ac.
>>>>>> uk/prod/lhcb/MC/MC09/DST/00004837/0027/00004837_00279010_3.dst
>>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR: SRM2Storage.__putFile:
>>>>>> Failed to put file to storage. globus_xio: System error in writev:
>>>>>> Connection reset by peer
>>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR: globus_xio: A system
>>>>>> call failed: Connection reset by peer
>>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py ERROR:
>>>>>> ReplicaManager.putAndRegister: Failed to put file to Storage Element.
>>>>>> /home/prdlhb90/globus-tmp.wn074.487.0/https_3a_2f_2fwms203.cern.ch_3a900
>>>>>> 0_2fKN3KVWH941S4LscI-crZ-g/2858510/00004837_00279010_3.dst:
>>>>>> SRM2Storage.__putFile: Failed to put file to storage.
>>>>>> 2009-06-20 11:12:02 UTC dirac-jobexec.py/UploadOutputData VERB:
>>>>>> {'Message': 'ReplicaManager.putAndRegister: Failed to put file to
>>>>>> Storage Element. SRM2Storage.__putFile: Failed to put file to
>>>>>> storage.', 'OK': False}
>>>>>>
>>>>>> Our network is not overloaded.
>>>>>>
>>>>>> Any ideas what can be wrong are greatly appreciated.
>>>>>>
>>>>>> Cheers
>>>>>> Elena
>>>>>>
>>>>>> ____________________________________________________________________________
>>>>>> Dr Elena Korolkova
>>>>>> Tel.: +44 (0)114 2223553
>>>>>> Fax: +44 (0)114 2223555
>>>>>> Department of Physics and Astronomy
>>>>>> University of Sheffield
>>>>>> Sheffield, S3 7RH, United Kingdom
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>
>>>>
>>>
>>
>