Thank you for the hints. We managed to get the 2.6 version from the OpenLab
developers and everything is back to normal.
Best regards,
Piotr
--
Piotr Siwczak <[log in to unmask]>
System Administrator
Poznan Supercomputing and Networking Center
Supercomputing Department
(www.eu-egee.org <[log in to unmask]>)
--
On Thu, 20 Oct 2005 [log in to unmask] wrote:
> On Thu, 20 Oct 2005, Piotr Siwczak wrote:
>
>> Hi,
>>
>> Thank you for your interest in our problems.
>>
>> There's still one thing to point out: we are using the UI from LCG 2.4
>
> Which explains the problem, given the bad entries just discovered in the
> information system! See the message I just sent to the list.
>
>> (there's no 2.6 UI release for ia64). My colleague recalls some posts on
>> the ROLLOUT list indicating that replica management from 2.4 no longer
>> works with LCG26.
>
> No, it _does_ work with 2.6, one just has to be careful with what is
> published by a service.
>
>> Maybe this is the issue.
>>
>> I've run lcg-cr through gdb. The output is given below (including the
>> stack backtrace):
>>
>> -----------
>> gdb lcg-cr
>> GNU gdb Red Hat Linux (6.3.0.0-1.62rh)
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "ia64-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
>>
>> (gdb) run -v --vo dteam -d se1.egee.man.poznan.pl file:/etc/group
>> Starting program: /opt/lcg/bin/lcg-cr -v --vo dteam -d
>> se1.egee.man.poznan.pl file:/etc/group
>> [Thread debugging using libthread_db enabled]
>> [New Thread 2305843009237893136 (LWP 31073)]
>> Using grid catalog type: edg
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 2305843009237893136 (LWP 31073)]
>> 0x20000000004e1130 in _int_malloc () from /lib/tls/libc.so.6.1
>> (gdb) backtrace
>> #0 0x20000000004e1130 in _int_malloc () from /lib/tls/libc.so.6.1
>> #1 0x20000000004df300 in malloc () from /lib/tls/libc.so.6.1
>> #2 0x20000000004eeb60 in strdup () from /lib/tls/libc.so.6.1
>> #3 0x2000000000192e10 in get_rls_endpointsx
>> (lrc_endpoint=0x20000000003d6270, rmc_endpoint=0x20000000003d6278,
>> errbuf=0x0, errbufsz=0)
>> at mds_ifce.c:178
>> #4 0x200000000018f7a0 in lrc_init (soap=0x60000ffffffec500, errbuf=0x0,
>> errbufsz=0) at lrc_ifce.c:36
>> #5 0x2000000000190150 in lrc_guid_exists (guid=0x60000fffffff8ba0
>> "506c7825-df0d-430d-8c46-7294c858af6e", errbuf=0x0, errbufsz=0)
>> at lrc_ifce.c:125
>> #6 0x20000000000c7b70 in guid_exists (guid=0x60000fffffff8ba0
>> "506c7825-df0d-430d-8c46-7294c858af6e", errbuf=0x0, errbufsz=0)
>> at gfal.c:1198
>> #7 0x2000000000076f30 in lcg_crx (src_file=0x60000fffffffb582
>> "file:/etc/group", dest_file=0x60000fffffffb56b "se1.egee.man.poznan.pl",
>> guid=0x60000fffffff8ba0 "506c7825-df0d-430d-8c46-7294c858af6e",
>> lfn=0x0, vo=0x60000fffffffb562 "dteam", relative_path=0x0,
>> nbstreams=1, conf_file=0x0, insecure=0, verbose=1,
>> actual_guid=0x60000fffffff9220 "", errbuf=0x0, errbufsz=0) at lcg_cr.c:141
>> #8 0x2000000000076340 in lcg_cr (src_file=0x60000fffffffb582
>> "file:/etc/group", dest_file=0x60000fffffffb56b "se1.egee.man.poznan.pl",
>> guid=0x0, lfn=0x0, vo=0x60000fffffffb562 "dteam", relative_path=0x0,
>> nbstreams=1, conf_file=0x0, insecure=0, verbose=1,
>> actual_guid=0x60000fffffff9220 "") at lcg_cr.c:28
>> #9 0x4000000000001ba0 in main (argc=7, argv=0x60000fffffff9b08) at
>> lcg-cr.c:138
>>
>>
>> Best regards,
>> Piotr
>>
>> --
>> Piotr Siwczak <[log in to unmask]>
>> System Administrator
>>
>> Poznan Supercomputing and Networking Center
>> Supercomputing Department
>>
>> (www.eu-egee.org <[log in to unmask]>)
>> --
>>
>> On Thu, 20 Oct 2005 [log in to unmask] wrote:
>>
>>> On Wed, 19 Oct 2005, Piotr Siwczak wrote:
>>>
>>>> Hi,
>>>> We run an LCG26 site based on Itanium2 machines.
>>>> We also run a framework for submitting Site Functional Tests on demand on
>>>> our CE (the CE also has a user interface installed).
>>>> The site was running fine until this morning when, to our surprise,
>>>> the replica management commands began to crash.
>>>
>>> Obvious questions: did you change anything, did apt/yum/... upgrade things?
>>>
>>>> When the lcg-cr or lcg-del commands are invoked, whether from the SFT
>>>> framework or by hand, they quit with a segfault:
>>>>
>>>> lcg-cr --verbose --vo dteam -d se1.egee.man.poznan.pl hostname.jdl
>>>
>>> That command cannot work; I suppose you meant this instead:
>>>
>>> lcg-cr --verbose --vo dteam -d se1.egee.man.poznan.pl file:`pwd`/hostname.jdl
>>>
>>>> Using grid catalog type: edg
>>>> Segmentation fault
>>>
>>> Run the command under gdb and give us the stack trace:
>>>
>>> $ gdb lcg-cr
>>> .....
>>> (gdb) run -v --vo dteam -d se1.egee.man.poznan.pl file:/etc/group
>>> .....
>>> Segmentation fault
>>> (gdb) where
>>> .....
>>> .....
>>> .....
>>>
>>
>
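
For anyone who needs to collect the same kind of trace from a script (for
example from the SFT framework) rather than from an interactive session, the
gdb steps quoted above can probably be run in one go. This is only a sketch:
bt_on_crash is a hypothetical helper name, and it assumes a gdb recent enough
to support the -ex option, which the 6.3 build shown in this thread may not
be. Setting DRY_RUN to any non-empty value prints the gdb command line instead
of running it.

```shell
# Hypothetical helper: run a command under gdb in batch mode and print a
# backtrace if the program stops on a signal such as SIGSEGV.
# Assumes gdb supports -batch, -ex and --args (the 6.3 build may lack -ex).
bt_on_crash() {
    # If DRY_RUN is set and non-empty, only echo the command line.
    ${DRY_RUN:+echo} gdb -batch -ex run -ex backtrace --args "$@"
}
```

Usage, with the same arguments as in the session above:

    bt_on_crash lcg-cr -v --vo dteam -d se1.egee.man.poznan.pl file:/etc/group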