On Thu, 20 Oct 2005, Piotr Siwczak wrote:
> Hi,
>
> Thank you for your interest in our problems.
>
> There's still one thing to point out: we are using UI from LCG 2.4
Which explains the problem, given the bad entries just discovered in the
information system! See the message I just sent to the list.
> (there's no 2.6 UI release for ia64). My colleague recalls some posts on
> the ROLLOUT indicating that replica management from 2.4 no longer works
> with LCG26.
No, it _does_ work with 2.6, one just has to be careful with what is
published by a service.
> Maybe this is the issue.
>
> I've run lcg-cr through gdb. The output is given below (including
> backtrace of the stack):
>
> -----------
> gdb lcg-cr
> GNU gdb Red Hat Linux (6.3.0.0-1.62rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "ia64-redhat-linux-gnu"...Using host
> libthread_db library "/lib/tls/libthread_db.so.1".
>
> (gdb) run -v --vo dteam -d se1.egee.man.poznan.pl file:/etc/group
> Starting program: /opt/lcg/bin/lcg-cr -v --vo dteam -d se1.egee.man.poznan.pl file:/etc/group
> [Thread debugging using libthread_db enabled]
> [New Thread 2305843009237893136 (LWP 31073)]
> Using grid catalog type: edg
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 2305843009237893136 (LWP 31073)]
> 0x20000000004e1130 in _int_malloc () from /lib/tls/libc.so.6.1
> (gdb) backtrace
> #0 0x20000000004e1130 in _int_malloc () from /lib/tls/libc.so.6.1
> #1 0x20000000004df300 in malloc () from /lib/tls/libc.so.6.1
> #2 0x20000000004eeb60 in strdup () from /lib/tls/libc.so.6.1
> #3 0x2000000000192e10 in get_rls_endpointsx
> (lrc_endpoint=0x20000000003d6270, rmc_endpoint=0x20000000003d6278,
> errbuf=0x0, errbufsz=0)
> at mds_ifce.c:178
> #4 0x200000000018f7a0 in lrc_init (soap=0x60000ffffffec500, errbuf=0x0,
> errbufsz=0) at lrc_ifce.c:36
> #5 0x2000000000190150 in lrc_guid_exists (guid=0x60000fffffff8ba0
> "506c7825-df0d-430d-8c46-7294c858af6e", errbuf=0x0, errbufsz=0)
> at lrc_ifce.c:125
> #6 0x20000000000c7b70 in guid_exists (guid=0x60000fffffff8ba0
> "506c7825-df0d-430d-8c46-7294c858af6e", errbuf=0x0, errbufsz=0)
> at gfal.c:1198
> #7 0x2000000000076f30 in lcg_crx (src_file=0x60000fffffffb582
> "file:/etc/group", dest_file=0x60000fffffffb56b "se1.egee.man.poznan.pl",
> guid=0x60000fffffff8ba0 "506c7825-df0d-430d-8c46-7294c858af6e",
> lfn=0x0, vo=0x60000fffffffb562 "dteam", relative_path=0x0,
> nbstreams=1, conf_file=0x0, insecure=0, verbose=1,
> actual_guid=0x60000fffffff9220 "", errbuf=0x0, errbufsz=0) at lcg_cr.c:141
> #8 0x2000000000076340 in lcg_cr (src_file=0x60000fffffffb582
> "file:/etc/group", dest_file=0x60000fffffffb56b "se1.egee.man.poznan.pl",
> guid=0x0, lfn=0x0, vo=0x60000fffffffb562 "dteam", relative_path=0x0,
> nbstreams=1, conf_file=0x0, insecure=0, verbose=1,
> actual_guid=0x60000fffffff9220 "") at lcg_cr.c:28
> #9 0x4000000000001ba0 in main (argc=7, argv=0x60000fffffff9b08) at
> lcg-cr.c:138
>
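In case it is useful for future reports, here is a minimal sketch of capturing
the same backtrace non-interactively. It assumes the gdb on the UI accepts
-batch, -x and --args; the file names gdb-cmds.txt and lcg-cr-backtrace.txt are
arbitrary choices for illustration, not anything LCG provides.

```shell
# Commands gdb should execute: start the program, then print the stack
# once it has stopped on the signal. (gdb-cmds.txt is a hypothetical name.)
cat > gdb-cmds.txt <<'EOF'
run
bt
EOF

# Only attempt the run where both tools are actually installed.
if command -v gdb >/dev/null 2>&1 && command -v lcg-cr >/dev/null 2>&1; then
    # -batch exits after the command file has been processed;
    # --args passes everything after the program name to lcg-cr.
    gdb -batch -x gdb-cmds.txt --args lcg-cr -v --vo dteam \
        -d se1.egee.man.poznan.pl file:/etc/group \
        > lcg-cr-backtrace.txt 2>&1
fi
```

The resulting lcg-cr-backtrace.txt can then be attached to a mail as is.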
>
> Best regards,
> Piotr
>
> --
> Piotr Siwczak
> System Administrator
>
> Poznan Supercomputing and Networking Center
> Supercomputing Department
>
> (www.eu-egee.org)
> --
>
> On Thu, 20 Oct 2005 [log in to unmask] wrote:
>
> > On Wed, 19 Oct 2005, Piotr Siwczak wrote:
> >
> >> Hi,
> >> We run an LCG26 site based on Itanium2 machines.
> >> We also run a framework for submitting Site Functional Tests on demand on
> >> our CE (the CE is equipped with a user interface too).
> >> The site was running fine until this morning when, to our surprise,
> >> the replica management commands began to crash.
> >
> > Obvious questions: did you change anything, did apt/yum/... upgrade things?
> >
> >> When the lcg-cr or lcg-del commands are invoked, whether from the SFT
> >> framework or by hand, they quit with a segfault:
> >>
> >> lcg-cr --verbose --vo dteam -d se1.egee.man.poznan.pl hostname.jdl
> >
> > That command cannot work; I suppose you meant this instead:
> >
> > lcg-cr --verbose --vo dteam -d se1.egee.man.poznan.pl file:`pwd`/hostname.jdl
> >
> >> Using grid catalog type: edg
> >> Segmentation fault
> >
> > Run the command under gdb and give us the stack trace:
> >
> > $ gdb lcg-cr
> > .....
> > (gdb) run -v --vo dteam -d se1.egee.man.poznan.pl file:/etc/group
> > .....
> > Segmentation fault
> > (gdb) where
> > .....
> > .....
> > .....
> >
>