Print

Print


On Mon, 10 Jan 2011, Peter W. Draper wrote:

> On Mon, 10 Jan 2011, David Berry wrote:
> 
> > On 7 January 2011 18:32, Peter W. Draper <[log in to unmask]> wrote:
> > > On Fri, 7 Jan 2011, Mark Taylor wrote:
> > > 
> > > > On Fri, 7 Jan 2011, Peter W. Draper wrote:
> > > > 
> > > > > > Having done all that ... running the JNIAST unit tests is now giving
> > > > > > me occasional core dumps :-(.  This is probably a threading issue -
> > > > > > the SEGVs are in uk.ac.starlink.ast.ThreadTest.  I've attached an
> > > > > > example dump file. Difficult to reproduce, currently happening
> > > > > > something like one time in ten, but in the way of these things it
> > > > > > seems to go away if you just run the test ten times.  I don't
> > > > > > *think* this problem is present in the build before these changes.
> > > > > >  It looks to me unlikely that the changes I've just made to JNIAST
> > > > > > would cause this (though I can't say it's impossible), which
> > > > > > suggests it's to do with how it's built. The builds I've done have
> > > > > > been against hawaiki (AST 5.3-1) or nanahope (AST 5.2-0) - I don't
> > > > > > have a 5.1-0 to hand. So, it could be version issues.  I suspect
> > > > > > however that it might go away if it's built against an AST built
> > > > > > without threading.  Although I believe/believed that JNIAST ought to
> > > > > > work with a multithreaded AST, it doesn't actually benefit much from
> > > > > > multithreaded AST (see comments in jniast.h for the sorry tale), so
> > > > > > this wouldn't have much of a downside (and the previous build might
> > > > > > well have been against a threadless AST, though I'm not sure if it
> > > > > > was or not).  If we can't resolve this we should put the changes
> > > > > > back on a branch or otherwise roll them back - it is not safe to
> > > > > > leave them without a native library rebuild, since the java and C
> > > > > > sides of the JNI code do not quite match when using the old native
> > > > > > libs.
> > > > > 
> > > > > Hi Mark & David,
> > > > > 
> > > > > OK, I've rebuilt JNIAST against the latest AST (5.4) and this problem
> > > > > is still there, and it's repeatable as in I get to the same line of
> > > > > code each time, and it seems likely to be the same one that Mark's
> > > > > dump shows:
> > > > > 
> > > > >  1225: MAKE_RESAMPLEX(I,int,jint,Int,I)
> > > > > 
> > > > > in Mapping.c.
> > > > > 
> > > > > Must be worth one of you looking at that macro for any shared
> > > > > resources that are not being locked (there are some structs malloc
> > > > > outside of any locks I can see, or David worrying about astResample,
> > > > > but I imagine that is heavily used?).
> > > > > 
> > > > > I'd prefer not to require a non-threaded AST library as that will mean
> > > > > we cannot build against a standard tree (that is bound to come back as
> > > > > a problem).
> > > > 
> > > > I did a bit more investigation, and it turns out that contrary to my
> > > > assessment above, the same issue can show up in the existing JNIAST, so
> > > > this has been lurking for a while, i.e. before the FitsChan-related
> > > > fixes suggested by David.  I'm not sure why I haven't seen the failure
> > > > before, but likely it's just that the last time I was doing JNIAST work
> > > > I had a different machine and it just never hit the relevant race
> > > > condition.  On the bright side, as far as I know nobody has come across
> > > > this issue in real life - probably nobody is doing massively
> > > > multithreaded JNIAST.
> > > > 
> > > > So, the new build is (most likely) no worse than the one we had before.
> > > > It's quite possible that a non-threaded AST wouldn't even help.
> > > 
> > > Sorry should have at least tested that idea, but it seems getting a
> > > non-threaded AST isn't possible:
> > > 
> > >  ./configure --without-pthreads
> > >  make
> > >    .....
> > >  channel.c: In function 'ReadDouble':
> > >  channel.c:2796: error: 'AST__BAD' undeclared (first use in this function)
> > >  channel.c:2796: error: (Each undeclared identifier is reported only once
> > >  channel.c:2796: error: for each function it appears in.)
> > >  channel.c: In function 'WriteDouble':
> > >  channel.c:3989: error: 'AST__BAD' undeclared (first use in this function)
> > >  make[1]: *** [libast_la-channel.lo] Error 1
> > >  make[1]: Leaving directory
> > >  `/loc/pwda/pdraper/starlink_git/starlink/libraries/ast'
> > >  make: *** [all] Error 2
> > 
> > A few missing includes - should be OK now.
> 
> And the result is that this error is also present with an unthreaded AST build
> as well. So this problem is probably not related to AST threads.

Thanks for checking that out, and sorry for the red herring.

> Given that I'd suggest that we just build against threaded AST and Mark takes
> a note to look at this sometime.

OK, but note that sometime may not be soon.

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
[log in to unmask] +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/