On Mon, 10 Jan 2011, Peter W. Draper wrote: > On Mon, 10 Jan 2011, David Berry wrote: > > > On 7 January 2011 18:32, Peter W. Draper <[log in to unmask]> wrote: > > > On Fri, 7 Jan 2011, Mark Taylor wrote: > > > > > > > On Fri, 7 Jan 2011, Peter W. Draper wrote: > > > > > > > > > > Having done all that ... running the JNIAST unit tests is now giving > > > > > > me occasional core dumps :-(. This is probably a threading issue - > > > > > > the SEGVs are in uk.ac.starlink.ast.ThreadTest. I've attached an > > > > > > example dump file. Difficult to reproduce, currently happening > > > > > > something like one time in ten, but in the way of these things it > > > > > > seems to go away if you just run the test ten times. I don't > > > > > > *think* this problem is present in the build before these changes. > > > > > > It looks to me unlikely that the changes I've just made to JNIAST > > > > > > would cause this (though I can't say it's impossible), which > > > > > > suggests it's to do with how it's built. The builds I've done have > > > > > > been against hawaiki (AST 5.3-1) or nanahope (AST 5.2-0) - I don't > > > > > > have a 5.1-0 to hand. So, it could be version issues. I suspect > > > > > > however that it might go away if it's built against an AST built > > > > > > without threading. Although I believe/believed that JNIAST ought to > > > > > > work with a multithreaded AST, it doesn't actually benefit much from > > > > > > multithreaded AST (see comments in jniast.h for the sorry tale), so > > > > > > this wouldn't have much of a downside (and the previous build might > > > > > > well have been against a threadless AST, though I'm not sure if it > > > > > > was or not). If we can't resolve this we should put the changes > > > > > > back on a branch or otherwise roll them back - it is not safe to > > > > > > leave them without a native library rebuild, since the java and C > > > > > > sides of the JNI code do not quite match when using the old native > > > > > > libs. > > > > > > > > > > Hi Mark & David, > > > > > > > > > > OK, I've rebuilt JNIAST against the latest AST (5.4) and this problem > > > > > is still there, and it's repeatable as in I get to the same line of > > > > > code each time, and it seems likely to be the same one that Mark's > > > > > dump shows: > > > > > > > > > > 1225: MAKE_RESAMPLEX(I,int,jint,Int,I) > > > > > > > > > > in Mapping.c. > > > > > > > > > > Must be worth one of you looking at that macro for any shared > > > > > resources that are not being locked (there are some structs malloc > > > > > outside of any locks I can see, or David worrying about astResample, > > > > > but I imagine that is heavily used?). > > > > > > > > > > I'd prefer not to require a non-threaded AST library as that will mean > > > > > we cannot build against a standard tree (that is bound to come back as > > > > > a problem). > > > > > > > > I did a bit more investigation, and it turns out that contrary to my > > > > assessment above, the same issue can show up in the existing JNIAST, so > > > > this has been lurking for a while, i.e. before the FitsChan-related > > > > fixes suggested by David. I'm not sure why I haven't seen the failure > > > > before, but likely it's just that the last time I was doing JNIAST work > > > > I had a different machine and it just never hit the relevant race > > > > condition. On the bright side, as far as I know nobody has come across > > > > this issue in real life - probably nobody is doing massively > > > > multithreaded JNIAST. > > > > > > > > So, the new build is (most likely) no worse than the one we had before. > > > > It's quite possible that a non-threaded AST wouldn't even help. > > > > > > Sorry should have at least tested that idea, but it seems getting a > > > non-threaded AST isn't possible: > > > > > > ./configure --without-pthreads > > > make > > > ..... > > > channel.c: In function 'ReadDouble': > > > channel.c:2796: error: 'AST__BAD' undeclared (first use in this function) > > > channel.c:2796: error: (Each undeclared identifier is reported only once > > > channel.c:2796: error: for each function it appears in.) > > > channel.c: In function 'WriteDouble': > > > channel.c:3989: error: 'AST__BAD' undeclared (first use in this function) > > > make[1]: *** [libast_la-channel.lo] Error 1 > > > make[1]: Leaving directory > > > `/loc/pwda/pdraper/starlink_git/starlink/libraries/ast' > > > make: *** [all] Error 2 > > > > A few missing includes - should be OK now. > > And the result is that this error is also present with an unthreaded AST build > as well. So this problem is probably not related to AST threads. Thanks for checking that out, and sorry for the red herring. > Given that I'd suggest that we just build against threaded AST and Mark takes > a note to look at this sometime. OK, but note that sometime may not be soon. -- Mark Taylor Astronomical Programmer Physics, Bristol University, UK [log in to unmask] +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/