On Jan 7, 2011, at 8:08 AM, Mark Taylor wrote:

> On Fri, 7 Jan 2011, Peter W. Draper wrote:
>
>>> Having done all that ... running the JNIAST unit tests is now giving me
>>> occasional core dumps :-(. This is probably a threading issue - the SEGVs
>>> are in uk.ac.starlink.ast.ThreadTest. I've attached an example dump file.
>>> Difficult to reproduce, currently happening something like one time in ten,
>>> but in the way of these things it seems to go away if you just run the test
>>> ten times. I don't *think* this problem is present in the build before
>>> these changes. It looks to me unlikely that the changes I've just made to
>>> JNIAST would cause this (though I can't say it's impossible), which suggests
>>> it's to do with how it's built. The builds I've done have been against
>>> hawaiki (AST 5.3-1) or nanahope (AST 5.2-0) - I don't have a 5.1-0 to hand.
>>> So, it could be version issues. I suspect however that it might go away if
>>> it's built against an AST built without threading. Although I
>>> believe/believed that JNIAST ought to work with a multithreaded AST, it
>>> doesn't actually benefit much from multithreaded AST (see comments in
>>> jniast.h for the sorry tale), so this wouldn't have much of a downside (and
>>> the previous build might well have been against a threadless AST, though I'm
>>> not sure if it was or not). If we can't resolve this we should put the
>>> changes back on a branch or otherwise roll them back - it is not safe to
>>> leave them without a native library rebuild, since the java and C sides of
>>> the JNI code do not quite match when using the old native libs.
>>
>> Hi Mark & David,
>>
>> OK, I've rebuilt JNIAST against the latest AST (5.4) and this problem is still
>> there, and it's repeatable as in I get to the same line of code each time, and
>> it seems likely to be the same one that Mark's dump shows:
>>
>> 1225: MAKE_RESAMPLEX(I,int,jint,Int,I)
>>
>> in Mapping.c.
>>
>> Must be worth one of you looking at that macro for any shared resources that
>> are not being locked (there are some structs malloc outside of any locks I can
>> see, or David worrying about astResample, but I imagine that is heavily
>> used?).
>>
>> I'd prefer not to require a non-threaded AST library as that will mean we
>> cannot build against a standard tree (that is bound to come back as a
>> problem).
>
> I did a bit more investigation, and it turns out that contrary to my
> assessment above, the same issue can show up in the existing JNIAST,
> so this has been lurking for a while, i.e. before the FitsChan-related
> fixes suggested by David. I'm not sure why I haven't
> seen the failure before, but likely it's just that the last time I was
> doing JNIAST work I had a different machine and it just never hit
> the relevant race condition. On the bright side, as far as
> I know nobody has come across this issue in real life - probably
> nobody is doing massively multithreaded JNIAST.
>
> So, the new build is (most likely) no worse than the one we had before.
> It's quite possible that a non-threaded AST wouldn't even help.
>
> Another point: the test in question calls astResample in loads of
> threads at once just to see if there's trouble. I picked astResample
> only since it's an AST function that takes a reasonable amount of CPU
> time to run. So, the problem may not be specifically to do with
> the MAKE_RESAMPLEX macro, it may be to do with the JNIAST framework
> in general (which was really what that test was designed to look for).
>

Is it possible to run the test with valgrind? It is also possible for the
macros to be expanded prior to compilation (David has a script for that in
the AST tree) so that a more useful line number would turn up.

--
Tim Jenness
Joint Astronomy Centre