On 11 January 2011 10:11, Peter W. Draper <[log in to unmask]> wrote:
> On Mon, 10 Jan 2011, Tim Jenness wrote:
>
>> On Jan 7, 2011, at 8:08 AM, Mark Taylor wrote:
>>
>>> On Fri, 7 Jan 2011, Peter W. Draper wrote:
>>>
>>>>> Having done all that ... running the JNIAST unit tests is now giving me
>>>>> occasional core dumps :-(. This is probably a threading issue - the SEGVs
>>>>> are in uk.ac.starlink.ast.ThreadTest. I've attached an example dump file.
>>>>> Difficult to reproduce, currently happening something like one time in ten,
>>>>> but in the way of these things it seems to go away if you just run the test
>>>>> ten times. I don't *think* this problem is present in the build before
>>>>> these changes. It looks to me unlikely that the changes I've just made to
>>>>> JNIAST would cause this (though I can't say it's impossible), which suggests
>>>>> it's to do with how it's built. The builds I've done have been against
>>>>> hawaiki (AST 5.3-1) or nanahope (AST 5.2-0) - I don't have a 5.1-0 to hand.
>>>>> So, it could be version issues. I suspect however that it might go away if
>>>>> it's built against an AST built without threading. Although I
>>>>> believe/believed that JNIAST ought to work with a multithreaded AST, it
>>>>> doesn't actually benefit much from multithreaded AST (see comments in
>>>>> jniast.h for the sorry tale), so this wouldn't have much of a downside (and
>>>>> the previous build might well have been against a threadless AST, though I'm
>>>>> not sure if it was or not). If we can't resolve this we should put the
>>>>> changes back on a branch or otherwise roll them back - it is not safe to
>>>>> leave them without a native library rebuild, since the java and C sides of
>>>>> the JNI code do not quite match when using the old native libs.
>>>>
>>>> Hi Mark & David,
>>>>
>>>> OK, I've rebuilt JNIAST against the latest AST (5.4) and this problem is
>>>> still there, and it's repeatable as in I get to the same line of code each
>>>> time, and it seems likely to be the same one that Mark's dump shows:
>>>>
>>>> 1225: MAKE_RESAMPLEX(I,int,jint,Int,I)
>>>>
>>>> in Mapping.c.
>>>>
>>>> Must be worth one of you looking at that macro for any shared resources
>>>> that are not being locked (there are some structs malloc outside of any
>>>> locks I can see, or David worrying about astResample, but I imagine that is
>>>> heavily used?).
>>>>
>>>> I'd prefer not to require a non-threaded AST library as that will mean
>>>> we cannot build against a standard tree (that is bound to come back as a
>>>> problem).
>>>
>>> I did a bit more investigation, and it turns out that contrary to my
>>> assessment above, the same issue can show up in the existing JNIAST, so this
>>> has been lurking for a while, i.e. before the FitsChan-related fixes
>>> suggested by David. I'm not sure why I haven't seen the failure before, but
>>> likely it's just that the last time I was doing JNIAST work I had a
>>> different machine and it just never hit the relevant race condition. On the
>>> bright side, as far as I know nobody has come across this issue in real life
>>> - probably nobody is doing massively multithreaded JNIAST.
>>>
>>> So, the new build is (most likely) no worse than the one we had before.
>>> It's quite possible that a non-threaded AST wouldn't even help.
>>>
>>> Another point: the test in question calls astResample in loads of threads
>>> at once just to see if there's trouble. I picked astResample only since
>>> it's an AST function that takes a reasonable amount of CPU time to run. So,
>>> the problem may not be specifically to do with the MAKE_RESAMPLEX macro, it
>>> may be to do with the JNIAST framework in general (which was really what
>>> that test was designed to look for).
>>>
>>
>> is it possible to run the test with valgrind?
>
> I tried that yesterday and it got me nowhere (valgrind has always has issues
> with Java, seems to be slowing getting better, but still not useful).
>
>> It is also possible for the macros to be expanded prior to compilation
>> (David has a script for that in the AST tree) so that a more useful line
>> number would turn up.
>
> That is probably worth a go.
The script is called "cexpand". For instance, doing
% cexpand frameset.c
will produce a (large!) file called frameset.cpp. This file can be
compiled as per a normal c file. It is formed by using cpp to expand
all macros, and then reformatting the result into more readable form.
David
|