Thanks Mark. So since I don't suppose any of us have the time needed
to investigate the threading problems you describe, maybe the best
thing to do is for Peter to build the native jniast libraries against
a threadless AST V5.3-1.
David
On 5 January 2011 18:24, Mark Taylor <[log in to unmask]> wrote:
> David et al.,
>
> as promised, I've finally got round to looking at the JNIAST issue.
>
> On Tue, 16 Nov 2010, David Berry wrote:
>
>> Hi Mark,
>> My suspicion falls on the THASTLOCK macro defined in
>> jniast.h. as follows:
>>
>> #define THASTLOCK(ast_objs, code) \
>> jniastLock(ast_objs); \
>> code \
>> jniastUnlock(ast_objs); \
>> free(ast_objs);
>>
>> >From what I can see this is only used in one place, in the definition
>> of the THASTCALL macro, also defined in jniast.h:
>>
>> #define THASTCALL(ast_objs, code) \
>> ASTCALL( THASTLOCK(ast_objs, code) )
>>
>> THASTCALL is used lots of times, usually in the form
>>
>> THASTCALL( jniastList( 1, pointer.AstObject ),
>> <xxx>
>> )
>>
>> So, the "ast_objs" argument for the THASTLOCK is usually an invocation
>> of jniastList, which means that a typical use of THASTLOCK would look
>> as follows after expansion by the C pre-processor:
>>
>> jniastLock(jniastList( 1, pointer.AstObject ) ); \
>> <xxx> \
>> jniastUnlock(jniastList( 1, pointer.AstObject )); \
>> free(jniastList( 1, pointer.AstObject ));
>>
>> So, for instance, the final call to "free" in the macro expansion
>> above does not in fact free the original memory allocated by the first
>> call to jniastList. Instead it calls jniastlist again, and then
>> immediately frees the memory allocated by this new call.
>>
>> So I'm suggesting that the THASTLOCK macros definition in jniast.h
>> should be changed to be something like this:
>>
>> #define THASTLOCK(ast_objs, code) { \
>> AstObject **objlist = (ast_objs); \
>> jniastLock(objlist); \
>> code \
>> jniastUnlock(objlist); \
>> astFree(objlist); \
>> }
>>
>> That is, create a new C "{ }" block, and within the block declare a
>> local variable "objlist" to store the pointer returned by jniastLock.
>> Then use this pointer, rather than the "ast_objs" macro argument,
>> throughout the rest of the macro body.
>>
>> It seems to clear up the problem. What do you think?
>
> Excellent work, that is certainly a bug as you explain, and it does
> indeed appear to be causing the symptoms displayed by my test case.
> Well spotted.
>
> I have committed this change, which fixes the C memory leak, and
> the changes I earlier made on the jniast-fitschan-fix branch,
> which fix the Java memory leak, on the starjava trunk.
>
> The native libraries still need to be built (Peter volunteered to do
> this in an earlier mail), which leads to the question of what version
> of AST they should be built against. The current version was built
> against AST v5.1-0. Probably it is safest to build against that
> again - a build against Hawaiki seems to work (though see below),
> but I haven't been through the differences between 5.1-0 and
> 5.3-1 (or later) to know whether there is likely to be any problem.
> The unit tests are quite extensive, but small inconsistencies between
> the C and Java sides of the JNI code can have disastrous consequences
> (core dump, which should be avoided at all costs from java code),
> so there is potential for trouble. On the other hand
> there may be changes in more recent AST implementation which make
> a more recent version desirable (changes in the API are probably not
> relevant, since without changes to the JNIAST code they will not be
> visible from JNIAST). I've made some minor changes to the unit
> tests so that they pass on both AST versions 5.1-0 and 5.3-1
> (removed references to the withdrawn XmlIndent attribute of XmlChan).
>
> Having done all that ... running the JNIAST unit tests is now giving
> me occasional core dumps :-(. This is probably a threading issue -
> the SEGVs are in uk.ac.starlink.ast.ThreadTest. I've attached an
> example dump file. Difficult to reproduce,
> currently happening something like one time in ten, but in the way of
> these things it seems to go away if you just run the test ten
> times. I don't *think* this problem is present in the build before
> these changes. It looks to me unlikely that the changes I've just
> made to JNIAST would cause this (though I can't say it's impossible),
> which suggests it's to do with how it's built. The builds I've done have
> been against hawaiki (AST 5.3-1) or nanahope (AST 5.2-0) - I don't have
> a 5.1-0 to hand. So, it could be version issues. I suspect however
> that it might go away if it's built against an AST built without
> threading. Although I believe/believed that JNIAST ought to work
> with a multithreaded AST, it doesn't actually benefit much from
> multithreaded AST (see comments in jniast.h for the sorry tale),
> so this wouldn't have much of a downside (and the previous build might
> well have been against a threadless AST, though I'm not sure if it was
> or not). If we can't resolve this we should put the changes
> back on a branch or otherwise roll them back - it is not safe to
> leave them without a native library rebuild, since the java and C
> sides of the JNI code do not quite match when using the old native
> libs.
>
> I consider this off my desk for now. I'd be pleased if it stayed that
> way :-).
>
> Happy new year!
>
> Mark
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> [log in to unmask] +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
|