David et al.,
as promised, I've finally got round to looking at the JNIAST issue.
On Tue, 16 Nov 2010, David Berry wrote:
> Hi Mark,
> My suspicion falls on the THASTLOCK macro defined in
> jniast.h. as follows:
>
> #define THASTLOCK(ast_objs, code) \
> jniastLock(ast_objs); \
> code \
> jniastUnlock(ast_objs); \
> free(ast_objs);
>
> >From what I can see this is only used in one place, in the definition
> of the THASTCALL macro, also defined in jniast.h:
>
> #define THASTCALL(ast_objs, code) \
> ASTCALL( THASTLOCK(ast_objs, code) )
>
> THASTCALL is used lots of times, usually in the form
>
> THASTCALL( jniastList( 1, pointer.AstObject ),
> <xxx>
> )
>
> So, the "ast_objs" argument for the THASTLOCK is usually an invocation
> of jniastList, which means that a typical use of THASTLOCK would look
> as follows after expansion by the C pre-processor:
>
> jniastLock(jniastList( 1, pointer.AstObject ) ); \
> <xxx> \
> jniastUnlock(jniastList( 1, pointer.AstObject )); \
> free(jniastList( 1, pointer.AstObject ));
>
> So, for instance, the final call to "free" in the macro expansion
> above does not in fact free the original memory allocated by the first
> call to jniastList. Instead it calls jniastlist again, and then
> immediately frees the memory allocated by this new call.
>
> So I'm suggesting that the THASTLOCK macros definition in jniast.h
> should be changed to be something like this:
>
> #define THASTLOCK(ast_objs, code) { \
> AstObject **objlist = (ast_objs); \
> jniastLock(objlist); \
> code \
> jniastUnlock(objlist); \
> astFree(objlist); \
> }
>
> That is, create a new C "{ }" block, and within the block declare a
> local variable "objlist" to store the pointer returned by jniastLock.
> Then use this pointer, rather than the "ast_objs" macro argument,
> throughout the rest of the macro body.
>
> It seems to clear up the problem. What do you think?
Excellent work, that is certainly a bug as you explain, and it does
indeed appear to be causing the symptoms displayed by my test case.
Well spotted.
I have committed this change, which fixes the C memory leak, and
the changes I earlier made on the jniast-fitschan-fix branch,
which fix the Java memory leak, on the starjava trunk.
The native libraries still need to be built (Peter volunteered to do
this in an earlier mail), which leads to the question of what version
of AST they should be built against. The current version was built
against AST v5.1-0. Probably it is safest to build against that
again - a build against Hawaiki seems to work (though see below),
but I haven't been through the differences between 5.1-0 and
5.3-1 (or later) to know whether there is likely to be any problem.
The unit tests are quite extensive, but small inconsistencies between
the C and Java sides of the JNI code can have disastrous consequences
(core dump, which should be avoided at all costs from java code),
so there is potential for trouble. On the other hand
there may be changes in more recent AST implementation which make
a more recent version desirable (changes in the API are probably not
relevant, since without changes to the JNIAST code they will not be
visible from JNIAST). I've made some minor changes to the unit
tests so that they pass on both AST versions 5.1-0 and 5.3-1
(removed references to the withdrawn XmlIndent attribute of XmlChan).
Having done all that ... running the JNIAST unit tests is now giving
me occasional core dumps :-(. This is probably a threading issue -
the SEGVs are in uk.ac.starlink.ast.ThreadTest. I've attached an
example dump file. Difficult to reproduce,
currently happening something like one time in ten, but in the way of
these things it seems to go away if you just run the test ten
times. I don't *think* this problem is present in the build before
these changes. It looks to me unlikely that the changes I've just
made to JNIAST would cause this (though I can't say it's impossible),
which suggests it's to do with how it's built. The builds I've done have
been against hawaiki (AST 5.3-1) or nanahope (AST 5.2-0) - I don't have
a 5.1-0 to hand. So, it could be version issues. I suspect however
that it might go away if it's built against an AST built without
threading. Although I believe/believed that JNIAST ought to work
with a multithreaded AST, it doesn't actually benefit much from
multithreaded AST (see comments in jniast.h for the sorry tale),
so this wouldn't have much of a downside (and the previous build might
well have been against a threadless AST, though I'm not sure if it was
or not). If we can't resolve this we should put the changes
back on a branch or otherwise roll them back - it is not safe to
leave them without a native library rebuild, since the java and C
sides of the JNI code do not quite match when using the old native
libs.
I consider this off my desk for now. I'd be pleased if it stayed that
way :-).
Happy new year!
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
[log in to unmask] +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
|