On Thu, 17 Jul 2008, David Berry wrote:
>>> Things are worse on Intel systems, in that the multi-threaded apps
>>> often fail to run at all, crashing at random places. Using Intel
>>> compilers rather than gnu seems to fix this crashing problem. But of
>>> course we cannot rely on Intel compilers. At the moment, I'm
>>> struggling with the Intel thread checker tool, to see if it can cast
>>> any light on where things are going wrong.
>>
>> Sounds grim. I saw the numbers posted by Brad a while back and thought
>> it looked not quite as fast as we'd hoped, but didn't realise things
>> were this bad.
>>
>> One thought occurred to me about GCC crashing: it does have a flag
>> -pthread(s) that is used on some platforms for compiling, so I've had a
>> quick look at that area again.
>
> Yes. I was in some confusion about whether we should be using -pthread
> or -lpthread. I found something somewhere that said that -pthread
> implied -lpthread, and also caused the app to be linked with the thread-safe
> versions of the run-time-library. So I decided to go with -pthread, and
> leave out -lpthread.
>
>> Anyway I now think we should be defining -D_REENTRANT when compiling.
>> This supposedly makes any references to errno protected and I can see
>> some naked references to that in AST (object.c, axis.c, channel.c,
>> mathmap.c). Worth a try.
>
> Thanks for the tip. Do you have any references for -D_REENTRANT? A
> quick google produces very little.

Not a lot really; I know it's about the only side-effect of -pthread when
compiling. After looking at this yet again, I'm no longer convinced it
makes a difference (at least to errno on Linux, which appears to be wrapped
regardless). As I said, it's still worth a try (or even gcc -pthread).
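
One way to check the errno point is to compare the address of errno in
two threads. This is only a minimal sketch: ErrnoProbe, worker and
errno_is_per_thread are hypothetical names, not anything in AST or
smurf, and it assumes a POSIX threads platform built with gcc -pthread.
If errno is wrapped per thread, the two addresses should differ whether
or not -D_REENTRANT is given.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper type: carries the creating thread's errno address
   into the worker and the worker's verdict back out. */
typedef struct {
   int *main_errno;   /* address of errno as seen by the creating thread */
   int distinct;      /* set to 1 if the worker's errno lives elsewhere */
} ErrnoProbe;

static void *worker( void *arg ) {
   ErrnoProbe *probe = arg;
   assert( probe != NULL );

   /* If errno is per-thread, &errno here is not the creator's address. */
   probe->distinct = ( &errno != probe->main_errno );
   return NULL;
}

/* Returns 1 if errno is per-thread, 0 if it is shared, and -1 if the
   thread could not be created or joined. */
int errno_is_per_thread( void ) {
   pthread_t tid;
   ErrnoProbe probe;

   probe.main_errno = &errno;
   probe.distinct = 0;

   if( pthread_create( &tid, NULL, worker, &probe ) != 0 ) return -1;
   if( pthread_join( tid, NULL ) != 0 ) return -1;
   return probe.distinct;
}
```

Building it once with and once without -D_REENTRANT and comparing the
return value would show whether the flag changes anything for errno on
a given platform.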

Another area I'm worried about is which runtime calls are really
reentrant. There is a list of functions not required to be reentrant in
the UNIX standard:

asctime() basename() catgets() crypt() ctime() dbm_clearerr()
dbm_close() dbm_delete() dbm_error() dbm_fetch() dbm_firstkey()
dbm_nextkey() dbm_open() dbm_store() dirname() dlerror() drand48()
ecvt() encrypt() endgrent() endpwent() endutxent() fcvt() ftw() gcvt()
getc_unlocked() getchar_unlocked() getdate() getenv() getgrent()
getgrgid() getgrnam() gethostbyaddr() gethostbyname() gethostent()
getlogin() getnetbyaddr() getnetbyname() getnetent() getopt()
getprotobyname() getprotobynumber() getprotoent() getpwent()
getpwnam() getpwuid() getservbyname() getservbyport() getservent()
getutxent() getutxid() getutxline() gmtime() hcreate() hdestroy()
hsearch() inet_ntoa() l64a() lgamma() lgammaf() lgammal()
localeconv() localtime() lrand48() mrand48() nftw() nl_langinfo()
ptsname() putc_unlocked() putchar_unlocked() putenv() pututxline()
rand() readdir() setenv() setgrent() setkey() setpwent() setutxent()
strerror() strtok() ttyname() unsetenv() wcstombs() wctomb()

Some of these have safe "_r" forms, like strerror_r, which I'm thinking
might be a good one to use in EMS. I believe most other functions are
thread-safe, albeit with mutexes when needed (that makes you wonder about
the performance of malloc or anything else that depends on global data).
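
To illustrate the "_r" pattern, here is a minimal sketch using
strtok_r, the reentrant form of strtok() from the list above. It uses
strtok_r rather than strerror_r because strerror_r comes in two
incompatible variants (XSI and GNU), which would complicate a short
example. count_fields is a hypothetical helper, not part of EMS or AST.
The point is that the parse position lives in a caller-supplied pointer
instead of hidden static data, so concurrent callers should not
interfere with each other.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict -std=c99 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty comma-separated fields in
   "text" (at most the first 255 characters are examined). */
size_t count_fields( const char *text ) {
   char buf[ 256 ];
   char *save = NULL;   /* per-call parse state, replaces strtok's static */
   char *tok;
   size_t n = 0;

   assert( text != NULL );

   /* strtok_r writes NULs into its input, so work on a local copy. */
   strncpy( buf, text, sizeof( buf ) - 1 );
   buf[ sizeof( buf ) - 1 ] = '\0';

   for( tok = strtok_r( buf, ",", &save ); tok != NULL;
        tok = strtok_r( NULL, ",", &save ) ) {
      n++;
   }
   return n;
}
```

For example, count_fields( "a,b,c" ) gives 3. Like strtok(), strtok_r
skips empty fields, so "a,,b" counts as 2.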

> In some breaking news, the Intel thread checker has thrown up a usage
> of Fortran SLALIB (from smurf, not AST), together with some static
> variables in smurf that I had overlooked. So I've got a few leads to
> go on now.

That's a relief.

Peter.