hmmm
readdir is crashing in a multithreaded application... Shouldn't we be
using readdir_r and readdir64_r when using threads? I've looked at the
original gridmapdir patch and indeed, it contains:
+void
+gridmapdir_newlease(char * encodedglobusidp,
+ char * usernameprefix)
+{
+ int ret;
+ char *userfilename, *encodedfilename, *gridmapdir;
+ struct dirent *gridmapdirentry;
+ DIR *gridmapdirstream;
+ struct stat statbuf;
+
+ gridmapdir = getenv("GRIDMAPDIR");
+ if (gridmapdir == NULL) return;
+
+ encodedfilename = malloc(strlen(gridmapdir) + (size_t) 2 +
+ strlen(encodedglobusidp));
+ sprintf(encodedfilename, "%s/%s", gridmapdir, encodedglobusidp);
+
+ gridmapdirstream = opendir(gridmapdir);
+
+ while ((gridmapdirentry = readdir(gridmapdirstream)) != NULL)
which is indeed not guaranteed to be thread-safe...
So this is a "sleeping" bug that can hit anybody as soon as
- the system is under heavy load
- the gridmapdir gets too large
How can we either
- limit the number of threads of the network-server, or
- patch and recompile the vdt_globus_essentials package, including the
gridmapdir patch
?
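
For the second option, here is a minimal sketch of what a more defensive
directory scan could look like. It is not the actual gridmapdir code: the
function name count_prefixed_entries and the lock gridmapdir_lock are made
up for illustration. It checks the result of opendir before using it (the
excerpt above passes it straight to readdir) and serializes the readdir
loop with a mutex rather than switching to readdir_r, which newer glibc
releases mark as deprecated.

```c
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lock; not part of the original gridmapdir patch. */
static pthread_mutex_t gridmapdir_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Count the entries in dirpath whose names start with prefix.
 * Returns -1 if the directory cannot be opened.
 */
int count_prefixed_entries(const char *dirpath, const char *prefix)
{
    DIR *stream;
    struct dirent *entry;
    size_t prefixlen;
    int count = 0;

    assert(dirpath != NULL && prefix != NULL);
    prefixlen = strlen(prefix);

    /* Only one thread at a time may walk the directory. */
    pthread_mutex_lock(&gridmapdir_lock);

    stream = opendir(dirpath);
    if (stream == NULL) {
        /* e.g. out of file descriptors under load: do not call readdir */
        pthread_mutex_unlock(&gridmapdir_lock);
        return -1;
    }

    while ((entry = readdir(stream)) != NULL) {
        if (strncmp(entry->d_name, prefix, prefixlen) == 0)
            count++;
    }

    closedir(stream);
    pthread_mutex_unlock(&gridmapdir_lock);
    return count;
}
```

A single global lock is the bluntest approach; it costs some concurrency,
but it keeps the change small enough to review as a patch.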
regards,
Jan Just Keijser
NIKHEF
Amsterdam
Ronald Starink wrote:
> Maarten Litmaath wrote:
>
>> Ronald Starink wrote:
>>
>>
>>> Not exactly a stack trace:
>>>
>>> Loaded symbols for /lib/tls/libpthread.so.0
>>> Reading symbols from /lib/tls/libm.so.6...done.
>>> Loaded symbols for /lib/tls/libm.so.6
>>> Reading symbols from /lib/libgcc_s.so.1...done.
>>> Loaded symbols for /lib/libgcc_s.so.1
>>> Reading symbols from /lib/tls/libc.so.6...done.
>>> Loaded symbols for /lib/tls/libc.so.6
>>> Reading symbols from /lib/libcrypt.so.1...done.
>>> Loaded symbols for /lib/libcrypt.so.1
>>> Reading symbols from /lib/ld-linux.so.2...done.
>>> Loaded symbols for /lib/ld-linux.so.2
>>> Reading symbols from /opt/edg/lib/libedg_wl_classad_plugin.so...done.
>>> Loaded symbols for /opt/edg/lib/libedg_wl_classad_plugin.so
>>> 0x00ea13ad in pthread_cond_wait@@GLIBC_2.3.2 ()
>>> from /lib/tls/libpthread.so.0
>>> (gdb) finish
>>> Run till exit from #0 0x00ea13ad in pthread_cond_wait@@GLIBC_2.3.2 ()
>>> from /lib/tls/libpthread.so.0
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 38517680 (LWP 19996)]
>>> 0x01026a75 in readdir64@@GLIBC_2.2 () from /lib/tls/libc.so.6
>>> (gdb)
>>>
>> Please type "where" to that prompt...
>>
>
> (gdb) where
> #0 0x01026a75 in readdir64@@GLIBC_2.2 () from /lib/tls/libc.so.6
> #1 0x008c5d6d in gridmapdir_newlease (
> encodedglobusidp=0xb74ed2c0
> "%2fo%3ddutchgrid%2fo%3dusers%2fo%3dnikhef%2fcn%3dronald%20starink",
> usernameprefix=0xb74edf80 "dteam") at gridmap.c:254
> #2 0x008c5fb5 in gridmapdir_userid (
> globusidp=0xb74edff8 "/O=dutchgrid/O=users/O=nikhef/CN=Ronald Starink",
> usernameprefix=0xb74edf80 "dteam", useridp=0x24bb6d8) at gridmap.c:342
> #3 0x008c6436 in globus_gss_assist_gridmap (
> globusidp=0xb74edff8 "/O=dutchgrid/O=users/O=nikhef/CN=Ronald Starink",
> useridp=0x24bb6d8) at gridmap.c:573
> #4 0x00c8f7f3 in
> edg::workload::common::socket_pp::GSISocketServer::AcceptGSIAuthentication
> ()
> from /opt/edg/lib/libedg_wl_gsisocket_pp.so.0
> #5 0x00c9083f in
> edg::workload::common::socket_pp::GSISocketServer::AuthenticateAgent ()
> from /opt/edg/lib/libedg_wl_gsisocket_pp.so.0
> #6 0x08119086 in edg::workload::networkserver::daemon::Manager::run ()
> #7 0x0811782b in
> edg::workload::common::task::ForwarderFunctor<edg::workload::common::socket_pp::GSISocketAgent*,
> classad::ClassAd*>::operator() ()
> #8 0x08117337 in
> boost::detail::function::void_function_obj_invoker0<edg::workload::common::task::ForwarderFunctor<edg::workload::common::socket_pp::GSISocketAgent*,
> classad::ClassAd*>, void>::invoke
> ()
> #9 0x0819cee1 in boost::thread_group::join_all ()
> #10 0x00e9edd8 in start_thread () from /lib/tls/libpthread.so.0
> #11 0x0105dd1a in clone () from /lib/tls/libc.so.6
> (gdb)
>