Hi Maarten,
Thanks for your efforts so far. I have installed the patches on
boszwijn.nikhef.nl:
[root@boszwijn root]# rpm -qa | egrep '^vdt_globus_(sdk|essentials)'
vdt_globus_essentials-VDT1.2.2rh9_LCG-3
vdt_globus_sdk-VDT1.2.2rh9_LCG-3
but unfortunately the server still crashes:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 37338032 (LWP 6196)]
0x00acfc85 in readdir64_r@@GLIBC_2.2 () from /lib/tls/libc.so.6
(gdb) where
#0 0x00acfc85 in readdir64_r@@GLIBC_2.2 () from /lib/tls/libc.so.6
#1 0x00f93dbd in gridmapdir_newlease (
    encodedglobusidp=0xb74e49c0 "%2fo%3ddutchgrid%2fo%3dusers%2fo%3dnikhef%2fcn%3dronald%20starink",
    usernameprefix=0xb7516b18 "dteam") at gridmap.c:255
#2 0x00f9401c in gridmapdir_userid (
    globusidp=0xb7516b90 "/O=dutchgrid/O=users/O=nikhef/CN=Ronald Starink",
    usernameprefix=0xb7516b18 "dteam", useridp=0x239b6d8) at gridmap.c:344
#3 0x00f9449d in globus_gss_assist_gridmap (
    globusidp=0xb7516b90 "/O=dutchgrid/O=users/O=nikhef/CN=Ronald Starink",
    useridp=0x239b6d8) at gridmap.c:575
#4 0x00cd67f3 in edg::workload::common::socket_pp::GSISocketServer::AcceptGSIAuthentication ()
    from /opt/edg/lib/libedg_wl_gsisocket_pp.so.0
#5 0x00cd783f in edg::workload::common::socket_pp::GSISocketServer::AuthenticateAgent ()
    from /opt/edg/lib/libedg_wl_gsisocket_pp.so.0
#6 0x08119086 in edg::workload::networkserver::daemon::Manager::run ()
#7 0x0811782b in edg::workload::common::task::ForwarderFunctor<edg::workload::common::socket_pp::GSISocketAgent*, classad::ClassAd*>::operator() ()
#8 0x08117337 in boost::detail::function::void_function_obj_invoker0<edg::workload::common::task::ForwarderFunctor<edg::workload::common::socket_pp::GSISocketAgent*, classad::ClassAd*>, void>::invoke ()
#9 0x0819cee1 in boost::thread_group::join_all ()
#10 0x00304dd8 in start_thread () from /lib/tls/libpthread.so.0
#11 0x00b06d1a in clone () from /lib/tls/libc.so.6
(gdb)
I also tried setting the number of threads (on bosheks.nikhef.nl) to 1:
NetworkServer = [
    ...
    II_Contact = "bosheks.nikhef.nl";
    ListeningPort = 7772;
    MasterThreads = 1;
    DispatcherThreads = 1;
    ...
];
Nevertheless, the network server keeps crashing on this host as well.
We use a shared gridmapdir via an NFS mount, so all RBs access the same
directory.
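Because the directory sits on NFS, one thing that might be worth ruling out
(a guess on my part, not something the traces prove) is opendir() failing and
returning NULL, which gridmapdir_newlease then passes straight to the read
loop without a check. Below is a minimal sketch of the same readdir_r scan
with that check added. count_pool_entries is a hypothetical helper for
illustration only, not code from gridmap.c, and it merely counts entries
instead of creating a lease:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of gridmap.c: counts the entries in `dir`
 * whose names do not start with '%' (the same filter the lease code uses
 * to skip URL-encoded DN links), ignoring "." and "..".
 * Returns -1 if the directory cannot be opened or a read fails.
 * Note: newer glibc marks readdir_r as deprecated; it is used here only
 * to mirror the patch. */
int count_pool_entries(const char *dir)
{
    struct dirent entry;            /* caller-owned buffer, as in the patch */
    struct dirent *result = NULL;
    DIR *stream;
    int count = 0;
    int err;

    if (dir == NULL)
        return -1;

    stream = opendir(dir);
    if (stream == NULL) {
        /* e.g. a stale or unreachable mount: never hand NULL to readdir_r */
        perror("opendir");
        return -1;
    }

    while ((err = readdir_r(stream, &entry, &result)) == 0 && result != NULL) {
        if (strcmp(result->d_name, ".") == 0 ||
            strcmp(result->d_name, "..") == 0)
            continue;
        if (result->d_name[0] == '%')
            continue;               /* skip encoded DN links */
        count++;
    }

    closedir(stream);
    return err == 0 ? count : -1;
}
```

Called as count_pool_entries(getenv("GRIDMAPDIR")), this returns -1 rather
than crashing when the variable is unset or the directory cannot be opened.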
Any further ideas about what may be causing the crashes?
Cheers,
Ronald
Maarten wrote:
> On Mon, 23 Apr 2007, Jan Just Keijser wrote:
>
>> Hmm, readdir is crashing in a multithreaded application... Shouldn't we
>> be using readdir_r and readdir64_r when using threads? I've looked at the
>> original gridmapdir patch and indeed, it contains
>>
>> +void
>> +gridmapdir_newlease(char * encodedglobusidp,
>> + char * usernameprefix)
>> +{
>> + int ret;
>> + char *userfilename, *encodedfilename, *gridmapdir;
>> + struct dirent *gridmapdirentry;
>> + DIR *gridmapdirstream;
>> + struct stat statbuf;
>> +
>> + gridmapdir = getenv("GRIDMAPDIR");
>> + if (gridmapdir == NULL) return;
>> +
>> + encodedfilename = malloc(strlen(gridmapdir) + (size_t) 2 +
>> + strlen(encodedglobusidp));
>> + sprintf(encodedfilename, "%s/%s", gridmapdir, encodedglobusidp);
>> +
>> + gridmapdirstream = opendir(gridmapdir);
>> +
>> + while ((gridmapdirentry = readdir(gridmapdirstream)) != NULL)
>>
>>
>> which is indeed not thread-safe.
>> So this is a "sleeping" bug that can hit anybody as soon as
>> - the system is under heavy load, or
>> - the gridmapdir gets too large.
>>
>> How can we either
>> - limit the number of threads of the network server, or
>> - patch and recompile the vdt_globus_essentials package with the
>>   gridmapdir patch?
>
> A patched rpm has been tested and is available here for the time being:
>
> http://litmaath.home.cern.ch/litmaath/gmd-fix/
>
> The diffs:
>
> -----------------------------------------------------------------------------
> --- gridmap.c.orig 2004-05-25 23:04:18.000000000 +0200
> +++ gridmap.c 2007-04-23 23:06:40.000000000 +0200
> @@ -121,7 +121,7 @@
> int ret;
> char *firstlinkpath, *otherlinkdup, *otherlinkpath,
> *gridmapdir;
> - struct dirent *gridmapdirentry;
> + struct dirent *gridmapdirentry = 0, gmde;
> DIR *gridmapdirstream;
> struct stat statbuf;
> ino_t firstinode;
> @@ -142,7 +142,8 @@
>
> if (gridmapdirstream != NULL)
> {
> - while ((gridmapdirentry = readdir(gridmapdirstream)) != NULL)
> + while (readdir_r(gridmapdirstream, &gmde, &gridmapdirentry) == 0 &&
> + gridmapdirentry != NULL)
> {
> if (strcmp(gridmapdirentry->d_name, firstlink) == 0) continue;
>
> @@ -238,7 +239,7 @@
> {
> int ret;
> char *userfilename, *encodedfilename, *gridmapdir;
> - struct dirent *gridmapdirentry;
> + struct dirent *gridmapdirentry = 0, gmde;
> DIR *gridmapdirstream;
> struct stat statbuf;
>
> @@ -251,7 +252,8 @@
>
> gridmapdirstream = opendir(gridmapdir);
>
> - while ((gridmapdirentry = readdir(gridmapdirstream)) != NULL)
> + while (readdir_r(gridmapdirstream, &gmde, &gridmapdirentry) == 0 &&
> + gridmapdirentry != NULL)
> {
> /* we dont want any files that dont look like acceptable usernames */
> if ((*(gridmapdirentry->d_name) == '%') ||
> -----------------------------------------------------------------------------
>
> I will submit a bug and a patch, which should be certified fairly soon.
> Thanks,
> Maarten