Ticket #696 (new defect)

Opened 2 months ago

Last modified 2 months ago

Agent threads are not joined safely

Reported by: hakan@…
Owned by:
Priority: major
Milestone:
Component: LEAP
Version: 3.0.0-GA
Keywords: threads
Cc:
Patch waiting for maintainer: no

Description

When saImmOmInitialize is invoked for the first time, a few threads are created. When saImmOmFinalize is invoked for the last handle, these threads exit after a while. Unfortunately, at least one of the threads is still running after saImmOmFinalize has returned. This is a problem because, as far as I can see, the API offers no other synchronization primitive that we could use to wait for the final OpenSAF thread to exit. We have encountered some nasty crashes when we unload the OpenSAF agent library code too soon after the final call to saImmOmFinalize, presumably because a remaining thread is still executing code from the library at that point. saImmOmFinalize should join all threads before it returns.

Other places in the OpenSAF agent library code may suffer from the same bug. See the gdb stack traces below for the threads that are active before the final call to saImmOmFinalize.

/Håkan

Håkan Mattsson, Erlang/OTP, Ericsson AB

(gdb) thr 2
[Switching to thread 2 (process 15689)]#0 0x00002b0d895a29a2 in select () from /lib64/libc.so.6
(gdb) bt
#0 0x00002b0d895a29a2 in select () from /lib64/libc.so.6
#1 0x00002aaaab174a05 in ncs_sel_obj_select (highest_sel_obj={raise_obj = 15, rmv_obj = 16}, rfds=0x420684b0, wfds=0x0, efds=0x0, timeout_in_10ms=0x0) at src/os_defs.c:2818
#2 0x00002aaaab149d23 in ncs_ipc_recv_common (mbx=0x2aaaab2e2580, block=1) at src/sysf_ipc.c:447
#3 0x00002aaaab149bf5 in ncs_ipc_recv (mbx=0x2aaaab2e2580) at src/sysf_ipc.c:394
#4 0x00002aaaab1b736b in dta_do_evts (mbx=0x2aaaab2e2580) at dta_api.c:1260
#5 0x00002b0d892cb143 in start_thread () from /lib64/libpthread.so.0
#6 0x00002b0d895a8b8d in clone () from /lib64/libc.so.6
#7 0x0000000000000000 in ?? ()
(gdb) thr 3
[Switching to thread 3 (process 15688)]#0 0x00002b0d895a08b6 in poll () from /lib64/libc.so.6
(gdb) bt
#0 0x00002b0d895a08b6 in poll () from /lib64/libc.so.6
#1 0x00002aaaab17d39e in mdtm_process_recv_events () at src/mds_dt_tipc.c:640
#2 0x00002b0d892cb143 in start_thread () from /lib64/libpthread.so.0
#3 0x00002b0d895a8b8d in clone () from /lib64/libc.so.6
#4 0x0000000000000000 in ?? ()
(gdb) thr 4
[Switching to thread 4 (process 15687)]#0 0x00002b0d895a29a2 in select () from /lib64/libc.so.6
(gdb) bt
#0 0x00002b0d895a29a2 in select () from /lib64/libc.so.6
#1 0x00002aaaab14e4d6 in ncs_tmr_wait () at src/sysf_tmr.c:541
#2 0x00002b0d892cb143 in start_thread () from /lib64/libpthread.so.0
#3 0x00002b0d895a8b8d in clone () from /lib64/libc.so.6
#4 0x0000000000000000 in ?? ()

Change History

Changed 2 months ago by marioa

  • component changed from unknown to IMMSv
  • milestone PL 3.0.1 deleted

Changed 2 months ago by anders

  • priority changed from critical to major
  • component changed from IMMSv to LEAP

ncs_agents_shutdown should block (or have an option to block) until all threads
are terminated.
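
A minimal sketch of what such a blocking shutdown could look like, using plain POSIX threads. This is not OpenSAF code: struct agent, agent_start, agent_shutdown and the wake-up pipe are hypothetical names chosen for illustration, and the real ncs_agents_shutdown may need a different wake-up mechanism for each of the threads shown in the backtraces. The idea is only that the shutdown call wakes the worker out of select() and then calls pthread_join, so no library thread is left running when it returns.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical agent state; none of these names come from OpenSAF. */
struct agent {
    pthread_t worker;
    int wake_pipe[2];          /* writing to [1] wakes the worker out of select() */
    atomic_int worker_running; /* 1 until the worker is about to return */
};

/* Worker: blocks in select() until the wake-up pipe becomes readable. */
static void *agent_worker(void *arg)
{
    struct agent *a = arg;
    int fd = a->wake_pipe[0];

    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        int rc = select(fd + 1, &rfds, NULL, NULL, NULL);
        if (rc > 0 && FD_ISSET(fd, &rfds))
            break;                 /* shutdown requested */
        if (rc < 0 && errno != EINTR)
            break;                 /* unexpected error: give up */
    }
    atomic_store(&a->worker_running, 0);
    return NULL;
}

/* Create the wake-up pipe and start the worker. Returns 0 on success. */
static int agent_start(struct agent *a)
{
    assert(a != NULL);
    if (pipe(a->wake_pipe) != 0)
        return -1;
    atomic_store(&a->worker_running, 1);
    if (pthread_create(&a->worker, NULL, agent_worker, a) != 0) {
        atomic_store(&a->worker_running, 0);
        close(a->wake_pipe[0]);
        close(a->wake_pipe[1]);
        return -1;
    }
    return 0;
}

/* Blocking shutdown: wake the worker, then join it before returning.
 * Once this returns 0, the worker thread no longer executes any code. */
static int agent_shutdown(struct agent *a)
{
    assert(a != NULL);
    char byte = 0;
    if (write(a->wake_pipe[1], &byte, 1) != 1)
        return -1;
    if (pthread_join(a->worker, NULL) != 0)
        return -1;
    close(a->wake_pipe[0]);
    close(a->wake_pipe[1]);
    return 0;
}
```

With this pattern a caller can unload the library immediately after agent_shutdown returns, because pthread_join does not return until the worker thread has terminated.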
