Summary: | ASTERISK-16996: ooh323 crashes with segmentation fault every 2-3 days | ||
Reporter: | brett (celtic6969) | Labels: | |
Date Opened: | 2010-11-21 22:14:18.000-0600 | Date Closed: | 2010-12-01 13:33:45.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Addons/chan_ooh323 |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) log_full.txt ( 1) log_h323_log.txt ( 2) ooh323.conf ( 3) show_locks.txt ( 4) trace_gdb.txt | |
Description: | Server running Asterisk 1.6.2.14 and asterisk-addons 1.6.2.2 (both tarball releases). The server receives inbound ooh323 calls (20 calls per minute), which are terminated by the caller before the call is answered. After 2 to 3 days, Asterisk dumps core with a segmentation fault.
****** ADDITIONAL INFORMATION ******
OS: CentOS 5.4
kernel: 2.6.18-194.26.1.el5
asterisk: 1.6.2.14 (DEBUG_THREADS & DONT_OPTIMIZE)
addons: 1.6.2.2
Will attach gdb trace and logs. | ||
Comments: |
By: brett (celtic6969) 2010-11-21 22:38:03.000-0600
Restarted Asterisk today and enabled ooh323 and RTP debug. OOH323 appears to have died quietly a few hours later; the Asterisk console is still accessible, but the server is not processing any H.323 calls.

By: BrettH (zeero) 2010-11-22 09:12:40.000-0600
My login name has changed from celtic69 -> zeero. Following my previous note, a "core show locks" was attached to this case after ooh323 had died but the console was still responding.

By: BrettH (zeero) 2010-11-24 06:23:42.000-0600
I tried 1.4 SVN with the same result :(

Connected to Asterisk SVN-branch-1.4-r295906 currently running on receiver01 (pid = 16743)
Verbosity is at least 6
Core debug is at least 6
receiver01*CLI> core show locks
=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: -1208968304 (ooh323c_stack_thread started at [ 43] ooh323cDriver.c ooh323c_start_stack_thread())
=== ---> Lock #0 (chan_ooh323.c): MUTEX 1517 onCallCleared &p->lock 0x9d034f0 (1)
=== ---> Waiting for Lock #1 (channel.c): MUTEX 920 __ast_queue_frame (channel lock) 0x9d01ee0 (1)
=== --- ---> Locked Here: channel.c line 1652 (ast_hangup)
=== -------------------------------------------------------------------
===
=== Thread ID: -1209459824 (pbx_thread started at [ 2705] pbx.c ast_pbx_start())
=== ---> Lock #0 (channel.c): MUTEX 1652 ast_hangup (channel lock) 0x9d01ee0 (1)
=== ---> Waiting for Lock #1 (chan_ooh323.c): MUTEX 881 ooh323_hangup &p->lock 0x9d034f0 (1)
=== --- ---> Locked Here: chan_ooh323.c line 1517 (onCallCleared)
=== -------------------------------------------------------------------
===
=======================================================================

Can anyone please assist or provide a recommendation?

By: Alexander Anikin (may213) 2010-11-24 10:46:01.000-0600
Hi, first, I recommend upgrading to version 1.8, because the original ooh323 code has many errors and many of them are fixed in 1.8. I think the 1.4/1.6 addons version of chan_ooh323 is not usable in production, especially in a high-load environment, because of its single-threaded model and many bugs in the code.

By: Alexander Anikin (may213) 2010-11-24 11:25:34.000-0600
To try to solve this problem in addons-1.6.2.2, please attach the gdb "bt full" output here. I think the main cause is heap memory corruption in the ooh323 stack.

By: BrettH (zeero) 2010-11-24 23:15:36.000-0600
Thanks for responding, May. I will install the 1.8 SVN version and compile it with the relevant debug flags. I will provide an update soon.

By: Alexander Anikin (may213) 2010-11-24 23:38:43.000-0600
OK, I think this problem will not occur with 1.8 or trunk.

By: BrettH (zeero) 2010-11-25 07:54:06.000-0600
Hi May, I have installed "Asterisk SVN-branch-1.8-r296230" and ooh323 is currently processing calls. I will need to monitor it for at least 3 days to measure stability. I have 2 concerns with version 1.8:
1) ooh323 would not accept any calls (chan_ooh323.c:1797 ooh323_onReceivedSetup: Unacceptable ip 10.8.6.252) until I configured a "user profile" and IP address for every Cisco gateway in the network. Is there a way to change this behaviour? It will be an issue for customers with many H.323 gateways and no gatekeeper.
2) The asterisk process was initially running at 98.9% CPU utilisation for at least 1 hour but was still processing calls. I then restarted the asterisk process and it is currently running at 0.2%. I will monitor and see if the high CPU usage recurs.

By: Alexander Anikin (may213) 2010-11-25 17:42:13.000-0600
Hi,
1. Yes, this is normal. You must define a profile for every known peer/user or gateway; otherwise you leave a hole for unauthorized calls through your system. I have no idea how to allow guest access without this hole.
By the way, for SIP you must also create a definition for every endpoint.
2. I suspect that CPU usage of 100% or more is not an H.323 problem; possibly it is some other module in Asterisk.

By: BrettH (zeero) 2010-11-30 18:58:02.000-0600
Hi May, thanks for your help. Asterisk 1.8 has been stable for 5 days. I think we can close this issue now, and if the problem recurs I will create a new issue. Thanks again.

By: Alexander Anikin (may213) 2010-12-01 13:32:39.000-0600
Hi, thank you for testing. I'll close this issue; feel free to reopen it or create a new issue if needed. |
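
Note on the "core show locks" output in the 2010-11-24 comment: it shows a lock-order inversion. The ooh323c stack thread holds &p->lock (taken in onCallCleared) and waits for the channel lock, while the pbx thread holds the channel lock (taken in ast_hangup) and waits for &p->lock (in ooh323_hangup). Below is a minimal sketch of one common way to avoid this kind of deadlock: try the second lock and back off if it is busy. It uses plain pthreads only. The names pvt_lock, channel_lock, stack_side, pbx_side and run_demo are hypothetical stand-ins chosen for illustration; this is not chan_ooh323 code, and nothing in this ticket says that 1.8 resolves the problem this way.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for the two locks seen in the trace. */
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;     /* plays &p->lock */
static pthread_mutex_t channel_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the channel lock */

/* Both counters are only modified while both locks are held. */
static int frames_queued;
static int hangups_done;

/* Stack-thread side: takes the pvt lock first, then needs the channel lock.
 * Instead of blocking on the second lock, it tries it and backs off. */
static void *stack_side(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&pvt_lock);
        while (pthread_mutex_trylock(&channel_lock) != 0) {
            /* Channel lock is busy: drop our lock so its holder can
             * finish, yield, then start over. */
            pthread_mutex_unlock(&pvt_lock);
            sched_yield();
            pthread_mutex_lock(&pvt_lock);
        }
        frames_queued++;
        pthread_mutex_unlock(&channel_lock);
        pthread_mutex_unlock(&pvt_lock);
    }
    return NULL;
}

/* PBX-thread side: takes the channel lock first, then blocks on the pvt
 * lock. This is safe here because the other side never blocks while
 * holding the pvt lock. */
static void *pbx_side(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&channel_lock);
        pthread_mutex_lock(&pvt_lock);
        hangups_done++;
        pthread_mutex_unlock(&pvt_lock);
        pthread_mutex_unlock(&channel_lock);
    }
    return NULL;
}

/* Runs both sides concurrently for the given number of rounds each.
 * Returns the total number of completed operations, or -1 if a thread
 * could not be created. */
int run_demo(int rounds)
{
    pthread_t a, b;

    frames_queued = 0;
    hangups_done = 0;
    if (pthread_create(&a, NULL, stack_side, &rounds) != 0)
        return -1;
    if (pthread_create(&b, NULL, pbx_side, &rounds) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return frames_queued + hangups_done;
}
```

Calling run_demo(n) from a main() and building with -pthread should return 2*n. If stack_side instead blocked on channel_lock with pthread_mutex_lock, the two threads could stop in the same state as the trace above, each holding one lock and waiting for the other.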