Summary: | ASTERISK-10404: segmentation faults on installation with 3000 calls/day. | ||
Reporter: | Peter Kozak (spag) | Labels: | |
Date Opened: | 2007-09-28 06:57:30 | Date Closed: | 2007-11-05 14:12:57.000-0600 |
Priority: | Critical | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) ivan_bt_full_10840.log | |
Description: | Asterisk crashes 2-3 times a day on a Debian (sarge) system (Core 2 Duo 2GHz, 4GB RAM). Tested on Asterisk 1.4.10.1 and 1.4.11. Asterisk crashes only during normal office hours (3000 calls a day), never when the system is idle. I wasn't able to reproduce these crashes, except by waiting patiently until one happens again. Sometimes the asterisk process simply stops responding (and eats 100% of CPU); sometimes it crashes with signal 11. SSH access to the affected machine will be granted on request.
Core dumped:
Core was generated by `/usr/sbin/asterisk -f -vg'.
Program terminated with signal 11, Segmentation fault.
(gdb) bt full
#0 0xb7d95709 in free () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#1 0x080a7d68 in ast_frame_free (fr=0xa0059a4, cache=1) at frame.c:360
__PRETTY_FUNCTION__ = "ast_frame_free"
#2 0x080891b8 in ast_generic_bridge (c0=0xa002da0, c1=0xa004b40, config=0xb62c6270, fo=0xb62c5f20, rc=0xb62c5f1c, bridge_end= {tv_sec = 0, tv_usec = 0}) at /var/tmp/src/asterisk-1.4.10.1/include/asterisk/frame.h:390
who = (struct ast_channel *) 0xa004b40 other = (struct ast_channel *) 0xa002da0 cs = {0xa002da0, 0xa004b40, 0xa004b40} f = (struct ast_frame *) 0xa0059a4 res = AST_BRIDGE_COMPLETE o0nativeformats = 8 o1nativeformats = 64 watch_c0_dtmf = 0 watch_c1_dtmf = 0 pvt0 = (void *) 0x9de8f90 pvt1 = (void *) 0x10 frame_put_in_jb = 0 jb_in_use = 0 to = -1 __PRETTY_FUNCTION__ = "ast_generic_bridge"
#3 0x0808a20f in ast_channel_bridge (c0=0xa002da0, c1=0xa004b40, config=0xb62c6270, fo=0xb62c5f20, rc=0xb62c5f1c) at channel.c:4294
now = {tv_sec = 0, tv_usec = 0} to = -1 who = (struct ast_channel *) 0x0 res = AST_BRIDGE_COMPLETE nativefailed = 0 firstpass = 1 o0nativeformats = 8 o1nativeformats = 64 time_left_ms = 0 nexteventts = {tv_sec = 0, tv_usec = 0} caller_warning = 0 '\0' callee_warning = 0 '\0' __PRETTY_FUNCTION__ = "ast_channel_bridge"
#4 0xb7736cf1 in ast_bridge_call (chan=0xa002da0, peer=0xa004b40, config=0xb62c6270) at
res_features.c:1394
other = (struct ast_channel *) 0x130f f = (struct ast_frame *) 0x0 who = (struct ast_channel *) 0x8141a9f chan_featurecode = '\0' <repeats 11 times> peer_featurecode = '\0' <repeats 11 times> res = 0 diff = -1 hasfeatures = 0 hadfeatures = 0 aoh = (struct ast_option_header *) 0xb62c62a4 backup_config = {features_caller = {flags = 0}, features_callee = {flags = 0}, start_time = {tv_sec = 0, tv_usec = 0}, feature_timer = 0, timelimit = 0, play_warning = 0, warning_freq = 0, warning_sound = 0x0, end_sound = 0x0, start_sound = 0x0, firstpass = 0, flags = 0} bridge_cdr = (struct ast_cdr *) 0xb62c5f78 __PRETTY_FUNCTION__ = "ast_bridge_call"
#5 0xb6aae4e8 in dial_exec_full (chan=0xa002da0, data=0xb62c8ff8, peerflags=0xb62c6e64, continue_exec=0x0) at app_dial.c:1651
config = {features_caller = {flags = 0}, features_callee = {flags = 0}, start_time = {tv_sec = 1190822756, tv_usec = 9094}, feature_timer = 0, timelimit = 0, play_warning = 0, warning_freq = 0, warning_sound = 0x0, end_sound = 0x0, start_sound = 0x0, firstpass = 0, flags = 0} number = 0x9f678d1 "iaxmodem01/718" end_time = 42 answer_time = 1190822756 res = 0 u = (struct ast_module_user *) 0x8769b08 rest = 0x0 cur = 0x0 outgoing = (struct dial_localuser *) 0x0 peer = (struct ast_channel *) 0xa004b40 to = -1 numbusy = 0 numcongestion = 0 numnochan = 0 cause = 0 numsubst = "iaxmodem01/718\000\b$$?\017\023\000\000<8\024\b??000\000\000\000\000\000\000\000\035.?4m,Sep 26 18:05:55\000-\000\000\000?,4?-\000\000\000\204?\001\000\000\000\000`-\000\000\000??\024l,\225??\000`-\000\000\000l\031\000\n-\000\0003\000`??\000\000\000?3 l,/?Dl,\036??\000`-\000\000\000\000\000\000\000??9$$?$\024\b"...
cidname = '\0' <repeats 79 times> privdb_val = 0 calldurationlimit = 0 timelimit = 0 play_warning = 0 warning_freq = 0 warning_sound = 0x0 end_sound = 0x0 start_sound = 0x0 dtmfcalled = 0x0 dtmfcalling = 0x0 status = "ANSWER\000R\000GS", '\0' <repeats 244 times> play_to_caller = 0 play_to_callee = 0 sentringing = 0 moh = 0 outbound_group = 0x0 result = 0 start_time = 1190822755 privintro = "@g,?000?\003\000\000\000}\020?f,Pf,y?\223\037?\017\000\000\000??@?r?024\br?024\b\002\000\000\000??234l,xl,_\234l,p?024\b\002\000\000\000??\003(\025\b\003(\025\b\002\000\000\000??001(\025\b\002\000\000\000Pl,_?,\001(\025\b\002", '\0' <repeats 11 times>, "?202>\000\024l,\035\000\000\000\000\000\000\000\000\200l,0m,\000\000\000\000E\003\000\000\000\000\000\000\000\020\000\000\b\000\000\000\000\000\000\000c\203F\000"... privcid = '\0' <repeats 18 times>, " s*\000\000\000\000\000\000\000Pe,??e,c???f,?000?\003\000\000\000}\020?e,\001[?-j,\236f,\002", '\0' <repeats 19 times>, "\030\000\000S??Z?m\031\000\n\001\000\000\000???\206?\206\001\000\000\000??\024\b\000\000\000\000?,_?\206\000\000\000\000\030\000\000\000\214j,7\203F\n\b\000\000\000\000\000\000\000\000\000\000?202>\000\201\000\000\001", '\0' <repeats 23 times>, "E\003\000\000\000\000"... 
parse = 0xb62c5fe0 "IAX2" opermode = 0 args = {argc = 1, argv = 0xb62c6510, peers = 0xb62c5fe0 "IAX2", timeout = 0x0, options = 0x0, url = 0x0} opts = {flags = 0} opt_args = {0x814b0c4 "%s", 0xb62c69cc ",$\024\b$\024\b\030j,3\234\020\b?", 0x0, 0x0, 0x0, 0x1 <Address 0x1 out of bounds>, 0xb62c69bc "}\202\020\b", 0xb62c6590 "m\031", 0xb7debe63 "\207?211?201"} __PRETTY_FUNCTION__ = "dial_exec_full"
#6 0xb6aae77c in dial_exec (chan=0xa002da0, data=0xb62c8ff8) at app_dial.c:1705
peerflags = {flags = 0}
#7 0x080c45ee in pbx_exec (c=0xa002da0, app=0x81b2da8, data=0xb62c8ff8) at pbx.c:532
res = 0 saved_c_appl = 0x0 saved_c_data = 0x0
#8 0x080c82fc in pbx_extension_helper (c=0xa002da0, con=0x0, context=0xa002fc8 "default", exten=0xa003018 "6718", priority=3, label=0x0, callerid=0x9674b78 "04321902549", action=E_SPAWN) at pbx.c:1833
e = (struct ast_exten *) 0x82abd70 app = (struct ast_app *) 0x81b2da8 res = 8195840 q = {incstack = {0x81e61b4 "default", 0x821964c "to-gateway", 0x82a4064 "systemalarm", 0x82a4874 "test", 0x82a4ba4 "cluster-watchdog", 0x81fcaf4 "to-internal-nobody", 0x82783dc "to-conferences", 0x0 <repeats 121 times>}, stacklen = 7, status = 5, swo = 0x0, data = 0x0, foundcontext = 0x81e64f6 "to-internal-users"} passdata = "IAX2/iaxmodem01/718", '\0' <repeats 8172 times> matching_action = 0 __PRETTY_FUNCTION__ = "pbx_extension_helper"
#9 0x080c96dc in ast_spawn_extension (c=0xa002da0, context=0xa002fc8 "default", exten=0xa003018 "6718", priority=3, callerid=0x9674b78 "04321902549") at pbx.c:2288
No locals.
#10 0x080c9bac in __ast_pbx_run (c=0xa002da0) at pbx.c:2388
dst_exten = "\034\000\000\000\001\000\000\000?\b7\000\n\t\000\000\000R?025\b?\025\b?237\024?023\b\000\000\000\000\001\000\000\000?237\024?023\bl?023\b\b,?t\020\b?\025\bG\000\000\000\004?025\b#?025\bh*\027\bG\000\000\000\004?025\b9,\024?023\b8 ,}\202\020\b\000\000\000\000\027 H,}\202\020\b\000\000\000\000$Z?X,,\024?023\bl?023\bx,3\234\020\b?\000\n\000\000\000\000???\202\002\000\000?001\000\000\023?025\b?H\000\000\000??... pos = 0 digit = 0 found = 1 res = 0 autoloopflag = 0 error = 0 __PRETTY_FUNCTION__ = "__ast_pbx_run"
#11 0x080ca9c9 in pbx_thread (data=0xa002da0) at pbx.c:2603
c = (struct ast_channel *) 0xa002da0
#12 0x08109f7c in dummy_start (data=0x8967be0) at utils.c:775
_buffer = {__routine = 0x8069860 <ast_unregister_thread>, __arg = 0xb62cbbb0, __canceltype = -1208157023, __prev = 0x0} ret = (void *) 0xb7e5b360 a = {start_routine = 0x80ca9b2 <pbx_thread>, data = 0xa002da0, name = 0xa000e18 "pbx_thread", ' ' <repeats 11 times>, "started at [ 2627] pbx.c ast_pbx_start()"} lock_info = (struct thr_lock_info *) 0xa003708 __PRETTY_FUNCTION__ = "dummy_start"
#13 0xb7fd0240 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#14 0xb7dfb4ae in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available. | ||
Comments: | By: Peter Kozak (spag) 2007-09-28 07:49:10
Sorry, the Debian distribution is not Sarge, but Etch!
By: Russell Bryant (russell) 2007-10-10 12:00:40
I would be interested in ssh access so that I can look at the core dump with gdb to see if I can determine more about what is happening. Feel free to contact me at russell@digium.com.
By: pkempgen (pkempgen) 2007-10-15 15:04:23
I suggest we close this issue because there have been no segfaults that look like this one since we moved the system from Dell + Debian to a SLES 10 machine. Sorry for not being able to provide any other core dumps. The Dell PowerEdge 2950 gave us a lot of trouble. Should a note be added to http://www.digium.com/en/docs/misc/compatibility_notes.php even if the problem does not seem to be related to the TE220b? (It kept crashing even without the card.)
By: Volnikov Ivan (ivan) 2007-10-25 02:06:52
Yesterday I saw precisely the same crash (see attached ivan_bt_full_10840.log) in our Asterisk 1.4.11 (with some patches of my own) - OS: Fedora Core 6.0, CPU: Intel Pentium 4 3.0GHz (Multithreading), RAM: 2G.
By: Digium Subversion (svnbot) 2007-11-01 14:29:04
Repository: asterisk
Revision: 88153
U team/russell/readq-1.4/main/channel.c
------------------------------------------------------------------------
r88153 | russell | 2007-11-01 14:29:02 -0500 (Thu, 01 Nov 2007) | 15 lines
The readq handling in ast_do_masquerade() got broken when the code was converted to use the AST_LIST macros. Furthermore, the actual operation performed was extremely bizarre. I have re-written the readq handling in ast_do_masquerade() to make it safe so that the readq list does not get corrupted, as well as simplified and documented the code. There is also another fix for list handling for channel datastores.
(related to issues ASTERISK-10489, ASTERISK-10193, ASTERISK-10012, and the 2nd backtrace of ASTERISK-10616)
(potentially related to issues ASTERISK-9737 and ASTERISK-10404)
For users involved with any of the bug reports I have listed, please give this code a try:
$ svn co http://svn.digium.com/svn/asterisk/team/russell/readq-1.4
------------------------------------------------------------------------
By: Digium Subversion (svnbot) 2007-11-05 14:10:22.000-0600
Repository: asterisk
Revision: 88709
U branches/1.4/main/channel.c
------------------------------------------------------------------------
r88709 | russell | 2007-11-05 14:10:17 -0600 (Mon, 05 Nov 2007) | 20 lines
Merge the last bit of changes from asterisk/team/russell/readq-1.4
The issue here is that the channel frame readq handling got broken when the code was converted to use the linked list macros. It caused corruption of the list head and tail pointers. So, I fixed up the usage of the linked list macros and in passing, simplified the code. I also documented what the code is doing, as it was a bit difficult to figure out at first.
This bug showed itself with crashes showing messed up head/tail pointers for the readq. However, there are a couple of crashes that aren't quite as obvious, but I think may be related. So, if your bug gets closed by this commit, but you still have a problem, please reopen or create a new bug report.
(closes issue ASTERISK-10489) (closes issue ASTERISK-10193) (closes issue ASTERISK-10012) (closes issue ASTERISK-10616) (closes issue ASTERISK-9737) (closes issue ASTERISK-10404)
------------------------------------------------------------------------
By: Digium Subversion (svnbot) 2007-11-05 14:12:57.000-0600
Repository: asterisk
Revision: 88710
_U trunk/
U trunk/main/channel.c
------------------------------------------------------------------------
r88710 | russell | 2007-11-05 14:12:56 -0600 (Mon, 05 Nov 2007) | 28 lines
Merged revisions 88709 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4
........
r88709 | russell | 2007-11-05 14:11:04 -0600 (Mon, 05 Nov 2007) | 20 lines
Merge the last bit of changes from asterisk/team/russell/readq-1.4
The issue here is that the channel frame readq handling got broken when the code was converted to use the linked list macros. It caused corruption of the list head and tail pointers. So, I fixed up the usage of the linked list macros and in passing, simplified the code. I also documented what the code is doing, as it was a bit difficult to figure out at first.
This bug showed itself with crashes showing messed up head/tail pointers for the readq. However, there are a couple of crashes that aren't quite as obvious, but I think may be related. So, if your bug gets closed by this commit, but you still have a problem, please reopen or create a new bug report.
(closes issue ASTERISK-10489) (closes issue ASTERISK-10193) (closes issue ASTERISK-10012) (closes issue ASTERISK-10616) (closes issue ASTERISK-9737) (closes issue ASTERISK-10404)
........
------------------------------------------------------------------------ |
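The commit messages above attribute the crashes to corrupted head and tail pointers in a channel's frame read queue. As a rough illustration of the invariant involved, the sketch below moves every entry from one queue to another while keeping head and tail in step. This is not the Asterisk code and does not use the AST_LIST macros: `struct frame`, `struct frame_queue`, and all four functions are hypothetical stand-ins invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the Asterisk types or the AST_LIST macros. */
struct frame {
    int id;
    struct frame *next;
};

struct frame_queue {
    struct frame *first; /* head: NULL when the queue is empty */
    struct frame *last;  /* tail: NULL when the queue is empty */
};

/* Append one frame, updating head and tail together. */
static void queue_push_tail(struct frame_queue *q, struct frame *f)
{
    assert(q != NULL && f != NULL);
    f->next = NULL;
    if (q->last) {
        q->last->next = f;
    } else {
        q->first = f; /* queue was empty: the new frame is also the head */
    }
    q->last = f;
}

/* Remove and return the first frame, or NULL if the queue is empty. */
static struct frame *queue_pop_head(struct frame_queue *q)
{
    struct frame *f = q->first;

    if (!f) {
        return NULL;
    }
    q->first = f->next;
    if (!q->first) {
        q->last = NULL; /* queue became empty: the tail must not dangle */
    }
    f->next = NULL;
    return f;
}

/* Move every frame from src to the end of dst, leaving src empty. */
static void queue_move_all(struct frame_queue *dst, struct frame_queue *src)
{
    struct frame *f;

    while ((f = queue_pop_head(src)) != NULL) {
        queue_push_tail(dst, f);
    }
}

/* Invariant: head and tail are both NULL, or the tail is the last node
 * reachable from the head. */
static int queue_is_consistent(const struct frame_queue *q)
{
    const struct frame *f = q->first;

    if (!f) {
        return q->last == NULL;
    }
    while (f->next) {
        f = f->next;
    }
    return f == q->last;
}
```

Routing every change through one push and one pop routine means head and tail are only ever updated together. A stale tail, or a frame still linked from two queues, is the kind of state that could plausibly lead to a crash inside `free()` like the one in the backtrace above.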