Summary: | ASTERISK-10339: Crash with "Segmentation fault" in zap internal timers processing | ||
Reporter: | Volnikov Ivan (ivan) | Labels: | |
Date Opened: | 2007-09-20 01:32:01 | Date Closed: | 2008-01-29 10:57:18.000-0600 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Channels/chan_zap |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) ast_1_4_11_svn_patch_channel_rc.diff | |
Description: | Seems like "race condition" effect in channel.c (2255) - no more "chan" object is accessible in then method generator_force at line: 1847 generate = chan->generator->generate; ****** ADDITIONAL INFORMATION ****** Decoded core dump: Program terminated with signal 11, Segmentation fault. #0 0x08084d9c in generator_force (data=0xb7ccee20) at channel.c:1847 1847 generate = chan->generator->generate; #0 0x08084d9c in generator_force (data=0xb7ccee20) at channel.c:1847 tmp = (void *) 0x0 res = 14442484 generate = (int (*)(struct ast_channel *, void *, int, int)) 0 chan = (struct ast_channel *) 0xb7ccee20 __PRETTY_FUNCTION__ = "generator_force" #1 0x080862f7 in __ast_read (chan=0xb7ccee20, dropaudio=0) at channel.c:2255 func = (int (*)(void *)) 0x8084d77 <generator_force> data = (void *) 0xb7ccee20 res = 0 f = (struct ast_frame *) 0x0 blah = 15 prestate = 6 __PRETTY_FUNCTION__ = "__ast_read" #2 0x08087951 in ast_read (chan=0xb7ccee20) at channel.c:2549 No locals. #3 0x080744b5 in autoservice_run (ign=0x0) at autoservice.c:87 f = (struct ast_frame *) 0x81744c0 chan = (struct ast_channel *) 0xb7ccee20 as = (struct asent *) 0x0 ms = 481 mons = {0xb7ccee20, 0xb7ceeb98, 0x11fe300, 0x5c7f2f, 0x11fdf60, 0xb7d5a404, 0x11fdf84, 0xa60024, 0x11fe088, 0x11fdf58, 0x80, 0x806d0b1, 0x10000000, 0xb7ed2ee8, 0x400000, 0x0, 0x2f504953, 0x36303532, 0x11fe300, 0x5c7f2f, 0x11fdfa0, 0xb7d5a404, 0x11fdfc4, 0x5dd168, 0x2f504953, 0x31303232, 0x11fe300, 0x5c7f2f, 0x11fdfc0, 0xb7d5a404, 0x11fdfe4, 0x5dd168, 0x2f504953, 0x35313632, 0x11fe300, 0x5c7f2f, 0x11fdfe0, 0xb7d5a404, 0x11fe004, 0x5dd168, 0x2f504953, 0x33303532, 0x11fe300, 0x5c7f2f, 0x11fe000, 0xb7d5a404, 0x806d0b1, 0x10000000, 0x2f504953, 0x400000, 0x0 <repeats 31 times>, 0xb6fff4, 0x17, 0x400000, 0x11fe1a8, 0xa5fcd3, 0x17, 0x11fe110, 0x11fe084, 0x806d0b1, 0x400000, 0x0, 0x2f504953, 0x36303532, 0x11fe300, 0x5c7f2f, 0x11fdfa0, 0xb7d5a404, 0xb57140, 0x5dd168, 0xb57140, 0xb71120, 0xb1, 0xb71144, 0xb71144, 0x0, 0xb71150, 0x18, 0x2f504953, 0x35313632, 0x11fe300, 0x5c7f2f, 0xa9b20b, 0xb6fff4, 0xb71120, 0x81441ce, 0x11fe1ac, 0xa9d6f3, 0x11fe300, 0x5c7f2f, 0x11fe000, 0xb7d5a404, 0x10000000, 0xb7ed2ee8, 0x806d0b1, 0x400000, 0x8ca4760, 0x11fe197, 0xb71358, 0x0, 0xdb92f1, 0xb71150, 0x0, 0x567c, 0x259, 0x0, 0xdba3dc, 0x4, 0x0, 0x1, 0x0, 0x11feb90, 0x8146b14, 0x8146c06, 0x11fe168, 0xdba7d0, 0x11fe1c8, 0x8073293, 0x816ff20, 0x18, 0x0, 0xb71150, 0x8fedca8, 0x41, 0x0, 0x0, 0x0, 0x10000000, 0x11fe1e4, 0x3d0f00, 0x9, 0x8146c05, 0x8146b13, 0x11fe1c8, 0xa9eece, 0x0, 0xffffffff, 0xdc5ff4, 0x8144134, 0x81441ce, 0x11fe208, 0x8073586, 0x8146b13, 0x7b, 0x8146d9c, 0x8146b2d, 0x816ff20, 0xa9d1b1, 0xdb92f1, 0x11fe4a4, 0xe8cf58ab, 0x1b8, 0x8fedcb0, 0x8fedd18, 0x8fedf00, 0x1b8, 0x11fe248, 0x807338d, 0x1e8, 0x1, 0xdb92f1, 0x96, 0x81663fc, 0x567c, 0xc0, 0xdbbe1a, 0xdba3dc, 0x7b, 0xdb92f1, 0x1, 0x11feb90, 0x11feb90, 0x8146b14, 0x8146c06, 0x11fe258, 0xdba7d0, 0x11fe2b8, 0x8073293, 0x816ff20, 0x11feb90, 0x8144134, 0x81441ce, 0x11fe278, 0x18, 0x11fe2a8, 0x810f9a9, 0xdba3dc, 0x1e8, 0x11fe2a8, 0x1, 0x9...} x = 1 __PRETTY_FUNCTION__ = "autoservice_run" #4 0x0810fd37 in dummy_start (data=0xb7d72e60) at utils.c:775 __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {14442484, 0, 18869136, 18867128, 522284565, 372142355}, __mask_was_saved = 0}}, __pad = {0x11fe470, 0x0, 0x80000000, 0x80000000}} __cancel_routine = (void (*)(void *)) 0x806a755 <ast_unregister_thread> __cancel_arg = (void *) 0x11feb90 not_first_call = 0 ret = (void *) 0x42978608 a = {start_routine = 0x80743a1 <autoservice_run>, data = 0x0, name = 0xb7d91e30 "autoservice_run started at [ 114] autoservice.c ast_autoservice_start()"} ---Type <return> to continue, or q <return> to quit--- lock_info = (struct thr_lock_info *) 0x8fedd18 __PRETTY_FUNCTION__ = "dummy_start" ASTERISK-1 0x00db73db in start_thread () from /lib/libpthread.so.0 No symbol table info available. ASTERISK-2 0x00b0426e in clone () from /lib/libc.so.6 No symbol table info available. | ||
Comments: | By: Volnikov Ivan (ivan) 2007-09-20 01:58:42 I think that in <channel.c> code: 2251 /* save a copy of func/data before unlocking the channel */ 2252 int (*func)(void *) = chan->timingfunc; 2253 void *data = chan->timingdata; 2254 ast_channel_unlock(chan); 2255 func(data); Actually call this <channel.c>: 1838static int generator_force(void *data) 1839{ 1840 /* Called if generator doesn't have data */ 1841 void *tmp; 1842 int res; 1843 int (*generate)(struct ast_channel *chan, void *tmp, int datalen, int samples); 1844 struct ast_channel *chan = data; 1845 tmp = chan->generatordata; 1846 chan->generatordata = NULL; 1847 generate = chan->generator->generate; 1848 res = generate(chan, tmp, 0, 160); 1849 chan->generatordata = tmp; 1850 if (res) { 1851 if (option_debug) 1852 ast_log(LOG_DEBUG, "Auto-deactivating generator\n"); 1853 ast_deactivate_generator(chan); 1854 } 1855 return 0; 1856} If to consider that set the timer make like "ast_settimeout(chan, 160, generator_force, chan);". We come to conclusion <channel.c>: 2251 /* save a copy of func/data before unlocking the channel */ 2252 int (*func)(void *) = chan->timingfunc; 2253 void *data = chan->timingdata; 2254 ast_channel_unlock(chan); 2255 func(data); must be replaced on that: 2251 /* save a copy of func/data before unlocking the channel */ 2252 int (*func)(void *) = chan->timingfunc; 2253 void *data = chan->timingdata; 2254 func(data); 2255 ast_channel_unlock(chan); Or other decision which will allow to guarantee safety of object "chan" (Probably this call is born from critical section owing to fear of deadlock) By: Volnikov Ivan (ivan) 2007-09-21 04:51:17 We found some reproducibility using Asterisk transfer mode (by#) While make Local channel to be redirected point is BUSY. By: Volnikov Ivan (ivan) 2007-09-25 08:13:57 I found 100% conditions for reproducibility. Test system: 1. Asterisk without addons 2. Internal timers on zaptel driver 3. Attended Transfer on # symbol 4. MixMonitor on all connection Actions: 2 SIP Terminal connected. One of there mark # (Attended Transfer) and dial the number of third SIP terminal that is BUSY now. After 3-10 attempts the crash happends. By: Volnikov Ivan (ivan) 2007-09-25 08:38:54 I place patch that solves the problem. I have not heard any distinct answer to a question from my Note 1 about position of ast_channel_unlock(chan) <channel.c>. This patch have set on our office system for testing. I shell inform about it. If there are remarks and offers shall be glad them to discuss. By: Volnikov Ivan (ivan) 2007-09-26 08:58:59 After patch I have seen another type of crash (see a Note for http://bugs.digium.com/view.php?id=10571). It is possibly connected with changes of parameters critical section using. By: Volnikov Ivan (ivan) 2007-10-01 02:16:15 Now 5 days uptime happend without crash. By: Volnikov Ivan (ivan) 2007-10-03 01:03:36 Now 7 days uptime happend without crash. By: Volnikov Ivan (ivan) 2007-10-08 03:02:04 Now 12 days uptime happend without crash. By: Digium Subversion (svnbot) 2007-10-18 12:42:23 Repository: asterisk Revision: 86330 U branches/1.4/include/asterisk/channel.h U branches/1.4/main/channel.c ------------------------------------------------------------------------ r86330 | russell | 2007-10-18 12:42:22 -0500 (Thu, 18 Oct 2007) | 10 lines The channel needs to stay locked while running timer callbacks, as they access and modify channel data that may change elsewhere. I went through every timer callback in the source tree to make sure that none of them did any additional locking that could introduce deadlocks, and all is well. (closes issue ASTERISK-10339) Reported by: Ivan Patches: ast_1_4_11_svn_patch_channel_rc.diff uploaded by Ivan (license 229) ------------------------------------------------------------------------ By: Digium Subversion (svnbot) 2007-10-18 12:46:03 Repository: asterisk Revision: 86331 _U trunk/ U trunk/include/asterisk/channel.h U trunk/main/channel.c ------------------------------------------------------------------------ r86331 | russell | 2007-10-18 12:46:03 -0500 (Thu, 18 Oct 2007) | 18 lines Merged revisions 86330 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r86330 | russell | 2007-10-18 13:03:10 -0500 (Thu, 18 Oct 2007) | 10 lines The channel needs to stay locked while running timer callbacks, as they access and modify channel data that may change elsewhere. I went through every timer callback in the source tree to make sure that none of them did any additional locking that could introduce deadlocks, and all is well. (closes issue ASTERISK-10339) Reported by: Ivan Patches: ast_1_4_11_svn_patch_channel_rc.diff uploaded by Ivan (license 229) ........ ------------------------------------------------------------------------ By: Digium Subversion (svnbot) 2008-01-28 11:13:11.000-0600 Repository: asterisk Revision: 100581 U branches/1.4/channels/chan_local.c U branches/1.4/include/asterisk/channel.h U branches/1.4/main/channel.c ------------------------------------------------------------------------ r100581 | russell | 2008-01-28 11:13:10 -0600 (Mon, 28 Jan 2008) | 9 lines Make some deadlock related fixes. These bugs were discovered and reported internally at Digium by Steve Pitts. - Fix up chan_local to ensure that the channel lock is held before the local pvt lock. - Don't hold the channel lock when executing the timing function, as it can cause a deadlock when using chan_local. This actually changes the code back to be how it was before the change for issue ASTERISK-10339. But, I added some other locking that I think will prevent the problem reported there, as well. ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=100581 By: Digium Subversion (svnbot) 2008-01-28 11:18:36.000-0600 Repository: asterisk Revision: 100582 _U trunk/ U trunk/channels/chan_local.c U trunk/include/asterisk/channel.h U trunk/main/channel.c ------------------------------------------------------------------------ r100582 | russell | 2008-01-28 11:18:35 -0600 (Mon, 28 Jan 2008) | 17 lines Merged revisions 100581 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r100581 | russell | 2008-01-28 11:15:41 -0600 (Mon, 28 Jan 2008) | 9 lines Make some deadlock related fixes. These bugs were discovered and reported internally at Digium by Steve Pitts. - Fix up chan_local to ensure that the channel lock is held before the local pvt lock. - Don't hold the channel lock when executing the timing function, as it can cause a deadlock when using chan_local. This actually changes the code back to be how it was before the change for issue ASTERISK-10339. But, I added some other locking that I think will prevent the problem reported there, as well. ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=100582 By: Digium Subversion (svnbot) 2008-01-29 10:57:18.000-0600 Repository: asterisk Revision: 100881 _U team/murf/bug11210/ U team/murf/bug11210/apps/app_voicemail.c U team/murf/bug11210/build_tools/menuselect-deps.in U team/murf/bug11210/channels/Makefile U team/murf/bug11210/channels/chan_h323.c U team/murf/bug11210/channels/chan_iax2.c U team/murf/bug11210/channels/chan_local.c U team/murf/bug11210/channels/chan_mgcp.c U team/murf/bug11210/channels/chan_misdn.c U team/murf/bug11210/channels/chan_sip.c A team/murf/bug11210/channels/chan_vpb.cc U team/murf/bug11210/channels/chan_zap.c A team/murf/bug11210/configs/vpb.conf.sample U team/murf/bug11210/configure U team/murf/bug11210/configure.ac U team/murf/bug11210/doc/tex/channelvariables.tex U team/murf/bug11210/include/asterisk/autoconfig.h.in U team/murf/bug11210/include/asterisk/channel.h U team/murf/bug11210/include/asterisk/sched.h U team/murf/bug11210/main/cdr.c U team/murf/bug11210/main/channel.c U team/murf/bug11210/main/dnsmgr.c U team/murf/bug11210/main/features.c U team/murf/bug11210/main/file.c U team/murf/bug11210/main/logger.c U team/murf/bug11210/main/pbx.c U team/murf/bug11210/main/rtp.c U team/murf/bug11210/makeopts.in U team/murf/bug11210/pbx/pbx_dundi.c ------------------------------------------------------------------------ r100881 | murf | 2008-01-29 10:57:16 -0600 (Tue, 29 Jan 2008) | 213 lines Merged revisions 100488,100497,100514,100532-100533,100549,100565,100582,100625,100627-100628,100630-100632,100671,100674,100676-100679 via svnmerge from https://origsvn.digium.com/svn/asterisk/trunk ................ r100488 | tilghman | 2008-01-27 15:35:29 -0700 (Sun, 27 Jan 2008) | 19 lines Merged revisions 100465 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r100465 | tilghman | 2008-01-27 15:59:53 -0600 (Sun, 27 Jan 2008) | 11 lines When deleting a task from the scheduler, ignoring the return value could possibly cause memory to be accessed after it is freed, which causes all sorts of random memory corruption. Instead, if a deletion fails, wait a bit and try again (noting that another thread could change our taskid value). (closes issue ASTERISK-10897) Reported by: flujan Patches: 20080124__bug11386.diff.txt uploaded by Corydon76 (license 14) Tested by: Corydon76, flujan, stuarth` ........ ................ r100497 | tilghman | 2008-01-27 16:14:48 -0700 (Sun, 27 Jan 2008) | 5 lines With the switch to the ast_sched_replace* API in trunk, we lose the correction that was just merged from 1.4, so this is a changeover to those APIs to use the macro versions, so that we properly detect errors from ast_sched_del, instead of simply ignoring the return values. ................ r100514 | russell | 2008-01-27 17:56:14 -0700 (Sun, 27 Jan 2008) | 5 lines These readlocks always fail for me on my mac, and I saw it happen again today on another mac. We ignore the return value of locking operations almost everywhere in Asterisk. So, ignore these, as well, so Asterisk will actually work on systems where this is occurring while I look into what the issue is. ................ r100532 | russell | 2008-01-27 21:30:44 -0700 (Sun, 27 Jan 2008) | 3 lines - Simplify a line with ARRAY_LEN() - Make a few little formatting changes ................ r100533 | russell | 2008-01-27 21:43:14 -0700 (Sun, 27 Jan 2008) | 2 lines Make a couple more uses of ARRAY_LEN, and convert some spaces to tabs ................ r100549 | file | 2008-01-28 06:57:38 -0700 (Mon, 28 Jan 2008) | 4 lines Don't do a network byte order conversion when setting the socket's port variable to that of bindaddr's. It is already in the correct network byte order. (closes issue ASTERISK-11268) Reported by: hmodes ................ r100565 | russell | 2008-01-28 07:27:28 -0700 (Mon, 28 Jan 2008) | 2 lines Clean up some formatting, and simplify a bit of code using ast_str ................ r100582 | russell | 2008-01-28 10:21:24 -0700 (Mon, 28 Jan 2008) | 17 lines Merged revisions 100581 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r100581 | russell | 2008-01-28 11:15:41 -0600 (Mon, 28 Jan 2008) | 9 lines Make some deadlock related fixes. These bugs were discovered and reported internally at Digium by Steve Pitts. - Fix up chan_local to ensure that the channel lock is held before the local pvt lock. - Don't hold the channel lock when executing the timing function, as it can cause a deadlock when using chan_local. This actually changes the code back to be how it was before the change for issue ASTERISK-10339. But, I added some other locking that I think will prevent the problem reported there, as well. ........ ................ r100625 | qwell | 2008-01-28 11:24:40 -0700 (Mon, 28 Jan 2008) | 9 lines Merged revisions 100624 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r100624 | qwell | 2008-01-28 12:23:09 -0600 (Mon, 28 Jan 2008) | 1 line Correct a comment which made little/no sense. ........ ................ r100627 | russell | 2008-01-28 11:27:08 -0700 (Mon, 28 Jan 2008) | 15 lines Merged revisions 100626 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r100626 | russell | 2008-01-28 12:26:31 -0600 (Mon, 28 Jan 2008) | 7 lines Fix a crash in ast_masq_park_call() (issue ASTERISK-10856) Reported by: DEA Patches: res_features-park.txt uploaded by DEA (license 3) ........ ................ r100628 | tilghman | 2008-01-28 11:27:29 -0700 (Mon, 28 Jan 2008) | 3 lines Normalize the detection for execinfo, so that Linux (glibc) and other platforms with libexecinfo will generate inline stack backtraces correctly. ................ r100630 | russell | 2008-01-28 11:38:56 -0700 (Mon, 28 Jan 2008) | 13 lines Merged revisions 100629 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r100629 | russell | 2008-01-28 12:34:20 -0600 (Mon, 28 Jan 2008) | 5 lines For some reason, the use of this strdupa() is leading to memory corruption on freebsd sparc64. This trivial workaround fixes it. (closes issue ASTERISK-9956, closes issue ASTERISK-11316, reported by mattias04 and Home-of-the-Brave) ........ ................ r100631 | russell | 2008-01-28 11:41:23 -0700 (Mon, 28 Jan 2008) | 3 lines Merge rev 100626 from Asterisk 1.4. The svnmerge of this commit was a NoOp, since res_features doesn't exist in trunk. Thanks to qwell for pointing it out! ................ r100632 | file | 2008-01-28 12:04:53 -0700 (Mon, 28 Jan 2008) | 2 lines Fix up two scheduling issues. In one instance a scheduled item was not deleted when it should have been and in the other it was scheduled again when it shouldn't have been. ................ r100671 | file | 2008-01-28 13:40:08 -0700 (Mon, 28 Jan 2008) | 6 lines Fix up some T38 state change issues. (closes issue ASTERISK-11106) Reported by: dimas Patches: v2-sip-t38state.patch uploaded by dimas (license 88) ................ r100674 | mmichelson | 2008-01-28 13:58:12 -0700 (Mon, 28 Jan 2008) | 10 lines Blocked revisions 100673 via svnmerge ........ r100673 | mmichelson | 2008-01-28 14:55:56 -0600 (Mon, 28 Jan 2008) | 3 lines Undoing the deprecation of chan_vpb. It is alive and well. ........ ................ r100676 | qwell | 2008-01-28 14:02:11 -0700 (Mon, 28 Jan 2008) | 16 lines Merged revisions 100672 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 (closes issue ASTERISK-11263) ........ r100672 | qwell | 2008-01-28 14:42:43 -0600 (Mon, 28 Jan 2008) | 7 lines When using ODBC_STORAGE, make sure we put greeting files into the database like we do with the others. Issue ASTERISK-11263 Reported by: dimas Patches: vmgreet.patch uploaded by dimas (license 88) ........ ................ r100677 | tilghman | 2008-01-28 14:05:29 -0700 (Mon, 28 Jan 2008) | 10 lines Merged revisions 100675 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.4 ........ r100675 | tilghman | 2008-01-28 15:02:02 -0600 (Mon, 28 Jan 2008) | 2 lines WaitExten didn't handle AbsoluteTimeout properly (went to 't' instead of 'T') ........ ................ r100678 | mmichelson | 2008-01-28 14:07:18 -0700 (Mon, 28 Jan 2008) | 3 lines Re-inserting chan_vpb into trunk. ................ r100679 | qwell | 2008-01-28 14:11:24 -0700 (Mon, 28 Jan 2008) | 1 line Reintroduce more chan_vpb stuff that was removed in r100421 and r100422 ................ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=100881 |