[Home]

Summary:ASTERISK-10339: Crash with "Segmentation fault" in zap internal timers processing
Reporter:Volnikov Ivan (ivan)Labels:
Date Opened:2007-09-20 01:32:01Date Closed:2008-01-29 10:57:18.000-0600
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Channels/chan_zap
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) ast_1_4_11_svn_patch_channel_rc.diff
Description:Seems like "race condition" effect in channel.c (2255) - no more "chan" object is accessible in then method generator_force at line:
1847            generate = chan->generator->generate;

****** ADDITIONAL INFORMATION ******

Decoded core dump:
Program terminated with signal 11, Segmentation fault.
#0  0x08084d9c in generator_force (data=0xb7ccee20) at channel.c:1847
1847            generate = chan->generator->generate;
#0  0x08084d9c in generator_force (data=0xb7ccee20) at channel.c:1847
       tmp = (void *) 0x0
       res = 14442484
       generate = (int (*)(struct ast_channel *, void *, int, int)) 0
       chan = (struct ast_channel *) 0xb7ccee20
       __PRETTY_FUNCTION__ = "generator_force"
#1  0x080862f7 in __ast_read (chan=0xb7ccee20, dropaudio=0) at channel.c:2255
       func = (int (*)(void *)) 0x8084d77 <generator_force>
       data = (void *) 0xb7ccee20
       res = 0
       f = (struct ast_frame *) 0x0
       blah = 15
       prestate = 6
       __PRETTY_FUNCTION__ = "__ast_read"
#2  0x08087951 in ast_read (chan=0xb7ccee20) at channel.c:2549
No locals.
#3  0x080744b5 in autoservice_run (ign=0x0) at autoservice.c:87
       f = (struct ast_frame *) 0x81744c0
       chan = (struct ast_channel *) 0xb7ccee20
       as = (struct asent *) 0x0
       ms = 481
       mons = {0xb7ccee20, 0xb7ceeb98, 0x11fe300, 0x5c7f2f, 0x11fdf60, 0xb7d5a404, 0x11fdf84, 0xa60024, 0x11fe088,
 0x11fdf58, 0x80, 0x806d0b1, 0x10000000, 0xb7ed2ee8, 0x400000, 0x0, 0x2f504953, 0x36303532, 0x11fe300, 0x5c7f2f,
 0x11fdfa0, 0xb7d5a404, 0x11fdfc4, 0x5dd168, 0x2f504953, 0x31303232, 0x11fe300, 0x5c7f2f, 0x11fdfc0, 0xb7d5a404,
 0x11fdfe4, 0x5dd168, 0x2f504953, 0x35313632, 0x11fe300, 0x5c7f2f, 0x11fdfe0, 0xb7d5a404, 0x11fe004, 0x5dd168, 0x2f504953,
 0x33303532, 0x11fe300, 0x5c7f2f, 0x11fe000, 0xb7d5a404, 0x806d0b1, 0x10000000, 0x2f504953, 0x400000,
 0x0 <repeats 31 times>, 0xb6fff4, 0x17, 0x400000, 0x11fe1a8, 0xa5fcd3, 0x17, 0x11fe110, 0x11fe084, 0x806d0b1, 0x400000,
 0x0, 0x2f504953, 0x36303532, 0x11fe300, 0x5c7f2f, 0x11fdfa0, 0xb7d5a404, 0xb57140, 0x5dd168, 0xb57140, 0xb71120, 0xb1,
 0xb71144, 0xb71144, 0x0, 0xb71150, 0x18, 0x2f504953, 0x35313632, 0x11fe300, 0x5c7f2f, 0xa9b20b, 0xb6fff4, 0xb71120,
 0x81441ce, 0x11fe1ac, 0xa9d6f3, 0x11fe300, 0x5c7f2f, 0x11fe000, 0xb7d5a404, 0x10000000, 0xb7ed2ee8, 0x806d0b1, 0x400000,
 0x8ca4760, 0x11fe197, 0xb71358, 0x0, 0xdb92f1, 0xb71150, 0x0, 0x567c, 0x259, 0x0, 0xdba3dc, 0x4, 0x0, 0x1, 0x0,
 0x11feb90, 0x8146b14, 0x8146c06, 0x11fe168, 0xdba7d0, 0x11fe1c8, 0x8073293, 0x816ff20, 0x18, 0x0, 0xb71150, 0x8fedca8,
 0x41, 0x0, 0x0, 0x0, 0x10000000, 0x11fe1e4, 0x3d0f00, 0x9, 0x8146c05, 0x8146b13, 0x11fe1c8, 0xa9eece, 0x0, 0xffffffff,
 0xdc5ff4, 0x8144134, 0x81441ce, 0x11fe208, 0x8073586, 0x8146b13, 0x7b, 0x8146d9c, 0x8146b2d, 0x816ff20, 0xa9d1b1,
 0xdb92f1, 0x11fe4a4, 0xe8cf58ab, 0x1b8, 0x8fedcb0, 0x8fedd18, 0x8fedf00, 0x1b8, 0x11fe248, 0x807338d, 0x1e8, 0x1,
 0xdb92f1, 0x96, 0x81663fc, 0x567c, 0xc0, 0xdbbe1a, 0xdba3dc, 0x7b, 0xdb92f1, 0x1, 0x11feb90, 0x11feb90, 0x8146b14,
 0x8146c06, 0x11fe258, 0xdba7d0, 0x11fe2b8, 0x8073293, 0x816ff20, 0x11feb90, 0x8144134, 0x81441ce, 0x11fe278, 0x18,
 0x11fe2a8, 0x810f9a9, 0xdba3dc, 0x1e8, 0x11fe2a8, 0x1, 0x9...}
       x = 1
       __PRETTY_FUNCTION__ = "autoservice_run"
#4  0x0810fd37 in dummy_start (data=0xb7d72e60) at utils.c:775
       __cancel_buf = {__cancel_jmp_buf = {{__cancel_jmp_buf = {14442484, 0, 18869136, 18867128, 522284565, 372142355},
     __mask_was_saved = 0}}, __pad = {0x11fe470, 0x0, 0x80000000, 0x80000000}}
       __cancel_routine = (void (*)(void *)) 0x806a755 <ast_unregister_thread>
       __cancel_arg = (void *) 0x11feb90
       not_first_call = 0
       ret = (void *) 0x42978608
       a = {start_routine = 0x80743a1 <autoservice_run>, data = 0x0,
 name = 0xb7d91e30 "autoservice_run      started at [  114] autoservice.c ast_autoservice_start()"}
---Type <return> to continue, or q <return> to quit---
       lock_info = (struct thr_lock_info *) 0x8fedd18
       __PRETTY_FUNCTION__ = "dummy_start"
ASTERISK-1  0x00db73db in start_thread () from /lib/libpthread.so.0
No symbol table info available.
ASTERISK-2  0x00b0426e in clone () from /lib/libc.so.6
No symbol table info available.
Comments:By: Volnikov Ivan (ivan) 2007-09-20 01:58:42

I think that in <channel.c> code:

2251 /* save a copy of func/data before unlocking the channel */
2252 int (*func)(void *) = chan->timingfunc;
2253 void *data = chan->timingdata;
2254 ast_channel_unlock(chan);
2255 func(data);

Actually call this <channel.c>:

1838static int generator_force(void *data)
1839{
1840 /* Called if generator doesn't have data */
1841 void *tmp;
1842 int res;
1843 int (*generate)(struct ast_channel *chan, void *tmp, int datalen, int samples);
1844 struct ast_channel *chan = data;
1845 tmp = chan->generatordata;
1846 chan->generatordata = NULL;
1847 generate = chan->generator->generate;
1848 res = generate(chan, tmp, 0, 160);
1849 chan->generatordata = tmp;
1850 if (res) {
1851 if (option_debug)
1852 ast_log(LOG_DEBUG, "Auto-deactivating generator\n");
1853 ast_deactivate_generator(chan);
1854 }
1855 return 0;
1856}

If to consider that set the timer make like "ast_settimeout(chan, 160, generator_force, chan);". We come to conclusion <channel.c>:

2251 /* save a copy of func/data before unlocking the channel */
2252 int (*func)(void *) = chan->timingfunc;
2253 void *data = chan->timingdata;
2254 ast_channel_unlock(chan);
2255 func(data);

must be replaced on that:

2251 /* save a copy of func/data before unlocking the channel */
2252 int (*func)(void *) = chan->timingfunc;
2253 void *data = chan->timingdata;
2254 func(data);
2255 ast_channel_unlock(chan);

Or other decision which will allow to guarantee safety of object "chan"
(Probably this call is born from critical section owing to fear of deadlock)



By: Volnikov Ivan (ivan) 2007-09-21 04:51:17

We found some reproducibility using Asterisk transfer mode (by#)
While make Local channel to be redirected point is BUSY.



By: Volnikov Ivan (ivan) 2007-09-25 08:13:57

I found 100% conditions for reproducibility.
Test system:
1. Asterisk without addons
2. Internal timers on zaptel driver
3. Attended Transfer on # symbol
4. MixMonitor on all connection
Actions:
2 SIP Terminal connected. One of there mark # (Attended Transfer) and dial the number of third SIP terminal that is BUSY now. After 3-10 attempts the crash happends.

By: Volnikov Ivan (ivan) 2007-09-25 08:38:54

I place patch that solves the problem. I have not heard any distinct answer to a question from my Note 1 about position of ast_channel_unlock(chan) <channel.c>. This patch have set on our office system for testing. I shell inform about it. If there are remarks and offers shall be glad them to discuss.

By: Volnikov Ivan (ivan) 2007-09-26 08:58:59

After patch I have seen another type of crash (see a Note for http://bugs.digium.com/view.php?id=10571). It is possibly connected with changes of parameters critical section using.

By: Volnikov Ivan (ivan) 2007-10-01 02:16:15

Now 5 days uptime happend without crash.

By: Volnikov Ivan (ivan) 2007-10-03 01:03:36

Now 7 days uptime happend without crash.

By: Volnikov Ivan (ivan) 2007-10-08 03:02:04

Now 12 days uptime happend without crash.

By: Digium Subversion (svnbot) 2007-10-18 12:42:23

Repository: asterisk
Revision: 86330

U   branches/1.4/include/asterisk/channel.h
U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r86330 | russell | 2007-10-18 12:42:22 -0500 (Thu, 18 Oct 2007) | 10 lines

The channel needs to stay locked while running timer callbacks, as they access
and modify channel data that may change elsewhere.  I went through every timer
callback in the source tree to make sure that none of them did any additional
locking that could introduce deadlocks, and all is well.

(closes issue ASTERISK-10339)
Reported by: Ivan
Patches:
     ast_1_4_11_svn_patch_channel_rc.diff uploaded by Ivan (license 229)

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2007-10-18 12:46:03

Repository: asterisk
Revision: 86331

_U  trunk/
U   trunk/include/asterisk/channel.h
U   trunk/main/channel.c

------------------------------------------------------------------------
r86331 | russell | 2007-10-18 12:46:03 -0500 (Thu, 18 Oct 2007) | 18 lines

Merged revisions 86330 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r86330 | russell | 2007-10-18 13:03:10 -0500 (Thu, 18 Oct 2007) | 10 lines

The channel needs to stay locked while running timer callbacks, as they access
and modify channel data that may change elsewhere.  I went through every timer
callback in the source tree to make sure that none of them did any additional
locking that could introduce deadlocks, and all is well.

(closes issue ASTERISK-10339)
Reported by: Ivan
Patches:
     ast_1_4_11_svn_patch_channel_rc.diff uploaded by Ivan (license 229)

........

------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2008-01-28 11:13:11.000-0600

Repository: asterisk
Revision: 100581

U   branches/1.4/channels/chan_local.c
U   branches/1.4/include/asterisk/channel.h
U   branches/1.4/main/channel.c

------------------------------------------------------------------------
r100581 | russell | 2008-01-28 11:13:10 -0600 (Mon, 28 Jan 2008) | 9 lines

Make some deadlock related fixes.  These bugs were discovered and reported
internally at Digium by Steve Pitts.
- Fix up chan_local to ensure that the channel lock is held before the local
  pvt lock.
- Don't hold the channel lock when executing the timing function, as it can
  cause a deadlock when using chan_local.  This actually changes the code back
  to be how it was before the change for issue ASTERISK-10339.  But, I added some other
  locking that I think will prevent the problem reported there, as well.

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=100581

By: Digium Subversion (svnbot) 2008-01-28 11:18:36.000-0600

Repository: asterisk
Revision: 100582

_U  trunk/
U   trunk/channels/chan_local.c
U   trunk/include/asterisk/channel.h
U   trunk/main/channel.c

------------------------------------------------------------------------
r100582 | russell | 2008-01-28 11:18:35 -0600 (Mon, 28 Jan 2008) | 17 lines

Merged revisions 100581 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r100581 | russell | 2008-01-28 11:15:41 -0600 (Mon, 28 Jan 2008) | 9 lines

Make some deadlock related fixes.  These bugs were discovered and reported
internally at Digium by Steve Pitts.
- Fix up chan_local to ensure that the channel lock is held before the local
  pvt lock.
- Don't hold the channel lock when executing the timing function, as it can
  cause a deadlock when using chan_local.  This actually changes the code back
  to be how it was before the change for issue ASTERISK-10339.  But, I added some other
  locking that I think will prevent the problem reported there, as well.

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=100582

By: Digium Subversion (svnbot) 2008-01-29 10:57:18.000-0600

Repository: asterisk
Revision: 100881

_U  team/murf/bug11210/
U   team/murf/bug11210/apps/app_voicemail.c
U   team/murf/bug11210/build_tools/menuselect-deps.in
U   team/murf/bug11210/channels/Makefile
U   team/murf/bug11210/channels/chan_h323.c
U   team/murf/bug11210/channels/chan_iax2.c
U   team/murf/bug11210/channels/chan_local.c
U   team/murf/bug11210/channels/chan_mgcp.c
U   team/murf/bug11210/channels/chan_misdn.c
U   team/murf/bug11210/channels/chan_sip.c
A   team/murf/bug11210/channels/chan_vpb.cc
U   team/murf/bug11210/channels/chan_zap.c
A   team/murf/bug11210/configs/vpb.conf.sample
U   team/murf/bug11210/configure
U   team/murf/bug11210/configure.ac
U   team/murf/bug11210/doc/tex/channelvariables.tex
U   team/murf/bug11210/include/asterisk/autoconfig.h.in
U   team/murf/bug11210/include/asterisk/channel.h
U   team/murf/bug11210/include/asterisk/sched.h
U   team/murf/bug11210/main/cdr.c
U   team/murf/bug11210/main/channel.c
U   team/murf/bug11210/main/dnsmgr.c
U   team/murf/bug11210/main/features.c
U   team/murf/bug11210/main/file.c
U   team/murf/bug11210/main/logger.c
U   team/murf/bug11210/main/pbx.c
U   team/murf/bug11210/main/rtp.c
U   team/murf/bug11210/makeopts.in
U   team/murf/bug11210/pbx/pbx_dundi.c

------------------------------------------------------------------------
r100881 | murf | 2008-01-29 10:57:16 -0600 (Tue, 29 Jan 2008) | 213 lines

Merged revisions 100488,100497,100514,100532-100533,100549,100565,100582,100625,100627-100628,100630-100632,100671,100674,100676-100679 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r100488 | tilghman | 2008-01-27 15:35:29 -0700 (Sun, 27 Jan 2008) | 19 lines

Merged revisions 100465 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r100465 | tilghman | 2008-01-27 15:59:53 -0600 (Sun, 27 Jan 2008) | 11 lines

When deleting a task from the scheduler, ignoring the return value could
possibly cause memory to be accessed after it is freed, which causes all
sorts of random memory corruption.  Instead, if a deletion fails, wait a
bit and try again (noting that another thread could change our taskid
value).
(closes issue ASTERISK-10897)
Reported by: flujan
Patches:
      20080124__bug11386.diff.txt uploaded by Corydon76 (license 14)
Tested by: Corydon76, flujan, stuarth`

........

................
r100497 | tilghman | 2008-01-27 16:14:48 -0700 (Sun, 27 Jan 2008) | 5 lines

With the switch to the ast_sched_replace* API in trunk, we lose the correction
that was just merged from 1.4, so this is a changeover to those APIs to use the
macro versions, so that we properly detect errors from ast_sched_del, instead
of simply ignoring the return values.

................
r100514 | russell | 2008-01-27 17:56:14 -0700 (Sun, 27 Jan 2008) | 5 lines

These readlocks always fail for me on my mac, and I saw it happen again
today on another mac.  We ignore the return value of locking operations almost
everywhere in Asterisk.  So, ignore these, as well, so Asterisk will actually
work on systems where this is occurring while I look into what the issue is.

................
r100532 | russell | 2008-01-27 21:30:44 -0700 (Sun, 27 Jan 2008) | 3 lines

- Simplify a line with ARRAY_LEN()
- Make a few little formatting changes

................
r100533 | russell | 2008-01-27 21:43:14 -0700 (Sun, 27 Jan 2008) | 2 lines

Make a couple more uses of ARRAY_LEN, and convert some spaces to tabs

................
r100549 | file | 2008-01-28 06:57:38 -0700 (Mon, 28 Jan 2008) | 4 lines

Don't do a network byte order conversion when setting the socket's port variable to that of bindaddr's. It is already in the correct network byte order.
(closes issue ASTERISK-11268)
Reported by: hmodes

................
r100565 | russell | 2008-01-28 07:27:28 -0700 (Mon, 28 Jan 2008) | 2 lines

Clean up some formatting, and simplify a bit of code using ast_str

................
r100582 | russell | 2008-01-28 10:21:24 -0700 (Mon, 28 Jan 2008) | 17 lines

Merged revisions 100581 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r100581 | russell | 2008-01-28 11:15:41 -0600 (Mon, 28 Jan 2008) | 9 lines

Make some deadlock related fixes.  These bugs were discovered and reported
internally at Digium by Steve Pitts.
- Fix up chan_local to ensure that the channel lock is held before the local
  pvt lock.
- Don't hold the channel lock when executing the timing function, as it can
  cause a deadlock when using chan_local.  This actually changes the code back
  to be how it was before the change for issue ASTERISK-10339.  But, I added some other
  locking that I think will prevent the problem reported there, as well.

........

................
r100625 | qwell | 2008-01-28 11:24:40 -0700 (Mon, 28 Jan 2008) | 9 lines

Merged revisions 100624 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r100624 | qwell | 2008-01-28 12:23:09 -0600 (Mon, 28 Jan 2008) | 1 line

Correct a comment which made little/no sense.
........

................
r100627 | russell | 2008-01-28 11:27:08 -0700 (Mon, 28 Jan 2008) | 15 lines

Merged revisions 100626 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r100626 | russell | 2008-01-28 12:26:31 -0600 (Mon, 28 Jan 2008) | 7 lines

Fix a crash in ast_masq_park_call()

(issue ASTERISK-10856)
Reported by: DEA
Patches:
     res_features-park.txt uploaded by DEA (license 3)

........

................
r100628 | tilghman | 2008-01-28 11:27:29 -0700 (Mon, 28 Jan 2008) | 3 lines

Normalize the detection for execinfo, so that Linux (glibc) and other platforms
with libexecinfo will generate inline stack backtraces correctly.

................
r100630 | russell | 2008-01-28 11:38:56 -0700 (Mon, 28 Jan 2008) | 13 lines

Merged revisions 100629 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r100629 | russell | 2008-01-28 12:34:20 -0600 (Mon, 28 Jan 2008) | 5 lines

For some reason, the use of this strdupa() is leading to memory corruption on
freebsd sparc64.  This trivial workaround fixes it.

(closes issue ASTERISK-9956, closes issue ASTERISK-11316, reported by mattias04 and Home-of-the-Brave)

........

................
r100631 | russell | 2008-01-28 11:41:23 -0700 (Mon, 28 Jan 2008) | 3 lines

Merge rev 100626 from Asterisk 1.4.  The svnmerge of this commit was a NoOp, since
res_features doesn't exist in trunk.  Thanks to qwell for pointing it out!

................
r100632 | file | 2008-01-28 12:04:53 -0700 (Mon, 28 Jan 2008) | 2 lines

Fix up two scheduling issues. In one instance a scheduled item was not deleted when it should have been and in the other it was scheduled again when it shouldn't have been.

................
r100671 | file | 2008-01-28 13:40:08 -0700 (Mon, 28 Jan 2008) | 6 lines

Fix up some T38 state change issues.
(closes issue ASTERISK-11106)
Reported by: dimas
Patches:
     v2-sip-t38state.patch uploaded by dimas (license 88)

................
r100674 | mmichelson | 2008-01-28 13:58:12 -0700 (Mon, 28 Jan 2008) | 10 lines

Blocked revisions 100673 via svnmerge

........
r100673 | mmichelson | 2008-01-28 14:55:56 -0600 (Mon, 28 Jan 2008) | 3 lines

Undoing the deprecation of chan_vpb. It is alive and well.


........

................
r100676 | qwell | 2008-01-28 14:02:11 -0700 (Mon, 28 Jan 2008) | 16 lines

Merged revisions 100672 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

(closes issue ASTERISK-11263)
........
r100672 | qwell | 2008-01-28 14:42:43 -0600 (Mon, 28 Jan 2008) | 7 lines

When using ODBC_STORAGE, make sure we put greeting files into the database like we do with the others.

Issue ASTERISK-11263
Reported by: dimas
Patches:
     vmgreet.patch uploaded by dimas (license 88)

........

................
r100677 | tilghman | 2008-01-28 14:05:29 -0700 (Mon, 28 Jan 2008) | 10 lines

Merged revisions 100675 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r100675 | tilghman | 2008-01-28 15:02:02 -0600 (Mon, 28 Jan 2008) | 2 lines

WaitExten didn't handle AbsoluteTimeout properly (went to 't' instead of 'T')

........

................
r100678 | mmichelson | 2008-01-28 14:07:18 -0700 (Mon, 28 Jan 2008) | 3 lines

Re-inserting chan_vpb into trunk.


................
r100679 | qwell | 2008-01-28 14:11:24 -0700 (Mon, 28 Jan 2008) | 1 line

Reintroduce more chan_vpb stuff that was removed in r100421 and r100422
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=100881