Summary: ASTERISK-13706: Asterisk crashes when Dial() to a sip channel terminates
Reporter: Marc A. Pelletier (mapelletier)
Labels:
Date Opened: 2009-03-07 14:24:50.000-0600
Date Closed: 2011-06-07 14:00:30
Priority: Blocker
Regression?: No
Status: Closed/Complete
Components: Applications/app_dial
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: Inside Dial(), when either endpoint hangs up, the application crashes (gdb backtrace included).  The problem does not occur on calls going through Queue().

****** ADDITIONAL INFORMATION ******

Built from SVN, no special options:

0xb7756249 in end_bridge_callback_data_fixup (bconfig=0x816916b,
   originator=0x8aa9798, terminator=0xb6c1cda0) at app_dial.c:1261
1261    app_dial.c: No such file or directory.
       in app_dial.c
(gdb) bt
#0  0xb7756249 in end_bridge_callback_data_fixup (bconfig=0x816916b,
   originator=0x8aa9798, terminator=0xb6c1cda0) at app_dial.c:1261
#1  0x080b03cd in ast_bridge_call (chan=0x8aa7338, peer=0x8aa9798,
   config=0xb6c1cda0) at features.c:2450
#2  0xb775a262 in dial_exec_full (chan=0x8aa7338, data=0xb6c1ef28,
   peerflags=0xb6c1ce80, continue_exec=0x0) at app_dial.c:1962
#3  0xb775e1c9 in dial_exec (chan=0x8aa7338, data=0xb6c1ef28)
   at app_dial.c:2026
#4  0x080e24a4 in pbx_exec (c=0x8aa7338, app=0x8a04c78, data=0xb6c1ef28)
   at pbx.c:942
#5  0x080ecc67 in pbx_extension_helper (c=0x8aa7338, con=0x0,
   context=0x8aa74c0 "home", exten=0x8aa7510 "94504308166", priority=3,
   label=0x0, callerid=0x8a84988 "514-316-1065", action=E_SPAWN,
   found=0xb6c21348, combined_find_spawn=1) at pbx.c:3111
#6  0x080ee55b in __ast_pbx_run (c=0x8aa7338, args=0x0) at pbx.c:3614
#7  0x080ef9d0 in pbx_thread (data=0x8aa7338) at pbx.c:3974
#8  0x081294bb in dummy_start (data=0x8a94f50) at utils.c:861
#9  0xb7bbc8ad in ?? () from /lib/libpthread.so.0

Comments:By: snuffy (snuffy) 2009-03-07 18:07:10.000-0600

Can you post the dialplan you use to produce this error?
That is, I'm assuming it's more than just exten => 100,1,Dial(xxx)

By: Marc A. Pelletier (mapelletier) 2009-03-07 19:04:04.000-0600

Barely more; the relevant excerpts:

--

context outside {
   ignorepat => 9;
   _9X. => {
       Set(CALLERID(name)=${DB(exten/${caller}/cidn)});
       Set(CALLERID(num)=${DB(exten/${caller}/cid)});
       Dial(${TRUNK}/${EXTEN:1},,${dialf});
   }
   // ... other extensions
}

context home {
   includes {
       homeext;
       functions;
       outside;
   };

   s => {
       CHANNEL(language) = fr;
       BackGround(who-would-you-like-to-call);
       WaitExten;
   };

   i => {
       BackGround(invalid);
       WaitExten;
   };

   t => Congestion;
};

--

The context home is that of the sip phones (SPA941s) that cause the problem.  "dialf" is set to "wtkx" from the sip.conf, and caller is set to a phone ID which then just matches keys in the DB for caller id number and name.

There are plenty of other contexts for the incoming calls, but they are not involved and removing them does not eliminate the errors.

By: Marc A. Pelletier (mapelletier) 2009-03-07 19:08:03.000-0600

BTW: I can arrange for a coredump if it helps; but it's a production system so I'll need a bit to time it right.

By: Marc A. Pelletier (mapelletier) 2009-03-08 18:35:06

I just noticed:

#0 0xb7756249 in end_bridge_callback_data_fixup (bconfig=0x816916b,
   originator=0x8aa9798, terminator=0xb6c1cda0) at app_dial.c:1261
#1 0x080b03cd in ast_bridge_call (chan=0x8aa7338, peer=0x8aa9798,
   config=0xb6c1cda0) at features.c:2450

That looks very much like the parameter order is wrong.  Lemme try a quick patch.

By: Marc A. Pelletier (mapelletier) 2009-03-08 18:42:21

It's actually a little odder than it first seems:  at frame #1 above, in features.c

       if (config->end_bridge_callback) {
               config->end_bridge_callback(config->end_bridge_callback_data);
       }

but config is:

$2 = {features_caller = {flags = 118}, features_callee = {flags = 0},
 start_time = {tv_sec = 1236555591, tv_usec = 462557}, nexteventts = {
   tv_sec = 0, tv_usec = 0}, feature_timer = 0, timelimit = 0,
 play_warning = 0, warning_freq = 0, warning_sound = 0x0, end_sound = 0x0,
 start_sound = 0x0, firstpass = 0, flags = 1,
 end_bridge_callback = 0xb7799240 <end_bridge_callback_data_fixup>,
 end_bridge_callback_data = 0x816916b,
 end_bridge_callback_data_fixup = 0x988a005}

... note how ->end_bridge_callback points to end_bridge_callback_data_fixup!
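
To make the observation concrete, here is a minimal sketch of the mismatch. Everything prefixed `sketch_` is hypothetical and only mirrors the three field names at the end of the gdb dump; it is not the Asterisk definition. It assumes the callback takes one argument (as in the features.c excerpt) and the fixup takes three (as in frame #0). A one-argument call that lands in a three-argument function would explain why frame #0 shows bconfig equal to end_bridge_callback_data while originator and terminator carry the peer and config values from frame #1.

```c
#include <stddef.h>

/* Opaque stand-in for a channel; the real type is not reproduced here. */
struct sketch_chan;
struct sketch_bridge_config;

/* Assumed shapes: the callback takes one argument, the fixup takes three. */
typedef void (*sketch_end_cb)(void *data);
typedef void (*sketch_fixup_cb)(struct sketch_bridge_config *bconfig,
                                struct sketch_chan *originator,
                                struct sketch_chan *terminator);
typedef void (*sketch_generic_fn)(void);

/* Only the three fields named at the end of the gdb dump. */
struct sketch_bridge_config {
    sketch_end_cb end_bridge_callback;
    void *end_bridge_callback_data;
    sketch_fixup_cb end_bridge_callback_data_fixup;
};

/* Demo callbacks (hypothetical) so the check below can be exercised. */
static int sketch_calls = 0;

static void sketch_on_end(void *data)
{
    (void)data;
    sketch_calls++;
}

static void sketch_fixup(struct sketch_bridge_config *bconfig,
                         struct sketch_chan *originator,
                         struct sketch_chan *terminator)
{
    (void)bconfig;
    (void)originator;
    (void)terminator;
}

/* Returns 1 when the callback slot is set and does not hold the address
 * of the known fixup function, which is the symptom seen in the dump. */
static int sketch_callback_slot_ok(const struct sketch_bridge_config *config,
                                   sketch_fixup_cb known_fixup)
{
    if (config->end_bridge_callback == NULL) {
        return 0;
    }
    return (sketch_generic_fn)config->end_bridge_callback
        != (sketch_generic_fn)known_fixup;
}

/* Guarded call in the spirit of the features.c excerpt; returns 1 if the
 * callback was invoked, 0 if it was skipped. */
static int sketch_run_end_bridge_callback(const struct sketch_bridge_config *config,
                                          sketch_fixup_cb known_fixup)
{
    if (!sketch_callback_slot_ok(config, known_fixup)) {
        return 0;
    }
    config->end_bridge_callback(config->end_bridge_callback_data);
    return 1;
}
```

A check like this would only detect the symptom; it says nothing about what overwrote the structure in the first place.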

By: Marc A. Pelletier (mapelletier) 2009-03-08 20:13:07

It seems to be heap corruption when freeing memory and removing the scheduler entry on XMIT_FAILURE.

When it tries to validate the peers in DB, it does so /before/ sockip is set in chan_sip.c.  Moving the initialization of peers after that of the socket hides the problem (that is, removes the cause of XMIT_FAILURE).  Once those failures occur, all hell breaks loose.

Regression introduced in r179220.  Problem hidden by changing a bit of initialization order in chan_sip.c:reload_config() - but still extant.

Initializing peers after the socket is up is probably better in general *anyways*; I'll provide a (trivial) patch if you want.
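
The ordering issue can be sketched as follows. This is a toy model, not chan_sip.c: every `sketch_` name is hypothetical, and the only assumption carried over from the comment above is that contacting a peer before the socket is set up yields a transmit failure. As noted, reordering only removes the trigger; it does not repair whatever the failure path corrupts.

```c
#include <stddef.h>

/* Hypothetical result codes; the real XMIT_FAILURE value is not reproduced. */
enum sketch_xmit { SKETCH_XMIT_OK = 0, SKETCH_XMIT_FAILURE = -1 };

/* Toy module state: whether the socket is up and how many sends failed. */
struct sketch_sip_state {
    int socket_ready;
    int peers_loaded;
    int xmit_failures;
};

/* Contacting a peer fails when the socket has not been set up yet. */
static enum sketch_xmit sketch_poke_peer(struct sketch_sip_state *s)
{
    if (!s->socket_ready) {
        s->xmit_failures++;
        return SKETCH_XMIT_FAILURE;
    }
    return SKETCH_XMIT_OK;
}

static void sketch_open_socket(struct sketch_sip_state *s)
{
    s->socket_ready = 1;
}

/* Loading peers validates each one, which means contacting it. */
static void sketch_load_peers(struct sketch_sip_state *s, int npeers)
{
    for (int i = 0; i < npeers; i++) {
        (void)sketch_poke_peer(s);
        s->peers_loaded++;
    }
}

/* Runs a reload in either order and returns the number of transmit failures. */
static int sketch_reload(int socket_first, int npeers)
{
    struct sketch_sip_state s = { 0, 0, 0 };

    if (socket_first) {
        sketch_open_socket(&s);
        sketch_load_peers(&s, npeers);
    } else {
        sketch_load_peers(&s, npeers);
        sketch_open_socket(&s);
    }
    return s.xmit_failures;
}
```

With three peers, the peers-first order records three failures in this model and the socket-first order records none.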

By: Leif Madsen (lmadsen) 2009-04-13 13:49:48

mapelletier: we always love patches! :)  Can you provide one?

By: Leif Madsen (lmadsen) 2009-05-04 09:14:55

Any ability to provide a patch here?

By: Joshua C. Colp (jcolp) 2009-05-14 12:40:22

This issue has already been fixed. The merge assumed that 1.6.0 stored packets the same way as trunk, which was incorrect. Memory was getting freed that should not have been. The revision that fixed it was 186517.