Summary: ASTERISK-02917: Early hangup during MGCP transfer (like unattended transfer) crashes running asterisk
Reporter: florian (florian)
Labels:
Date Opened: 2004-12-02 10:17:25.000-0600
Date Closed: 2011-06-07 14:00:46
Versions:
Frequency of
Environment:
Attachments: (0) rtp-sanity.txt
Description: I have a setup with approx. 100 MGCP phones (Swissvoice) on an Asterisk box. When a user does an attended transfer, things are mostly OK (something goes wrong sometimes, but I can't reproduce that, so it's for another issue). However, if the transferring party hangs up early, Asterisk dies immediately.


gdb backtrace output from the remaining core file:

#0  0x08087a0f in ast_rtp_read (rtp=0x0) at rtp.c:413
413             res = recvfrom(rtp->s, rtp->rawdata + AST_FRIENDLY_OFFSET, sizeof(rtp->rawdata) - AST_FRIENDLY_OFFSET,
(gdb) bt
#0  0x08087a0f in ast_rtp_read (rtp=0x0) at rtp.c:413
#1  0x40331f92 in mgcp_read (ast=0x810e618) at chan_mgcp.c:1102
#2  0x0805a658 in ast_read (chan=0x810e618) at channel.c:1334
#3  0x0805d96c in ast_channel_bridge (c0=0x8231060, c1=0x810e618, config=0xbc7f9d90, fo=0xbc7f94f4, rc=0xbc7f94f8) at channel.c:2688
#4  0x402c7171 in ast_bridge_call (chan=0x8231060, peer=0x810e618, config=0xbc7f9d90) at res_features.c:408
#5  0x405842e4 in dial_exec (chan=0x8231060, data=0xbc7fa43e) at app_dial.c:1019
#6  0x0806fd2c in pbx_exec (c=0x8231060, app=0x81636d0, data=0xbc7fa43e, newstack=1) at pbx.c:470
#7  0x402e076a in handle_exec (chan=0x8231060, agi=0xbc7fac98, argc=3, argv=0xbc7fa234) at res_agi.c:828
#8  0x402df1a5 in run_agi (chan=0x8231060, request=0xbc7faca4 "callchaninbound", agi=0xbc7fac98, pid=2010, dead=0) at res_agi.c:1452
#9  0x402dfa71 in agi_exec_full (chan=0x8231060, data=0xbc7fd804, enhanced=0, dead=0) at res_agi.c:1671
#10 0x402e0ea0 in agi_exec (chan=0x8231060, data=0xbc7fd804) at res_agi.c:1684
#11 0x0806fd2c in pbx_exec (c=0x8231060, app=0x80ff7b0, data=0xbc7fd804, newstack=1) at pbx.c:470
#12 0x08071d79 in pbx_extension_helper (c=0x8231060, context=0x82311b8 "routing", exten=0x82312ac "883000000000531", priority=1,
   callerid=0x8102580 "Virtu <530>", action=1) at pbx.c:1277
#13 0x08072b1d in ast_pbx_run (c=0x8231060) at pbx.c:1761
#14 0x40336712 in mgcp_ss (data=0x8231060) at chan_mgcp.c:2519
#15 0x400200ba in pthread_start_thread () from /lib/libpthread.so.0
Comments:

By: florian (florian) 2004-12-02 12:12:31.000-0600

DUH! Okay, this bug was going terribly wrong:

Category: MGCP
Version: 1.0.2

Tried with an early release of chan_mgcp from the 1.0.2 branch, and with the one in the tarfile.

By: Mark Spencer (markster) 2004-12-02 20:57:21.000-0600

You need to find me on IRC so I can log in and try to fix it on your system.  I'll need you to be able to duplicate this in a controlled environment so I can work on it.

By: twisted (twisted) 2004-12-02 21:29:31.000-0600

Moved bug to the correct location.

By: florian (florian) 2004-12-03 00:22:09.000-0600

I will try to set up a duplication environment over the weekend (although it's the holidays).

By: florian (florian) 2004-12-07 15:06:11.000-0600

This is not a dead issue, I just haven't managed to arrange access for duplication/debugging by Mark yet. Will get back ASAP.

By: Olle Johansson (oej) 2004-12-19 07:53:03.000-0600

Florian: What's happening? Still a problem?


By: florian (florian) 2004-12-20 14:50:22.000-0600

I am unable to reproduce on a smaller setup right now, and I cannot provide access to the production system. Feel free to close this issue until I find a chance to investigate further.
If anyone else using Swissvoice phones in a small or big setup can comment, I'd appreciate it...

edited on: 12-20-04 14:50

By: Mark Spencer (markster) 2004-12-20 17:54:56.000-0600

Closed per bug placer's request.

By: florian (florian) 2005-01-05 17:53:00.000-0600

Okay, so I'm still struggling to establish the actual reason for the crash, so we can take a closer look. In the meantime, I've made a small sanity check, and I am wondering if you feel this would make a difference. At least the pbx would no longer crash, but the channel might still be corrupt. I consider that an improvement, though... See rtp-sanity.txt and please send comments.
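
The attached rtp-sanity.txt is not reproduced in this thread. Purely as an illustration, a guard of the kind described might look like the sketch below. All names in it (`demo_rtp`, `demo_rtp_read`, the `pending` field) are hypothetical stand-ins, not Asterisk's real `struct ast_rtp` or `ast_rtp_read`. The guard only avoids the NULL dereference seen in the backtrace; it does nothing about whatever left the pointer NULL in the first place.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an RTP session; NOT Asterisk's struct ast_rtp. */
struct demo_rtp {
    int s;                       /* socket descriptor, like rtp->s in the backtrace */
    int pending;                 /* placeholder for the bytes a real recvfrom() would return */
    unsigned char rawdata[512];  /* receive buffer */
};

/*
 * Guarded read: returns -1 instead of dereferencing a NULL session.
 * The caller still has to treat -1 as "no frame available".
 */
int demo_rtp_read(struct demo_rtp *rtp)
{
    if (!rtp) {
        fprintf(stderr, "demo_rtp_read: NULL rtp session, skipping read\n");
        return -1;
    }

    /* Internal invariant: never report more bytes than the buffer can hold. */
    assert(rtp->pending >= 0 && rtp->pending <= (int)sizeof(rtp->rawdata));

    /* A real implementation would call recvfrom(rtp->s, ...) at this point. */
    return rtp->pending;
}
```

With this shape, a caller that receives -1 can hang up or skip the frame rather than taking the whole process down, which is the trade-off being asked about here.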

By: Andrey S Pankov (casper) 2005-01-06 08:15:54.000-0600

First, it would be nice if you follow coding guidelines when patching:
"Try to match the existing formatting of the file you are working on."

Second, rtp SHOULD NOT be NULL here. If it is NULL, then this is a race. The common policy here is not to check such things in the core but to fix them outside.

Moreover... can you confirm this is reproducible with the latest v1-0 code, since there have been lots of changes?

By: florian (florian) 2005-01-06 09:00:40.000-0600

I totally agree with you that it should never end up there with a null pointer, but it actually does. I am not sure exactly when this happens, but it does happen, pretty often (approx. once a day on a busy PBX). It seems to be related to the MGCP transfer in some way, but it's hard to get a hold of. In the meantime I would like to understand why exactly it would be bad to add two lines of sanity checking that prevent the entire PBX from crashing (or does it cause an unexpected effect I did not foresee?)

By: Andrey S Pankov (casper) 2005-01-06 09:09:30.000-0600

You see... the policy... this is the current policy... I'm about 100% sure markster will not approve such a change.

And again, can you confirm this is reproducible with latest stable v1-0 codebase? If yes, can you please attach (not inline) an updated backtrace here? Thanks!

edited on: 01-06-05 09:11

By: florian (florian) 2005-01-06 09:41:37.000-0600

I already said I don't consider this a proper bugfix. I am, however, concerned about production systems that suffer from this issue. Therefore I am now putting this fix into systems in the expectation that they will at least no longer drop dead. In the meantime, no, I am not able to reproduce the crash on demand, so I cannot verify what it does or doesn't do. This is exactly why I had the ticket closed a while ago.

Having said that, I reopened the ticket because I wanted feedback on this 'hack' I've come up with, and I noticed other people had the same crash (Thomas Dingermann posted it on the mailing list a while ago). Hopefully we can find more people seeing this issue and actually figure out what is happening and what the proper bugfix is...

By: Andrey S Pankov (casper) 2005-01-06 09:51:58.000-0600

The "proper bugfix" is locking improvements made in HEAD. But there are still several issues not resolved yet there. Does this happen with HEAD?
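
The thread does not show what the locking changes in HEAD look like. As a generic illustration only, one common way to close this kind of race is to make the teardown path and the read path take the same lock before touching the session pointer. The sketch below assumes, hypothetically, that some hangup path detaches the session; `demo_pvt`, `demo_hangup`, and `demo_read` are invented names and are not taken from chan_mgcp, chan_sip, or any Asterisk source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an RTP session. */
struct demo_rtp {
    int s;        /* socket descriptor */
    int pending;  /* placeholder for bytes a real read would return */
};

/* Hypothetical channel-private data; NOT a real Asterisk structure. */
struct demo_pvt {
    pthread_mutex_t lock;   /* protects the rtp pointer below */
    struct demo_rtp *rtp;   /* may be detached by the hangup path */
};

/*
 * Hangup path: detach the session while holding the lock and hand it back,
 * so the caller can release it after no reader can reach it any more.
 */
struct demo_rtp *demo_hangup(struct demo_pvt *p)
{
    struct demo_rtp *old;

    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    old = p->rtp;
    p->rtp = NULL;
    pthread_mutex_unlock(&p->lock);
    return old;
}

/*
 * Read path: inspect and use p->rtp only while holding the same lock,
 * so it can never observe a half-torn-down session.
 */
int demo_read(struct demo_pvt *p)
{
    int res = -1;   /* -1 means "no session, nothing to read" */

    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    if (p->rtp)
        res = p->rtp->pending;
    pthread_mutex_unlock(&p->lock);
    return res;
}
```

The NULL test is still there, but it now sits in the channel-level code under a lock rather than in the core read routine, which matches the stated policy of fixing such things outside the core.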

By: dsandras (dsandras) 2005-01-07 09:58:03.000-0600

Klaus-Peter redirected me to that bug report and I can say that I have encountered it several times too. I'm using 1.0.3, with only SIP and ZAP, and sometimes the RTP frame is NULL. There is no transfer involved in my case.

I cannot use CVS; I must stay with stable, as must many people with production systems.

Here is the bt :
#0  0x0808db36 in ast_rtp_read (rtp=0x0) at rtp.c:413
413 rtp.c: No such file or directory.
in rtp.c
(gdb) bt full
#0  0x0808db36 in ast_rtp_read (rtp=0x0) at rtp.c:413
#2  0x40488969 in sip_read (ast=0x86ea2d0) at chan_sip.c:2234
#3  0x0805c38d in ast_read (chan=0x86ea2d0) at channel.c:1334
#4  0x0805fbc9 in ast_channel_bridge (c0=0x8722b58, c1=0x86ea2d0,
   config=0xbbbfabc4, fo=0xbbbfa334, rc=0xbbbfa338) at channel.c:2688
#5  0x4035cc95 in ast_bridge_call (chan=0x8722b58, peer=0x86ea2d0,
   config=0xbbbfabc4) at res_features.c:416
#6  0x40645b84 in dial_exec (chan=0x8722b58, data=0x50) at app_dial.c:1039
#7  0x080749cf in pbx_exec (c=0x8722b58, app=0x81ecf38, data=0xbbbfd0f4,
   newstack=1) at pbx.c:470
#8  0x0807cc53 in pbx_extension_helper (c=0x8722b58,
   context=0x8722cb0 "from-outgoing-lines", exten=0x8722da4 "6064",
   priority=6, callerid=0x8a024e8 "006664", action=135103199) at pbx.c:1278
#9  0x08076a38 in ast_pbx_run (c=0x8722b58) at pbx.c:1762
#10 0x4036cfc5 in ss_thread (data=0x8722b58) at chan_zap.c:4842

edited on: 01-07-05 10:03

By: Mark Spencer (markster) 2005-01-07 15:34:48.000-0600

This is not head, and there is no place which sets ->rtp = NULL.  Is this an unpatched asterisk?

By: dsandras (dsandras) 2005-01-07 15:44:38.000-0600

In my case, no, I forgot to mention it.

I don't know for Florian.

By: florian (florian) 2005-01-13 01:20:21.000-0600

We reverted to completely clean setups (although we require BRIstuff because of ISDN2 cards) and have not seen the issue since. Will report back as soon as possible. However, since dsandras indicates this can also happen in some SIP scenarios on his machine, the issue is obviously in the wrong category. Bug marshals should probably indicate whether they need this issue closed and a new issue opened, or if they can perhaps modify this one.

By: Russell Bryant (russell) 2005-01-18 20:36:10.000-0600

Since this doesn't appear to be an issue with unpatched code, I'm going to close this one for now.  Feel free to re-open if you feel that there is still an issue.