Summary:ASTERISK-09360: Segfault in chan_iax2.c - similar to 9278 and 9294
Reporter:Steve Davies . (stevedavies)Labels:
Date Opened:2007-05-01 12:10:47Date Closed:2007-07-11 19:59:10
Versions:Frequency of
Environment:Attachments:( 0) btfull.1
( 1) btfull.2
( 2) iax2-null-iaxs-segfaults.1.4.patch
( 3) iax2-null-iaxs-segfaults.patch
Description:Running Trunk SVN revision 62219 for IAX2 routing - up to about a dozen concurrent calls.

About once a day I see a segfault at line 6784 in chan_iax2.c:

6784          if ((iaxs[fr->callno]->iseqno != fr->oseqno) &&
6785                  (iaxs[fr->callno]->iseqno ||
6786                          ((f.subclass != IAX_COMMAND_TXCNT) &&
6787                          (f.subclass != IAX_COMMAND_TXREADY) &&          /* for attended transfer */
6788                          (f.subclass != IAX_COMMAND_TXREL) &&            /* for attended transfer */
6789                          (f.subclass != IAX_COMMAND_UNQUELCH ) &&        /* for attended transfer */
6790                          (f.subclass != IAX_COMMAND_TXACC)) ||
6791                          (f.frametype != AST_FRAME_IAX))) {

The problem being that iaxs[fr->callno] is 0 so the ->iseqno reference causes a segfault.

This problem is probably hitting me because I run on a hardened gentoo platform.  PAX and GrSecutiry and all that.  Other platforms will probably allow this without a segfault (though the reference will still be invalid).

34 lines earlier, socket_process does:

       /* count this frame */

And this action succeeds, so I can only assume that some other thread nulled out iaxs[fr->callno] in the meantime which suggests a problem with locking.

I'll upload the gdb stuff in a moment.

You've got my disclaimer etc.



Hardened Gentoo Platform with Linux-VServer
SVN TRUNK 62219 (2007-04-27)
Disclaimer on file
Comments:By: Steve Davies . (stevedavies) 2007-05-01 12:20:53

bt fulls attached as promised.  Please note that gdb does have some problems coping with the PaX environment.

Here's the guts of the problem:

6782                if ((ntohs(fh->dcallno) & IAX_FLAG_RETRANS) || (f.frametype != AST_FRAME_VOICE))
6783                        updatehistory = 0;
6784                if ((iaxs[fr->callno]->iseqno != fr->oseqno) &&  /* SLD: Segfault because fr->callno isn't an active call in iaxs[] */
6785                        (iaxs[fr->callno]->iseqno ||6786                                    ((f.subclass != IAX_COMMAND_TXCNT) &&
6787                                (f.subclass != IAX_COMMAND_TXREADY) &&          /* for attended transfer */
6788                                (f.subclass != IAX_COMMAND_TXREL) &&            /* for attended transfer */

(gdb) p fr->callno$1 = 71
(gdb) p iaxs[71]
$2 = (struct chan_iax2_pvt *) 0x0
(gdb) p iaxs[71]->iseqno
Cannot access memory at address 0x11

By: Steve Davies . (stevedavies) 2007-05-14 14:51:31


Here's a patch that fixes this segfault.

Its caused by unlocking and relocking the iaxs[fr->callno] and then carrying on using it after that without checking that it isn't now null.  That can be, because another thread destroyed the channel.

I also went through all the other places where I saw the "unlock/relock" pattern and checked the behaviour.  My patch inserts a little FIXME comment where I think there may be a problem.

I've uploaded iax2-null-iaxs-segfaults.patch for SVN trunk (rev 62263).  This contains the fix for the reported segfault.

I've also uploaded iax2-null-iaxs-segfaults.1.4.patch for SVN branch-1.4 (rev 62989).  This is the equivalent patch for 1.4, though its actually just the FIXME comments because the problem code causing the reported segfault was only added in the trunk.


By: Steve Davies . (stevedavies) 2007-05-14 14:53:10

Should note that my disclaimer is on file, and ask someone to rename the bug with [patch] in the name.


By: mihai (mihai) 2007-05-14 15:21:35

You might want to take a look at bug ASTERISK-9609666, I've uploaded a patch that fixes a few places where the same pattern actually causes crashes in 1.4.x


By: Steve Davies . (stevedavies) 2007-05-19 01:46:12


i merged your patch into both my branch-1.4 and trunk trees;  running trunk on voipconnect and we'll see how things go.


By: pma (pma) 2007-05-30 09:29:50

I seem to be experiencing the same problem in 1.4.4. Asterisk segfaults and gdb trace shows that iaxs[fr->callno] is 0x0, according to gdb it segfaults here:

#0  0xb7182e37 in socket_process (thread=0x822c068) at chan_iax2.c:6787
6787                                    if (ast_strlen_zero(iaxs[fr->callno]->secret) && ast_strlen_zero(iaxs[fr->callno]->inkeys)) {

So i guess problem exists even in 1.4 branch even with NEW frame

By the way i got other segfaults in channel.c which are probably caused by the same reason (unlocking and locking again) here is an example from channel.c

01138    ast_channel_lock(chan);
01139    ast_channel_unlock(chan);
01140    if (chan->tech_pvt) {
01141       ast_log(LOG_WARNING, "Channel '%s' may not have been hung up properly\n", chan->name);
01142       free(chan->tech_pvt);
01143    }

By: pma (pma) 2007-06-01 08:20:57

i tried temporairly patching the code in chan_iax2.c

06780             if (strcasecmp(iaxs[fr->callno]->exten, "TBD")) {
06781                ast_mutex_unlock(&iaxsl[fr->callno]);
06782                exists = ast_exists_extension(NULL, iaxs[fr->callno]->context, iaxs[fr->callno]->exten, 1, iaxs[fr->callno]->cid_num);
06783                ast_mutex_lock(&iaxsl[fr->callno]);

adding a simple check at the end: if (!iaxs[fr->callno]) break;

and it seems to fix the issue for a while but then i got another segfault in the code between ast_mutex_unlock and ast_mutex_lock

#0  0xb7220d7d in socket_process (thread=0xb6a227b8) at chan_iax2.c:6783
6783                                            exists = ast_exists_extension(NULL, iaxs[fr->callno]->context, iaxs[fr->callno]->exten, 1, iaxs[fr->callno]->cid_num);

i guess it should be fixed or mostly avoided by adding few other checks before ast_exists_extension and next in pbx_extension_helper but maybe there is some other way?

By: Russell Bryant (russell) 2007-06-04 16:46:04

I removed the unlock/lock that was causing the crash you fixed the patch here.  It was left over from some bad code that I had put in trunk but took back out.  However, I'm going to try to continue working on all the places that you and Mihai have pointed out in this issue and 9666.

By: Russell Bryant (russell) 2007-06-04 16:46:18

The unlock/lock was removed in revision 67070.

By: Russell Bryant (russell) 2007-06-04 18:35:24

I have addressed the code you fixed in your patch, plus some more places, in both 1.4 and trunk in revisions 67158 and 67160.  Thanks!