Summary: ASTERISK-12960: crash or dialing isn't possible
Reporter: pj (pj)
Labels:
Date Opened: 2008-10-24 14:24:25
Date Closed: 2011-07-26 14:57:37
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Channels/chan_skinny
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20090203__bug13777.diff.txt
( 1) gdb1.txt
( 2) gdb10.txt
( 3) gdb11.txt
( 4) gdb2.txt
( 5) gdb3.txt
( 6) gdb4.txt
( 7) gdb5.txt
( 8) gdb6.txt
( 9) gdb7.txt
(10) gdb8.txt
(11) gdb9.txt
(12) indicate.diff
(13) indicate2.diff
(14) valgrind.txt
(15) valgrind2.txt
(16) valgrind3.txt
Description: Asterisk sometimes crashes on an attempt to dial from a Skinny phone (Cisco 7920). When it does not crash, no further calls are possible after some successful Skinny calls: I type a number on the phone and press the 'dial' button, and the number then disappears from the phone display (skinny debug shows nothing in this case). Restarting the phone doesn't solve the problem; I must restart Asterisk to restore functionality. Asterisk isn't locked in this situation ('core show locks' is empty).
It seems that svn-trunk isn't affected by this issue.

****** ADDITIONAL INFORMATION ******

outputs from gdb from two coredumps are attached
Comments:

By: pj (pj) 2008-11-12 15:21:22.000-0600

Asterisk SVN-trunk-r155467
Crash after answering a call (skinny-to-sip);
uploading gdb3.txt

By: Damien Wedhorn (wedhorn) 2008-11-21 00:37:42.000-0600

Added patch. Compiles and seems to run. Tests all the structures at the beginning of skinny_indicate. Appears that there is an overlap where you are making a call to a session that's died.

This should fix the crash (at least one bit of it), but there could still be some underlying issues with losing sessions. PJ, can you test?
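
For readers without the attachment, the kind of check described above (testing every structure before it is dereferenced) can be sketched roughly as follows. This is only an illustration under assumptions: the struct and function names (`subchannel`, `line`, `device`, `session`, `indicate_guard`) are hypothetical, simplified stand-ins, not the contents of indicate.diff or the real chan_skinny definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the chan_skinny structures. */
struct session    { int fd; };
struct device     { struct session *session; };
struct line       { struct device *device; };
struct subchannel { struct line *line; };

/* Walk the chain sub -> line -> device -> session and stop at the first
 * missing link. Returns 0 when every pointer is present, -1 otherwise. */
int indicate_guard(const struct subchannel *sub)
{
    if (sub == NULL || sub->line == NULL)
        return -1;
    if (sub->line->device == NULL)
        return -1;
    if (sub->line->device->session == NULL)
        return -1;
    return 0;
}
```

Note that checks like these only catch pointers that have been cleared to NULL. They cannot detect a pointer that still refers to memory already freed by another thread.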

By: pj (pj) 2008-11-29 14:40:50.000-0600

indicate.diff applied to Asterisk SVN-trunk-r159853
I will report back after a few weeks of running Asterisk with this patch.

By: pj (pj) 2008-12-03 05:21:06.000-0600

indicate.diff applied to Asterisk SVN-trunk-r159853
A crash occurred after:

[Dec  3 12:14:36] WARNING[8367]: chan_skinny.c:5763 destroy_session: Trying to delete nonexistent session 0x84d31f8?
[Dec  3 12:14:58] NOTICE[8590]: chan_skinny.c:3693 skinny_indicate: Asked to indicate 'Media Source Update' condition on channel Skinny/455@JP-37, but device does not exist.

gdb4.txt attached

By: pj (pj) 2008-12-06 09:45:01.000-0600

another crash, uploaded gdb5.txt

By: Damien Wedhorn (wedhorn) 2008-12-06 14:39:47.000-0600

another patch. Be aware that there seems to be an underlying issue between asterisk and the device. No patch will fix that, but we can try to stop the crashing.

I'm fixing up the sub handling at the moment and big parts of the relevant code are being completely rewritten, so hopefully this crash will just disappear.

By: pj (pj) 2008-12-16 13:42:41.000-0600

Asterisk SVN-trunk-r163675 + indicate2.diff
Another crash, this time at the beginning of dialing.
Uploaded gdb6.txt

[Dec 16 20:35:26]     -- Starting Skinny session from 192.168.31.209
[Dec 16 20:35:26]     -- Device 'SEP000xxxxxxx' successfully registered
[Dec 16 20:35:26] Device capability set to '256'
[Dec 16 20:35:26] Adding button: 9, 1
[Dec 16 20:35:31] WARNING[15662]: chan_skinny.c:1607 find_subchannel_by_instance_reference: Could not find subchannel with reference '21' on 'JP'
*CLI>
Disconnected from Asterisk server

By: Michiel van Baak (mvanbaak) 2008-12-17 13:01:17.000-0600

I have a crash as well, but all the backtraces I get are like your first one. A crash with the devicestate.

Program terminated with signal 11, Segmentation fault.
#0  0xb72ceab1 in get_devicestate (l=0x8207000) at chan_skinny.c:3625
3625 if (sub->onhold) {


I get that in all backtraces.
And it happens every time I finish a call and hang up the phone.

By: pj (pj) 2008-12-17 13:05:36.000-0600

FYI: originally, I created this bugreport for asterisk 1.6, but my last crashes are from trunk (patches are applied successfully to asterisk trunk).

By: pj (pj) 2009-01-08 06:01:28.000-0600

In case it helps, gdb7.txt contains two gdb outputs from similar crashes over the last two days
Asterisk SVN-trunk-r163675 + indicate2.diff

By: Tilghman Lesher (tilghman) 2009-01-21 16:44:16.000-0600

Given where this is crashing, it seems very likely that this is due to memory corruption.  Please follow the instructions in doc/valgrind.txt.

By: Damien Wedhorn (wedhorn) 2009-01-21 17:37:15.000-0600

Actually, my guess is that it's a locking issue. It appears that a call continues being processed while the session with the device is torn down. Of course, this will lead to memory corruption as the call will happily write to areas of memory that have been freed by another thread.

I've changed the locking on a rewrite I'm doing that will hopefully either fix this or make it easier to fix.
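
To illustrate the failure mode described above (a call still using a session that another thread is tearing down), here is a minimal sketch of one common remedy: a mutex-protected reference count plus an "alive" flag. It is an assumption-laden illustration, not the rewrite mentioned in this comment; `struct session` and the `session_*` functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical session object; not the real struct skinnysession. */
struct session {
    pthread_mutex_t lock;
    int refcount;   /* number of holders, protected by lock */
    int alive;      /* cleared once the device connection is gone */
};

struct session *session_new(void)
{
    struct session *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    pthread_mutex_init(&s->lock, NULL);
    s->refcount = 1;    /* reference owned by the session thread */
    s->alive = 1;
    return s;
}

/* Take an extra reference on behalf of a call. The caller must already
 * hold a valid reference (or a lock on the container holding s). */
void session_ref(struct session *s)
{
    pthread_mutex_lock(&s->lock);
    s->refcount++;
    pthread_mutex_unlock(&s->lock);
}

/* Drop one reference. The memory is released only when the last holder
 * lets go. Returns 1 if the session was freed, 0 otherwise. */
int session_unref(struct session *s)
{
    int remaining;

    pthread_mutex_lock(&s->lock);
    remaining = --s->refcount;
    pthread_mutex_unlock(&s->lock);
    if (remaining > 0)
        return 0;
    pthread_mutex_destroy(&s->lock);
    free(s);
    return 1;
}

/* Teardown path: mark the session dead, then drop the session thread's
 * own reference. Calls that still hold a reference keep the memory valid. */
int session_teardown(struct session *s)
{
    pthread_mutex_lock(&s->lock);
    s->alive = 0;
    pthread_mutex_unlock(&s->lock);
    return session_unref(s);
}

/* Call path: refuse to touch the device once the session is dead.
 * Returns 0 if a message could be sent, -1 if the session is gone. */
int session_send(struct session *s)
{
    int ok;

    pthread_mutex_lock(&s->lock);
    ok = s->alive;  /* a real implementation would write to the socket here */
    pthread_mutex_unlock(&s->lock);
    return ok ? 0 : -1;
}
```

With this arrangement, a call that took its reference before the teardown sees `session_send()` fail cleanly instead of writing into freed memory, and the session is freed only when that call finally calls `session_unref()`.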

By: pj (pj) 2009-01-23 15:30:22.000-0600

I would prefer not to have to provide valgrind output: my Asterisk crashes only occasionally and I don't know how to trigger the crash, so running Asterisk under valgrind all the time would be problematic for me.

By: Tilghman Lesher (tilghman) 2009-01-24 08:05:17.000-0600

pj:  At this point, only the valgrind output is going to help.  Please note that even if Asterisk does not crash when running under valgrind, the output provided by that utility should be helpful in tracking this down.  Your gdb output only serves to reinforce the idea that this is a memory corruption issue and provides no further useful information.

By: pj (pj) 2009-02-03 16:55:04.000-0600

valgrind.txt attached; Asterisk didn't crash during my test.
I made a call and disconnected/reconnected the cable during the call, and some errors appeared in the valgrind log while doing so.
The issue I reported in ASTERISK-1392070 also occurred, i.e. I was not able to make any new calls until I manually hung up the dead Skinny channel from the CLI and restarted the phone. Tested with a Cisco 7940.
Please tell me whether this valgrind output is useful for one or both of the issues I mention here.

By: Tilghman Lesher (tilghman) 2009-02-03 19:03:03.000-0600

Patch uploaded; ready for testing.

By: pj (pj) 2009-02-03 19:12:05.000-0600

Can you tell me whether I should test your latest patch on its own or apply both, i.e. together with indicate2.diff? Please also tell me whether this patch was made against trunk.

By: Tilghman Lesher (tilghman) 2009-02-03 19:34:55.000-0600

pj: I don't know what the other patches do, but the issue found by the valgrind output is what I have solved with my patch.

By: Damien Wedhorn (wedhorn) 2009-02-03 19:45:25.000-0600

indicate2 just does some pointer checking. I assume that your valgrind output was produced with indicate2 applied, so it would be worth keeping; also, without it you were getting segfaults.

By: pj (pj) 2009-02-04 11:42:47.000-0600

SVN-trunk-r173311 + indicate2.diff + 20090203__bug13777.diff.txt
Crash after I attempted to press the 'end' or 'transfer' softkeys on the 7940.
It was the same test as yesterday: manually break the connection during a call, after which the phone was unusable ('end call' and the other softkeys don't work).
Uploading gdb8.txt and valgrind2.txt (I was not able to reproduce the crash under valgrind).



By: pj (pj) 2009-02-04 12:45:44.000-0600

I can finally reproduce the crash when running under valgrind too:
- make a call and break the connection
- connect the phone again and make another call
- try to hang up; the "end call" softkey doesn't work
- press the 'transfer' softkey
- crash
valgrind3.txt attached



By: pj (pj) 2009-02-05 03:12:00.000-0600

another gdb9.txt from crash during normal operation, after call answer...

By: pj (pj) 2009-02-08 06:19:10.000-0600

Perhaps gdb10.txt contains some interesting info about a bad locking situation before the crash. This is from the rpid branch
http://svn.digium.com/svn/asterisk/team/group/issue8824
patched with indicate2.diff and 20090203__bug13777.diff.txt

By: Digium Subversion (svnbot) 2009-02-16 17:15:01.000-0600

Repository: asterisk
Revision: 176320

U   trunk/channels/chan_skinny.c

------------------------------------------------------------------------
r176320 | tilghman | 2009-02-16 17:15:00 -0600 (Mon, 16 Feb 2009) | 7 lines

Use the correct list macros for deleting an item from the middle of a list.
(issue ASTERISK-12960)
Reported by: pj
Patches:
      20090203__bug13777.diff.txt uploaded by Corydon76 (license 14)
Tested by: pj

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=176320
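
As background to the commit message above: removing an element from the middle of a singly linked list requires rewiring its predecessor, which is easy to get wrong when done by hand. Asterisk's own list API provides `AST_LIST_TRAVERSE_SAFE_BEGIN` and `AST_LIST_REMOVE_CURRENT` for this purpose. The sketch below does not use those macros and is not the committed patch; it is a plain C illustration of the same technique, with a hypothetical `struct node` and `list_remove()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element; stands in for a session on a session list. */
struct node {
    int id;
    struct node *next;
};

/* Unlink the first node whose id matches, wherever it sits in the list.
 * 'link' always points at the pointer that leads to the current node, so
 * removal rewires the predecessor (or the head) in a single assignment.
 * Returns the removed node, or NULL if no node matched. */
struct node *list_remove(struct node **head, int id)
{
    struct node **link = head;

    while (*link != NULL) {
        struct node *cur = *link;
        if (cur->id == id) {
            *link = cur->next;  /* predecessor now skips cur */
            cur->next = NULL;   /* cur no longer points into the list */
            return cur;
        }
        link = &cur->next;
    }
    return NULL;
}
```

Because the predecessor link is updated in the same step that detaches the node, the list never contains a pointer to an element that is about to be freed.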

By: Digium Subversion (svnbot) 2009-02-16 17:17:02.000-0600

Repository: asterisk
Revision: 176321

_U  branches/1.6.1/
U   branches/1.6.1/channels/chan_skinny.c

------------------------------------------------------------------------
r176321 | tilghman | 2009-02-16 17:17:02 -0600 (Mon, 16 Feb 2009) | 14 lines

Merged revisions 176320 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r176320 | tilghman | 2009-02-16 17:14:08 -0600 (Mon, 16 Feb 2009) | 7 lines
 
 Use the correct list macros for deleting an item from the middle of a list.
 (issue ASTERISK-12960)
  Reported by: pj
  Patches:
        20090203__bug13777.diff.txt uploaded by Corydon76 (license 14)
  Tested by: pj
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=176321

By: pj (pj) 2009-03-30 06:45:16

Another crash: gdb11.txt
If you think it's unrelated to the reported issue, please tell me and I will create a new bug report for it.

By: Tilghman Lesher (tilghman) 2009-03-30 12:12:19

pj:  It may or may not be, but as stated previously, the ONLY debug that's going to help on this issue is valgrind output.  gdb output at this point is gratuitous and useless.

By: pj (pj) 2009-03-30 13:08:41

OK, I also attached valgrind outputs some time ago, but got no feedback on them.

By: Tilghman Lesher (tilghman) 2009-03-31 10:17:02

pj:  The first valgrind you attached resulted directly in a patch, which was committed.  The second valgrind output is for when you hit Ctrl-C to exit the console and it crashed.  The third valgrind output is for a transfer problem, which I have not yet figured out (the output is somewhat sketchy).

By: pj (pj) 2009-04-10 07:25:57

I can't supply better valgrind output, because I don't know how to trigger the crashes that sometimes appear. I can't run under valgrind all the time; it crashes maybe once every week or two (but I have only one to three Skinny phones online, with a few calls a day).
So if you can't find the real source of my crashes from the valgrind and gdb outputs I attached in the past, I think you can commit the patch I'm still using, indicate2.diff (it seems to help a little), and close this bug report. Thanks!



By: Damien Wedhorn (wedhorn) 2009-04-13 07:41:52

indicate2.diff shouldn't be committed. It was only designed to help find the issue, not as a solution. I've been working on the sub handling stuff which should hopefully make finding this issue easier.

There's an issue in trunk at the moment that needs fixing, but I should have my stuff up shortly after that is fixed. I'll post a pointer here when it's posted.

By: pj (pj) 2009-04-16 03:06:21

So, if I understand correctly, indicate2.diff can't have any positive effect on eliminating my crashes? I'm still using it, but if it's useless I will stop applying this patch at the next update.

By: Damien Wedhorn (wedhorn) 2009-04-16 03:29:25

It will probably have a positive effect, but it does a lot of checking on stuff that should not need checking, or not to that level anyway.

Hopefully I'll post the new sub-handling stuff this weekend (assuming I can merge in the last couple of patches to skinny). But this stuff will definitely be for alpha testing (and transfer won't work - still pondering that one). However, the message handling to/from devices is a lot more structured and the sort of issue that you are having should be fairly easy to fix (if it still exists).

By: pj (pj) 2009-04-16 03:52:05

OK, I don't use transfers or other special features at all, so I can test your stuff if it helps to eliminate the crashes.

By: pj (pj) 2009-05-04 09:27:32

I would like to ask whether the indicate2.diff patch I'm using could be causing the memory leak that I reported in bug report #0014636.
My Asterisk keeps eating more and more memory, even when no calls are being processed.
Could it be caused by the handling of the registration/unregistration process, e.g. memory not being freed when a phone unregisters?
Can this issue be detected in chan_skinny with the MALLOC_DEBUG compiler flag?

[May  4 16:21:43] WARNING[14455]: chan_skinny.c:6321 destroy_session: Trying to delete nonexistent session 0x8dfef90?
[May  4 16:21:48]     -- Starting Skinny session from 88.103.132.21
[May  4 16:21:48] WARNING[14456]: chan_skinny.c:6357 get_input: read() returned error: Connection reset by peer
[May  4 16:21:48] WARNING[14456]: chan_skinny.c:6321 destroy_session: Trying to delete nonexistent session 0x8dfef90?
[May  4 16:21:49]     -- Starting Skinny session from 88.103.132.21
[May  4 16:21:49]     -- Skinny mwi_event_cb found 0 new messages
[May  4 16:21:49]     -- Device 'SEP000D288E6669' successfully registered

By: Russell Bryant (russell) 2011-07-26 14:57:30.783-0500

Per the Asterisk maintenance timeline page at http://www.asterisk.org/asterisk-versions maintenance (bug) support for the 1.4 and 1.6.x branches has ended. For continued maintenance support please move to the 1.8 branch which is a long term support (LTS) branch. For more information about branch support, please see https://wiki.asterisk.org/wiki/display/AST/Asterisk+Versions

If this is still an issue, please open a new issue so it can be re-triaged appropriately. Thanks!