[Home]

Summary:ASTERISK-07571: [patch] Asterisk crashes with chan_skinny
Reporter:sbisker (sbisker)Labels:
Date Opened:2006-08-21 09:32:33Date Closed:2006-12-22 14:44:19.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Channels/chan_skinny
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 7774-sbisker-bt.txt
( 1) bt_full_no_opt.txt
( 2) bt_full.txt
( 3) bt_full3.txt
( 4) bt_full4.txt
( 5) skinny-patch.diff
Description:Every few days, an Asterisk system I have with 2 Cisco 7920 phones crashes and dumps core.

System Specs:
Centos 4.3
Asterisk: SVN-trunk-r36380


****** ADDITIONAL INFORMATION ******

Here is the Backtrace Info:

-= snipped by vechers and put in 7774-sbisker-bt.txt =-
Comments:By: Serge Vecher (serge-v) 2006-08-21 10:52:30

sbisker: please attach backtraces/log, not post them inline. Thanks.

By: Anthony LaMantia (alamantia) 2006-08-21 11:00:48

the only memcpy i see in transmit_response that i can see causing this *just by looking at the back trace*

is this

memcpy(s->outbuf+skinny_header_size, &req->data, sizeof(skinny_data));

i will shortly post patch that may help you out, but maybe if you can provide us some more information it would be helpfully

if you use gdb can you provide me the addresses that are being passed as params to memcpy in this case?

-Anthony

By: Anthony LaMantia (alamantia) 2006-08-21 11:23:25

please try the patch i just sumbmited,
for anything really meaningfull we will require a more detailed log of what is going on...

see vechers comment..

By: sbisker (sbisker) 2006-08-21 13:34:17

Ok.  I've installed the patch.  Could you throw me a bone for the debugging.  I can use gdb no problem, but could you give me the commands for the exact info you want?

Thanks.

Edit:  I manually added the lines to skinny.c since the patch did not work.



By: Jason Parker (jparker) 2006-08-21 13:36:54

Anthony, I believe your patch is backwards.

Generally, it's best to modify the code in a working copy, then simply do `svn diff channels/chan_skinny.c`

By: Anthony LaMantia (alamantia) 2006-08-21 13:53:32

oops, here is the correct one

By: Serge Vecher (serge-v) 2006-08-21 14:07:10

alamantia: http://www.asterisk.org/developers/Patch_Howto wants to be your friend ...

By: Jason Parker (jparker) 2006-08-23 01:33:14

Scott, can I get a 'bt full' from that core?

By: sbisker (sbisker) 2006-08-23 07:41:04

Uploaded bt_full.txt as requested.

By: Jason Parker (jparker) 2006-08-23 10:32:24

Can you enable "don't optimize" in menuselect, and see if you can get another core?

By: sbisker (sbisker) 2006-08-28 10:03:57

Uploaded backtrace of unoptimized build.

By: sbisker (sbisker) 2006-09-05 13:43:38

Any idea on this one?

By: Jason Parker (jparker) 2006-09-05 13:47:54

This is on my list of things to do - I'll try to get to it tonight.

By: Anthony LaMantia (alamantia) 2006-09-06 15:50:16

i have uploaded a patch that should help you with your issue.

By: Jason Parker (jparker) 2006-09-26 17:05:09

Scott, did this patch help your crashes?

By: sbisker (sbisker) 2006-09-28 07:41:55

I uploaded a new backtrace.  It just happened yesterday.

By: Anthony LaMantia (alamantia) 2006-09-28 10:53:26

bt_full-2.txt is not in a format i can read :)

By: sbisker (sbisker) 2006-09-28 11:01:01

New file.  First one was Unicode

By: Anthony LaMantia (alamantia) 2006-09-28 15:54:37

i think you forgot to copy entry #0 in that backtrace where the fault is happening.

By: sbisker (sbisker) 2006-09-29 07:28:06

I suck.  bt_full4.txt is complete.

By: Anthony LaMantia (alamantia) 2006-09-29 10:23:40

was this with the patch attached to this mantis entry (skinny_update.diff) applied?

By: sbisker (sbisker) 2006-09-29 10:50:26

Yes, however, it looks like the original patch from correct.diff was also applied.  I have backed out that patch and have re-installed.  I will let it run for a week or so to see if it crashes.

By: Jason Parker (jparker) 2006-09-29 11:04:00

I was actually able to reproduce this when I ran it in gdb on an unoptimized build.  Looks like a stack issue.

By: Anthony LaMantia (alamantia) 2006-09-29 12:38:25

there seem to be quite a few issues with the memory managment inside of chan_skinny that seem causing much of the problem and a few others, i am reworking much of this code today, when i am finished i will post a patch that i would like you to test. if you are willing.

-anthony

By: Anthony LaMantia (alamantia) 2006-10-03 10:25:23

i put up a slight patch for the issue at hand, it needs to be tested a bit but it should work for your problem.. if you can test it out it would be great.



By: sbisker (sbisker) 2006-10-03 12:05:21

I applied the patch against 44249.  The 7920 phone now shows up as a 7960, and when I try to call the phone I get the following:

[Oct  3 12:57:02] NOTICE[16708]: chan_skinny.c:4355 skinny_request:  Asked to get a channel of unsupported format '0'
[Oct  3 12:57:02] WARNING[16708]: app_dial.c:1076 dial_exec_full:  Unable to create channel of type 'Skinny' (cause 0 - Unknown)

If I run against r36380 with the previous patch, calls work as usual.

By: sbisker (sbisker) 2006-10-06 08:59:57

OK.  Got the patch working.  Was a problem with chan_iax2 downgrading the codec to something that chan_skinny didn't support.  Patch is running.  I will advise if it crashes.  Usually takes 3-4 days since only 1 7920 phone is on the system.

I also have applied the skinny unregister/crash patch from bug 008067 and have it running against r44565.



By: sbisker (sbisker) 2006-10-12 09:32:19

We have been successfully running for about a week with no crashes.  I've added a second phone to the test system to see how the stability is.  If that works, I will be adding an additional 10 7920 phones, and go from there.

By: Anthony LaMantia (alamantia) 2006-10-15 15:05:44

sbisker that sounds great,
which one of the patches are you using right now?

as always keep us posted

-anthony

By: sbisker (sbisker) 2006-10-17 16:08:08

I am using skinny-patch.diff from bug 7774 and patch.diff from bug 8067 against SVN 44565.  I'm guessing bug 8067 has been incorporated into SVN already.

I am waiting for my 10 phones to show up.

By: Anthony LaMantia (alamantia) 2006-10-18 10:01:43

I am waiting for my 10 phones to show up. ?

do you mean you are waiting on 10 physical phones, or they are not showing up in asterisk ;)

By: Anthony LaMantia (alamantia) 2006-10-18 15:25:59

ah, i see your eariler posts mentioning the phones. keep us updated.
:)

By: Anthony LaMantia (alamantia) 2006-11-08 16:36:16.000-0600

sbisker any updates i would like to get this patch commited rather soon.

By: sbisker (sbisker) 2006-11-09 07:44:42.000-0600

I've been running for just about a month with no errors, crashes, etc.  I'm still waiting for my phones, they are on backorder.  My vote would be to commit the patch.  I never had more than 4-5 days of uptime with one rarely used phone on any previous version.  I now have 2 phones with a higher level of use, with no issues.

By: Damien Wedhorn (wedhorn) 2006-11-27 03:59:15.000-0600

I am mucking around with shared lines at the moment and came across an intermittent segfault. Looks like I managed to get around it, but after looking at this bug, seems that it is the same issue.

As far as patching, I think you only have to change the memcpy (about line 1350) as indicated in the last patch file (10-03-06). I really don't see how the rest of the patch will help unless it was addressing another issue.

Playing around with the debugging, it seems the issue was the memcpy was trying to copy 784 bytes, and the upper region was returning "cannot access memory". The data was only 4 bytes long. My conclusion is that the memcpy was copying the 4 bytes plus 780 bytes of other crap, the devices just ignored the other crap, and whenever we got near a memory boundary, it'd segfault.

So I suggest just fix up the memcpy at about line 1350 as per the 10-03 patch.

By: Anthony LaMantia (alamantia) 2006-11-29 16:00:14.000-0600

i have made a branch for this bug and the fix:

view:
http://svn.digium.com/view/asterisk/team/anthonyl/7774-branch

checkout:
svn co http://svn.digium.com/svn/asterisk/team/anthonyl/7774-branch

By: Anthony LaMantia (alamantia) 2006-12-01 16:33:19.000-0600

i'm putting this in qwell's queue for commital.

By: Anthony LaMantia (alamantia) 2006-12-04 14:08:33.000-0600

i've updated my branch on this issue to support length checking of the len variable before calling wirte or memcpy() for security/stability reasons.

this has also been done inside my skinny-redux branch.

By: Serge Vecher (serge-v) 2006-12-04 14:13:43.000-0600

anthony: small spelling error in r48239 ("length") at

"Transmit: the legnth of the request is out of bounds"

By: Anthony LaMantia (alamantia) 2006-12-08 11:57:17.000-0600

the copy in the 7774 branch has the correct spelling.

http://svn.digium.com/view/asterisk/team/anthonyl/7774-branch/channels/chan_skinny.c?view=markup



By: sbisker (sbisker) 2006-12-19 07:31:25.000-0600

Are these latest changes committed in 1.4.0beta4 ?

By: Anthony LaMantia (alamantia) 2006-12-19 11:25:47.000-0600

sbisker, i don't think so.

By: Anthony LaMantia (alamantia) 2006-12-19 19:20:59.000-0600

I will give qwell a poke tommrow so we can get this in for the next release.

By: Jason Parker (jparker) 2006-12-22 14:44:18.000-0600

Fixed in svn 1.4 and trunk revisions 48870 and 48871