[Home]

Summary:ASTERISK-09826: [patch] crash when decoding callerid
Reporter:paradise (paradise)Labels:
Date Opened:2007-07-06 16:23:02Date Closed:2007-08-02 13:16:12
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 10141.patch
( 1) cid-crash.txt
Description:this crash happens rarely 1-2 times a week.


****** ADDITIONAL INFORMATION ******

BT is attached.
Comments:By: Steve Murphy (murf) 2007-07-11 18:13:50

I just ran thru the code. To have large negative numbers for mylen (like -66250), something is seriously wrong! Especially when len = 160. Either it's gone thru so much data, that numbers are overflowing, or .... ?

If the numbers are not overflowing, then it looks like memory corruption to me.

By: Steve Murphy (murf) 2007-07-12 09:39:18

OK, after reviewing, then reviewing again, I think I have it. The samples var in the serie routine seems to say that it covered 66,413 items in that first loop, which is inconsistent with the number of samples in a group, which is usually like 160 or so, from the numbers in your stack trace.

IF this routine gets called with a 0 len, it will fetch a sample anyway, and that should put len to -1. the test in the loop is for 0 only, so, it would merrily churn thru unallocated memory. Apparently, it churned thru over 66K iterations before it stepped outside a memory page and segfaulted.

Hmmm. I didn't try to study why it would get called with 0 elements in buffer, and I could be totally wrong. I attached a patch that changed the tests to look for less-than-or-equal-to 0 instead of just equal-to 0, and we will soon see if this makes any difference.

Please apply 10141.patch to your source, and run again. If all goes well, it won't crash at all. If it still crashes, the bt full trace should tell me what the len was when the call was made.

Remind me that the patch I made is not optimal, if it works!  instead of testing for 0 or less than 0 in the loop, really, the code shouldn't even get a sample if there are none there. I'll commit a better fix if this patch works.

By: Steve Murphy (murf) 2007-07-26 10:31:59

hey, paradise--- have you had a chance to test this patch? Does it work? I hope you don't feel like I'm pressing you to test this for me.... because I am! ;)


By: paradise (paradise) 2007-07-28 04:56:27

yes, i applied your patch and it's working.
now i'm just waiting for a crash.

By: Steve Murphy (murf) 2007-08-02 12:43:18

5 days and no crash? That's a good sign.

By: Steve Murphy (murf) 2007-08-02 13:16:11

Yes, I just read my note to myself saying that this patch isn't optimal,
but 1.2 support has ended, it appears to work, so I committed it as
77942.

I think I'll just go ahead and commit this to 1.4 and trunk in the same format,
lest I forget. If I try to eliminate why this routine would get called with 0 samples, I may end up introducing a bug.

1.4 via r. 77945;
trunk via r. 77946