Summary: ASTERISK-09649: Crash while updating hints
Reporter: Tim Donahue (tdonahue)
Labels:
Date Opened: 2007-06-11 13:59:19
Date Closed: 2007-06-27 16:16:46
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 9946.patch.txt
( 1) Backtrace-2007-06-11.txt
( 2) Backtrace-2007-06-12-1.txt
( 3) Backtrace-2007-06-12-2.txt
( 4) chan_sip.c.patch
( 5) pbx.c.patch
Description: I am experiencing a random crash on my Asterisk 1.4 server.  The server is currently running an SVN checkout of the 1.4 branch, as requested by jsmith on Friday 6/8, instead of 1.4.4.

This crash seems to be the same crash I was experiencing while running 1.4.4.  The last lines in the log are:

[Jun 11 10:06:47] VERBOSE[16558] logger.c:  Extension Changed 4506 new state Ringing for Notify User PCS4502
[Jun 11 10:06:47] VERBOSE[16558] logger.c:  Extension Changed 4506 new state Ringing for Notify User PCS4514

There are currently 3 receptionist phones that monitor this extension.  

****** ADDITIONAL INFORMATION ******

Server: Debian Etch i386
Kernel: 2.6.18-4-686
Comments:

By: Eliel Sardanons (eliel) 2007-06-12 00:04:50

hmm, p->context and p->exten being overwritten by p->from?
[p->context + p->exten] 0x3d656e69 + 0x6c3b3439 =>> '94;line='
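The decoding in the comment above can be reproduced with a short sketch. It assumes the little-endian byte order of the reporter's i386 machine, so each 32-bit value is read least significant byte first. The helper name `decode_le32` is hypothetical and is not part of Asterisk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: unpack a 32-bit value into its four bytes, least
 * significant byte first, which is the order those bytes occupy in memory
 * on a little-endian machine such as i386.
 */
void decode_le32(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xffu);
        /* Replace non-printable bytes so the result is always safe to print. */
        out[i] = (byte >= 0x20 && byte < 0x7f) ? (char)byte : '.';
    }
    out[4] = '\0';
}
```

Decoding 0x6c3b3439 this way gives "94;l" and decoding 0x3d656e69 gives "ine=", which together read "94;line=". That is text rather than two plausible heap addresses, which is what suggests the pointer fields were overwritten with string data.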

By: Tim Donahue (tdonahue) 2007-06-12 12:53:17

Uploaded additional backtraces from crashes this morning.

By: Eliel Sardanons (eliel) 2007-06-14 15:57:07

Please try this patch, but I think the problem is in another place. context is null and the strcmp() segfaults. The first bt shows garbage in the p->context and p->exten addresses; in the last bt, it shows p->context and p->exten as null.
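For readers unfamiliar with this failure mode: calling strcmp() with a null pointer is undefined behavior and typically segfaults. The sketch below shows the general shape of a guard against that. It is not the contents of the attached patch, and the name `safe_streq` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare two strings for equality, treating a NULL
 * argument as "no match" instead of passing it to strcmp().
 * Returns 1 when both strings are non-NULL and equal, 0 otherwise.
 */
int safe_streq(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        return 0;
    }
    return strcmp(a, b) == 0;
}
```

A guard like this only covers the null case seen in the last backtrace. It cannot detect a non-null pointer that holds garbage, as in the first backtrace, so it avoids one crash without explaining why the fields became invalid.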

By: Joshua C. Colp (jcolp) 2007-06-18 12:47:37

The patch supplied will probably mask the issue... I'm more curious over why context and exten are going kaboom... how many subscriptions do you have?

By: Eliel Sardanons (eliel) 2007-06-18 12:52:22

You are right, this patch was created just to avoid the crash and will not solve the real problem.

By: Russell Bryant (russell) 2007-06-18 17:34:49

I have uploaded another patch which I hope will fix this issue.

When the device state callback is called by the device state change thread, there was no locking being done to ensure that the SIP pvt structure did not disappear while handling the state change.

I'd like you to try it before I commit it.  Even if it doesn't fix this issue, it is certainly still a bug in this code.
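The kind of race described above can be illustrated with a minimal sketch using POSIX mutexes. This is not the committed patch and does not use Asterisk's real structures or locking API. The `demo_pvt` struct, its fields, and all function names are hypothetical. The sketch also simplifies teardown to setting a flag under the lock, whereas real code would additionally have to keep the memory itself alive, for example by holding a list lock or a reference.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-dialog private structure. */
struct demo_pvt {
    pthread_mutex_t lock;  /* protects every field below */
    char exten[16];        /* extension this subscription watches */
    int last_state;        /* last state delivered to the watcher */
    int torn_down;         /* set once the owner has torn the dialog down */
};

void demo_pvt_init(struct demo_pvt *p, const char *exten)
{
    pthread_mutex_init(&p->lock, NULL);
    strncpy(p->exten, exten, sizeof(p->exten) - 1);
    p->exten[sizeof(p->exten) - 1] = '\0';
    p->last_state = -1;
    p->torn_down = 0;
}

/* Teardown path: invalidate the fields while holding the lock. */
void demo_pvt_teardown(struct demo_pvt *p)
{
    pthread_mutex_lock(&p->lock);
    p->torn_down = 1;
    p->exten[0] = '\0';
    pthread_mutex_unlock(&p->lock);
}

/*
 * State-change callback, run from a different thread than the owner.
 * Taking the lock before reading any field means teardown cannot modify
 * the structure halfway through the comparison.
 * Returns 1 if the change was delivered, 0 if it was ignored.
 */
int demo_state_cb(struct demo_pvt *p, const char *exten, int state)
{
    int handled = 0;

    pthread_mutex_lock(&p->lock);
    if (!p->torn_down && strcmp(p->exten, exten) == 0) {
        p->last_state = state;
        handled = 1;
    }
    pthread_mutex_unlock(&p->lock);
    return handled;
}
```

With this arrangement, a callback that arrives after teardown sees the flag and returns without touching stale fields. Without the lock, the callback could read the extension while another thread is in the middle of clearing it.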

By: Eliel Sardanons (eliel) 2007-06-18 18:50:39

chan_sip.c.patch (for svn trunk). I'm testing your code in a production environment, and I will give you some feedback.

By: Russell Bryant (russell) 2007-06-19 10:27:14

I am feeling really optimistic this morning and I went ahead and committed this patch since I think it is going to fix this problem.  However, please reopen this if it is not fixed.  The commit was done in 1.4 and trunk in revisions 69945 and 69944.  Thanks!

By: Tim Donahue (tdonahue) 2007-06-20 14:25:34

After applying the patch to my asterisk code and installing it this morning, there seems to be some fallout from the patch.  Immediately after bringing the system up following the install this morning, the `core show hints` command worked fine.  As of this afternoon, approximately 10 hours after bringing the system up, the command is locking and not working.  No commands on the command line work other than exit, but all start working again once you re-enter the asterisk command line.

The output from the command is as follows:
PCScale*CLI>
   -= Registered Asterisk Dial Plan Hints =-
PCScale*CLI>

At which point both the commands stop working and the events stop scrolling on the command line.

I have also noticed that the "Extension Changed 4506 new state Ringing for Notify User PCS4502" messages are not showing up at all on the console as phone states are changing.

Tim Donahue

By: Tim Donahue (tdonahue) 2007-06-20 15:05:53

Per russell's request will update to the latest revision in the 1.4 branch and enable DEBUG_THREADS to find the source of the deadlock.

By: Russell Bryant (russell) 2007-06-27 08:13:52

... anything happening here?  I will have to close this bug if you can't reproduce the problem and help me get the needed debug information.

By: Russell Bryant (russell) 2007-06-27 16:16:45

Feel free to reopen if you still have a problem.