Summary:ASTERISK-08916: Asterisk crashes when a sip phone answers the Page
Reporter:Phil Smith (philsmith)Labels:
Date Opened:2007-03-01 13:55:49.000-0600Date Closed:2007-06-30 09:20:08
Versions:Frequency of
Environment:Attachments:( 0) backtrace_AsteriskSVN-branch-1.4-r58168_noSipAddHeader.txt
( 1) backtrace_AsteriskSVN-branch-1.4-r58168.txt
( 2) backtrace.1.4.1.txt
( 3) backtrace.txt
( 4) verbosedebug_with_sip_debug.txt
Description:Using the SipAddHeaders to include Alert-Info set to have Polycom phones do a Ring-Answer (RA) and then a PAGE will come out the phone speaker.  Once the person who made the page ends the call, Asterisk will close.  Must manually restart the Asterisk service.


Relevant config lines:

===== extensions.conf =====
exten => 998,1,SipAddHeader(Alert-Info:info=RA)
exten => 998,2,Page(SIP/431|q)

====== sip.conf =====

callerid="Info Services" <214>

callerid="Info Services" <431>
Comments:By: Serge Vecher (serge-v) 2007-03-01 14:10:26.000-0600

it is probably crashing. You need to produce a backtrace as per:
1. Enable DONT_OPTIMIZE flag in menuselect under Compiler Options.
2. Rebuild Asterisk
3. Start Asterisk with '-g' option
4. When crash occurs, perform 'bt' on the core file. Then 'thread apply all bt full'
b. Also see this for reference: http://www.voip-info.org/tiki-index.php?page=Asterisk%20debugging
c. Redo the verbosedebug.txt -> you've forgotten to enter "sip set debug"

By: Phil Smith (philsmith) 2007-03-01 14:22:25.000-0600

New debug with SIP debug enable this time uploaded.

By: Serge Vecher (serge-v) 2007-03-01 14:47:01.000-0600

don't forget the backtrace.

By: Phil Smith (philsmith) 2007-03-01 14:51:46.000-0600

Backtrace is uploaded now.  Sorry for the delay on that one, I had never done a backtrace before and it took my a while to do it, hopefully correctly.

By: Joshua C. Colp (jcolp) 2007-03-05 14:26:49.000-0600

The backtrace doesn't give much detail. Is it an unoptimized build? Can you also attach just the console output of an attempt? Also: I attempted to reproduce the issue and was unable to... quite odd.

By: Phil Smith (philsmith) 2007-03-05 14:59:13.000-0600

From the make menuconfig option 10. Complier Flags was selected.  The menu inside both DEBUG_THREADS and DONT_OPTIMIZE were enabled per the instructions on http://www.voip-info.org/tiki-index.php?page=Asterisk%20debugging

As far as duplication, I have tried it on our production server and then a freshly installed test server.  I got the same result from both machines.  Could it be something in our configs, which were the same between both servers?

Console results::
Really destroying SIP dialog '68b543a17f3763036d8522b904291455@' Method: INVITE
   -- Executing [998@iax-trunk:1] SIPAddHeader("SIP/214-09d75bb0", "Alert-Info:info=RA") in new stack
   -- Executing [998@iax-trunk:2] Page("SIP/214-09d75bb0", "SIP/431|q") in new stack
   -- Called 431
   -- Created MeetMe conference 1023 for conference '376671530d'
   -- SIP/431-09d903a8 is ringing
   -- SIP/431-09d903a8 answered
[Mar  5 15:51:31] WARNING[3964]: channel.c:1686 ast_hangup: Hard hangup called by thread -1213584464 on SIP/431-09d903a8, while fd is blocked by thread -1213830224 in procedure ast_waitfor_nandfds!  Expect a failure
 == Spawn extension (iax-trunk, 998, 2) exited non-zero on 'SIP/214-09d75bb0'
Disconnected from Asterisk server
Attempting to reconnect for 30 seconds

By: Serge Vecher (serge-v) 2007-03-05 15:03:43.000-0600

do you want to try a clean installation from 1.4.1 tarball?

By: Phil Smith (philsmith) 2007-03-05 15:06:18.000-0600

Sure I'm game!  Let me pull it down and I will let you know what I find.

By: Phil Smith (philsmith) 2007-03-05 15:47:22.000-0600

Got the same result with 1.4.1.  Upload a backtrace from 1.4.1 core dump.

Console output::
Connected to Asterisk 1.4.1 currently running on localhost (pid = 26033)
Verbosity is at least 3
localhost*CLI> core set debug 9
Core debug was 0 and is now 9
localhost*CLI> core set verbose 9
Verbosity was 3 and is now 9
   -- Executing [998@iax-trunk:1] SIPAddHeader("SIP/214-08de6568", "Alert-Info:info=RA") in new stack
   -- Executing [998@iax-trunk:2] Page("SIP/214-08de6568", "SIP/431|q") in new stack
   -- Called 431
   -- Created MeetMe conference 1023 for conference '346952275d'
   -- SIP/431-08dfa1e8 is ringing
   -- SIP/431-08dfa1e8 answered
[Mar  5 16:47:02] WARNING[26105]: channel.c:1686 ast_hangup: Hard hangup called by thread -1214227536 on SIP/431-08dfa1e8, while fd is blocked by thread -1214473296 in procedure ast_waitfor_nandfds!  Expect a failure
 == Spawn extension (iax-trunk, 998, 2) exited non-zero on 'SIP/214-08de6568'
Disconnected from Asterisk server
Attempting to reconnect for 30 seconds

By: Russell Bryant (russell) 2007-03-06 18:51:40.000-0600

A change was made in this code after 1.4.1 was released that could fix this problem.  Would you mind testing it again?  :)

By: Phil Smith (philsmith) 2007-03-07 08:09:10.000-0600

If the change was made after the release of 1.4.1 are you looking for the SVN build of the 1.4 branch or has the 1.4.1 tar ball now have this change in it?

Just need to know which one you want me to test out.

By: Serge Vecher (serge-v) 2007-03-07 09:58:42.000-0600

phil, it is the former (svn). Once released, tarballs are not modified.

By: Phil Smith (philsmith) 2007-03-07 13:37:58.000-0600

Got the same result with the SVN r58168 build.  Attached new backtrace for that core dump.

By: Serge Vecher (serge-v) 2007-03-07 13:42:43.000-0600

what happens if you do not use this line?
exten => 998,1,SipAddHeader(Alert-Info:info=RA)

By: Phil Smith (philsmith) 2007-03-07 14:02:00.000-0600

Without the SipAddHeader command the Polycom will not auto answer the call.  With this the remote phone rings and rings.  Manually answer the call on the remote then hang up the paging phone and the same crash happens.

   -- Executing [998@iax-trunk:1] Page("SIP/214-08d5b6d0", "SIP/431|q") in new stack
   -- Called 431
   -- Created MeetMe conference 1023 for conference '771295151d'
   -- SIP/431-08d48bc0 is ringing
   -- SIP/431-08d48bc0 answered
[Mar  7 15:00:01] WARNING[4210]: channel.c:1686 ast_hangup: Hard hangup called by thread -1213510736 on SIP/431-08d48bc0, while fd is blocked by thread -1213756496 in procedure ast_waitfor_nandfds!  Expect a failure
 == Spawn extension (iax-trunk, 998, 1) exited non-zero on 'SIP/214-08d5b6d0'
Disconnected from Asterisk server
Attempting to reconnect for 30 seconds
Failed to reconnect for 30 seconds.  Quitting.

I can get you the backtrace on that one if you want it.

By: Ken Williams (kenw) 2007-03-08 15:31:13.000-0600

I'm also having the same issue.  I have 3 asterisk servers, two are running 1.4.1, these are both crashing.  The 3rd is running 1.4.0, not crashing.  All are running very similar scripts, changed for extensions & iax connections only.  We're running Grandstream GXP-2000's with the SIP header to auto-answer for these phones.  One last note, the 2 running 1.4.1 were originally Trixbox's that I tried to remove the Trixbox from and replace everything with the latest versions of the software & my own scripts, not sure if it's Trixbox, 1.4.1, or something completely unrelated.  Here's pieces of my configurations & errors, hope it helps:

callerid=device <601>

exten => 660,1,SIPAddHeader(Call-Info: answer-after=0)
exten => 660,2,Page(SIP/601&SIP/602&SIP/603&SIP/604)
exten => 660,3,Hangup

[Mar  8 13:57:37] WARNING[3209]: channel.c:1686 ast_hangup: Hard hangup called by thread -1214538832 on SIP/604-0a1da2c8, while fd is blocked by thread -1215030352 in procedure ast_waitfor_nandfds!  Expect a failure
 == Spawn extension (ext-local, 660, 2) exited non-zero on 'IAX2/IAX-Price-Trunk-4'
Disconnected from Asterisk server

By: Serge Vecher (serge-v) 2007-03-09 08:45:42.000-0600

alright, so the crash has nothing to do with SipAddHeaders, but with the fact that the page is answered. Yes, please include a backtrace with SipAddHeaders removed and a page manually answered.

By: Phil Smith (philsmith) 2007-03-09 12:31:15.000-0600

New backtrace added with the SipAddHeaders removed from the config.

One line only
exten => 998,1,Page(SIP/431|q)

::New Information::
pager calls 998 and the phone rings at 431.  Once the pager hangs up Asterisk crashes.  If however exten 431 hangs up first then the pager hangs up Asterisk continues to work.  The only time it crashes is if the phone initiating the page hangs up first.

By: Ken Williams (kenw) 2007-03-09 12:33:15.000-0600

I upgraded my 1.4.0 box to 1.4.1 and the problem started occuring.  I downgraded back to 1.4.0, problem went away.  I downgraded my other 2 boxes from 1.4.1 to 1.4.0 and the problem went away.

By: Ken Williams (kenw) 2007-03-09 14:45:58.000-0600

It doesn't appear to be app_meetme, I swapped out 1.4.0 app_meetme.so into a 1.4.1 environment, same error.  I tried the same hack with chan_sip but 1.4.1 dumps immediately on startup with the 1.4.0 version of chan_sip.so.  I'll continue to see if I can narrow down the problem.

By: Serge Vecher (serge-v) 2007-03-12 09:24:15

kenw, et al: it is tedious work, but if you have an opportunity to test specific revisions, that will really help out to track this bug down. The way I approach such problems is to pick a revision right in the middle of the range between 1.4.0 and 1.4.1 release. If the problem occurs, then use the same principle to track the revision down in the first range, if not, then move on to the second range.

By: Ken Williams (kenw) 2007-03-12 09:52:04

I've searched ftp.digium.com & svn.digium.com and I can't seem to find where releases between 1.4.0 & 1.4.1 are.  If you could point me in the right direction I'll do what I can to narrow the problem.

By: Phil Smith (philsmith) 2007-03-12 13:54:25

KenW - here is how I do it, anyone correct me if I am wrong on this process.

Go to http://svn.digium.com/view/asterisk/branches/1.4/?rev=58833&sortby=file&view=log
This will give you a list of the revisions, what was changed and how long ago, etc.

Then pull down a selected version (example Revision 57318)
svn checkout -r 57318 http://svn.digium.com/svn/asterisk/branches/1.4 asterisk-1.4

Build this version and see if it crashes.  If it does go back a few more versions and keep trying until it does not crash.  Once you have a non-crashing build make a note of that version and start working forward until you find between what two revisions the crashes start happening.

Once you know between which two introduces the crash you can report it to the maintainers and they can start investigating what is different between those two version.

By: Ken Williams (kenw) 2007-03-12 19:33:17

51929 does not crash
52052 does crash

I'll try to see if I can find the difference between the two tomorrow.

Upon further testing, it crashes more in 52052, but crashes in versions back to at least 51411, probably earlier.  Will test more tomorrow.

(an hour later)

Never one to sleep...I've actually got the following error to display in 1.4.0 but without crashing the server:

[Mar 12 23:38:29] WARNING[20547]: channel.c:1599 ast_hangup: Hard hangup called by thread -1217602640 on Zap/pseudo-1165617763, while fd is blocked by thread -1215145040 in procedure ast_waitfor_nandfds!  Expect a failure  

I've gone back to 43852 and can make the server crash on it.  

Which build is 1.4.0 based off?

By: Serge Vecher (serge-v) 2007-03-13 08:29:26

1.4.0 was a tag for 1.4 branch rev 48923

By: Ken Williams (kenw) 2007-03-13 11:16:31

Okay, here's what I've been able to find out and I'm at a bit of a road block now, unsure where to go from here.

1.4.0 I can get the above mentioned error but it will not crash the server.

Every other build I've tried (I have about 15 builds right now I've tried, one pre-1.4.0 and the others post-1.4.0/pre-1.4.1) I can get the error & 75% of the time it crashes the server.  The below error seems to be a precursor to the ast_hangup error that dumps the server.

[Mar 13 10:11:23] WARNING[2892]: channel.c:1754 ast_waitfor_nandfds: Thread -1214637136 Blocking 'Zap/pseudo-1920944468', already blocked by thread -1214907472 in procedure ast_waitfor_nandfds

I've also noticed the odds of it crashing go up significantly if I'm paging more than one phone.  On most builds I can page one phone indefinitely and not get any errors, warnings or dumps.  As soon as I add a second phone to page, I start getting warnings, errors & crashes.

Let me know what you'd like me to test from here.

By: Serge Vecher (serge-v) 2007-03-13 11:47:27

well, you've tried all the tricks in my book. Let's see if the developers can get a glance into the problem from the backtraces.

By: Joshua C. Colp (jcolp) 2007-03-13 15:12:42

I've tried for the last while to make this happen and have been unable to. Can someone give me access to a machine where it is happening?

By: Ken Williams (kenw) 2007-03-13 15:38:14

Contact me off list and I'll give you access to a machine.  I assume you have access to my e-mail address.

By: Ken Williams (kenw) 2007-03-14 14:43:56

Well whatever the problem is, it must be something with either an interface to hardware or an interface to OS from Asterisk.  I renamed all my asterisk folders under /etc, /var/lib, /var/log, /var/run & /usr/lib, reinstalled Asterisk-1.4.1, created a very basic asterisk.conf, extensions.conf, modules.conf & sip.conf.  It dumped after first page with original error.


callerid=device <701>

repeat for 717 & 724


include => ext-local

exten => 761,1,SIPAddHeader(Call-Info: answer-after=0)
exten => 761,n,Page(SIP/717&SIP/724&SIP/701)
exten => 761,n,Hangup

include => ext-local
exten => s,1,Hangup
exten => h,1,Hangup

By: William Wong (skwong) 2007-03-15 04:08:22

Hi People, I found a way to solve this problem. I think the main reason of this error is the new implementation of the Page() function, if you use the old tarball release of 1.4.0 and get the apps/app_page.c to the new 1.4.1 version, it works. I already tried it on my own. I hope that helps.

By: Serge Vecher (serge-v) 2007-03-15 08:36:46

skwong: thanks for the pointer, this is not a solution, but a workaround however. We need to find what's wrong with the new implementation. Did anybody establish contact with file yet? Email him at jcolp at digium.com if not.

By: Ken Williams (kenw) 2007-03-15 10:03:21

I copied the tarball 1.4.0 app_page.c to my tarball 1.4.1 folder, recompiled and ran some tests.

It took about 15-20 pages, with 3 or 4 giving me the error but not dumping the system, before it finally did dump the system.

So it didn't fix it completely, but it definitely minimized how often it happens in 1.4.1.  Unfortunately a dump every 15-20 pages means a reboot every 4-5 hours in our environment.

By: William Wong (skwong) 2007-03-15 12:11:57

Well, I just tried like 20 times continuously using Page on 2 of my asterisk servers and they all works without any errors. I don't even have any crashes and don't even have an error message.

FYI, for each of my servers, I have 10 Polycom IP501 hookup, and they all work well. I just use the 1.4.0 tarball release and it works great on my side. If you want, email me and I can sent you the app_page.c I am currently using.

BTW, sorry about that, this is definitely not the solution for the new app_page.c function, I just workarround with it so that I can make it work.

By: Joshua C. Colp (jcolp) 2007-03-16 11:14:01

Please try 1.4 as of revision 58992 everyone!

By: Ken Williams (kenw) 2007-03-16 11:29:20

Tried it on test server, paged 20+ times, no error, no crash.  WOOT!

/* We have to wait for the async thread to exit as it's possible Meetme won't throw them out immediately */

Damn Meetme app not throwing them out :P

By: Serge Vecher (serge-v) 2007-03-16 12:01:22

thanks for reporting, kenw!

Phil: if the crash still occurs on your end, please reopen the bug with a new backtrace. Thanks!