Summary: ASTERISK-06018: Zaptel PRI lines dead after incoming AGI call terminates early
Reporter: Steve Hanselman (shanselman)
Labels:
Date Opened: 2006-01-09 02:47:37.000-0600
Date Closed: 2006-01-18 09:49:21.000-0600
Versions:
Frequency of
Environment:
Attachments:
( 0) asteriskbt.txt
( 1) asterisklog.txt
Description: On a number of occasions since the 16th December we have had Asterisk effectively lock up.  The PRI circuits drop and you cannot connect with asterisk -r, but Asterisk itself continues to run.

I have checked on the wiki and when this happens again I will debug the running asterisk.

The setup is Asterisk with a first-generation TE4xx card, a number of Cisco 79xx phones running SCCP via the sccp2 channel driver (latest version, 1217), and a number of SIP trunks to call managers at our customer sites.

The latest event happened at some point over the weekend when there would have been little to no load on the system.

The restarts of the PRI channels stopped during this time as well, as did the responses to the manager Ping command issued by Nagios.
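For context, a manager Ping check of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the actual Nagios plugin in use here: the helper names (`build_action`, `parse_message`, `is_alive`) are hypothetical, and the accepted reply forms are an assumption, since Asterisk releases differ in how they answer Ping. Only the message framing is shown (`Key: Value` lines ending with a blank line); connecting and logging in are left out.

```python
def build_action(name, **headers):
    """Format a manager action as CRLF-separated 'Key: Value' lines plus a blank line."""
    lines = [f"Action: {name}"]
    lines += [f"{key}: {value}" for key, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(raw):
    """Turn one manager message into a dict of its 'Key: Value' fields."""
    fields = {}
    for line in raw.split("\r\n"):
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            fields[key.strip()] = value.strip()
    return fields


def is_alive(reply):
    """Assumption: a healthy server answers either 'Response: Pong'
    or 'Response: Success' together with 'Ping: Pong'."""
    return reply.get("Response") == "Pong" or reply.get("Ping") == "Pong"
```

A monitoring check would send `build_action("Ping")` over an authenticated manager connection and treat a timeout or a reply for which `is_alive` is false as a failure.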


We do see these in the log:
Jan  8 08:44:08 DEBUG[13440] channel.c: Avoiding initial deadlock for 'Zap/6-1'
Jan  9 09:01:04 DEBUG[24635] channel.c: Avoiding initial deadlock for 'Zap/32-1'
Jan  9 09:01:04 DEBUG[24635] channel.c: Avoiding initial deadlock for 'Zap/32-1'
Jan  9 09:02:20 DEBUG[24635] channel.c: Avoiding initial deadlock for 'Zap/33-1'
Jan  9 09:19:48 DEBUG[24635] channel.c: Avoiding initial deadlock for 'Zap/1-1'

But now having looked at the source I can see that these are in a retry loop, so I'll ignore those.
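The retry pattern behind those messages can be illustrated roughly as below. This is a sketch of the general try-lock-and-retry idea in Python, not the code in channel.c: the function name, the retry count, the delay and the final "giving up" message are all assumptions made for the example.

```python
import threading
import time


def lock_with_retries(lock, name, retries=10, delay=0.001, log=print):
    """Try to take `lock` without blocking, backing off briefly between attempts.

    Returns True once the lock is held, False if every attempt failed.
    """
    for _ in range(retries):
        if lock.acquire(blocking=False):
            return True
        # Another thread holds the lock; log and retry rather than block,
        # so two threads waiting on each other cannot deadlock here.
        log(f"Avoiding initial deadlock for '{name}'")
        time.sleep(delay)
    log(f"Giving up on '{name}' after {retries} attempts")  # hypothetical message
    return False
```

Seen this way, an occasional "Avoiding initial deadlock" line only means one attempt found the channel busy; it would be a sign of trouble only if the attempts kept failing.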
Comments:By: Steve Hanselman (shanselman) 2006-01-09 02:52:13.000-0600

Ignore the last line in the description; it is bogus. I was looking at the wrong log: it DID continue to attempt to restart the PRI circuits and respond to the Ping command.

By: Steve Hanselman (shanselman) 2006-01-09 03:01:47.000-0600

Looks like one of the threads fell over, bt attached (I'll check how it was built, not sure it wasn't optimised... and also have a look at manager.c to see what it was doing at that point).

By: Steve Hanselman (shanselman) 2006-01-09 03:17:47.000-0600

Having seen the time of the core dump, it looks as though somebody was dialled into an AGI application and hung up before it completed. This then caused Asterisk to go into some kind of multiple restart loop. (At this point it was being run from the console, not by safe_asterisk, so I'm not sure why it would attempt to restart itself.)

Anyhow, relevant portion of the log is also attached.

By: Steve Hanselman (shanselman) 2006-01-09 03:46:06.000-0600

Just realised that I've probably made that core dump completely invalid: it was generated on the SVN version given above, but gdb was run against the very latest SVN (7875), checked out just after I opened this Mantis entry :(

By: Steve Hanselman (shanselman) 2006-01-09 05:11:15.000-0600

Need to amend the description on this, it's the AGI that causes it to die, if the call is hung up during the AGI.

I'll move this onto a test system and see what I can find out.

By: Steve Hanselman (shanselman) 2006-01-09 06:10:41.000-0600

This is reproducible using the weather.agi script from the wiki.
It only seems to be an issue if the call is made on a Zaptel line (possibly more specifically a PRI?)

Can somebody amend the description to be "Zaptel PRI lines dead after incoming AGI call terminates early"

The version of Perl::AGI is the latest (0.08)
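To show what "the call is hung up during the AGI" looks like from the script's side, here is a minimal Python sketch of the script end of the AGI exchange. It is not weather.agi and does not use the Perl module; the function names are hypothetical. It assumes the usual AGI conventions: `agi_*: value` lines on stdin ending with a blank line, replies of the form `200 result=N`, and a result of -1 or end-of-file on stdin once the caller has gone. Asterisk may also send the script a SIGHUP at that point, which is not handled here.

```python
def read_agi_environment(stream):
    """Read the initial 'agi_key: value' lines, stopping at the first blank line."""
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def parse_agi_response(line):
    """Return N from a '200 result=N ...' reply, or None for anything else."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "200" or not parts[1].startswith("result="):
        return None
    try:
        return int(parts[1][len("result="):])
    except ValueError:
        return None


def caller_has_gone(line):
    """Assumption: EOF (empty read) or result=-1 means the channel has hung up."""
    if line == "":
        return True
    return parse_agi_response(line) == -1
```

A script would call `read_agi_environment(sys.stdin)` once at start-up, then check `caller_has_gone(sys.stdin.readline())` after each command it writes to stdout and exit cleanly as soon as it returns true.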

By: Matt O'Gorman (mogorman) 2006-01-13 16:13:47.000-0600

shanselman, so this issue goes away if you aren't using AGI?

By: Steve Hanselman (shanselman) 2006-01-16 02:42:13.000-0600

Certainly seems to, I've now gone back through the logs, and each time it's happened, somebody has been dialling in on a zap channel obtaining TAF and METAR information (basically the weather.agi script).

By: Steve Hanselman (shanselman) 2006-01-16 06:53:38.000-0600

As of r8090 I can't reproduce this. I'll have to rebuild back to an earlier version out of hours and see if this is a fluke or whether something has changed that has resolved this.

I could reproduce this 100% of the time previously; I've just tried 6 times and each time Asterisk has survived.

By: Matt O'Gorman (mogorman) 2006-01-18 09:49:12.000-0600

shanselman, I'm going to mark this closed for now.  If you are able to reproduce it, please provide the AGI script and more info so that we can try to figure out what is happening.