Summary: ASTERISK-15022: Lockup in chan_sip
Reporter: Marie Fischer (fmarie)
Labels:
Date Opened: 2009-10-22 21:20:20
Date Closed: 2011-06-07 14:00:29
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Channels/chan_sip/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
(0) chan_sip_lock.txt
(1) core_show_locks-20091025-0043.txt
(2) log-20091024-0045.txt
(3) log-20091025-0043.txt
(4) sip_show_channels-20091024-0045.txt
Description: Hello.

We are using Asterisk 1.6.2.0-rc3 on openSUSE 11.1 mainly for call recording, fax and IVR. Asterisk is connected to a softswitch as a SIP peer, the softswitch is connected to the PSTN via SS7 and to MGCP and SIP phones.

Asterisk is locking up randomly. The console still works, but no more SIP traffic is accepted. This started when we switched from 1.6.1.6 to 1.6.2.0-rc2 (because of fax problems). At first, it happened about once a week, but it has progressed to once or twice per day. Upgrading to 1.6.2.0-rc3 brought no improvement. Restarting Asterisk via the /etc/init.d/asterisk script helps (until the next lockup).

The last message in the log (set to verbose) before the lockup is something like:
chan_sip.c: Maximum retries exceeded on transmission 23150-AQ-002ecd38-506161c24@localdomain.com for seqno 2554278 (Critical Response) -- See doc/sip-retransmit.txt.
However, we have also seen this message without a lockup following.
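For background on this warning: RFC 3261 (section 17.2.1) has a server that sends a final response to an INVITE over UDP retransmit it on Timer G, starting at T1 and doubling up to T2, until an ACK arrives, and give up on the transaction when Timer H (64*T1) fires. The Python sketch below models only that RFC schedule with the default T1 and T2; whether chan_sip's retransmission code behind the "Maximum retries exceeded" (Critical Response) warning uses exactly these intervals is an assumption, not something established in this report.

```python
# Sketch of the RFC 3261 (section 17.2.1) retransmission schedule for a final
# response (3xx-6xx) to INVITE sent over UDP when no ACK ever arrives.
# This models the RFC timers, not necessarily chan_sip's own implementation.

T1 = 0.5  # RTT estimate in seconds (RFC 3261 default)
T2 = 4.0  # maximum retransmit interval in seconds (RFC 3261 default)


def final_response_retransmits(t1: float = T1, t2: float = T2) -> list[float]:
    """Return the times (seconds after the first send) of each retransmission."""
    timer_h = 64 * t1        # the transaction is abandoned when Timer H fires
    times = []
    interval = t1            # Timer G starts at T1
    now = interval
    while now < timer_h:
        times.append(now)
        interval = min(2 * interval, t2)  # Timer G doubles, capped at T2
        now += interval
    return times


if __name__ == "__main__":
    schedule = final_response_retransmits()
    print(f"{len(schedule)} retransmissions before Timer H ({64 * T1:.0f} s):")
    print(schedule)
```

With the defaults (T1 = 500 ms, T2 = 4 s) this gives ten retransmissions, the last at 31.5 s, before Timer H fires at 32 s.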

I attached the output of "core show locks". I will also attach SIP debug output as soon as I have it. What else can I do to help debug this problem?

Also, until there is a better solution, is there a way to monitor Asterisk for this kind of lockup and restart it?
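Because the console keeps working while SIP traffic stops, a simple process check would not notice this lockup. A minimal watchdog sketch, assuming chan_sip listens on UDP 127.0.0.1:5060 and that a locked-up chan_sip stops answering requests, periodically sends a SIP OPTIONS request and runs a restart command after several consecutive unanswered probes. The address, thresholds, the "watchdog" From user, and the restart command (which assumes the init script accepts "restart") are all hypothetical and would need adapting; any SIP response, even an error such as 404, counts as alive.

```python
# Hypothetical watchdog sketch: probe Asterisk with SIP OPTIONS over UDP and run
# a restart command after several consecutive unanswered probes.
# HOST/PORT, thresholds, the From user and RESTART_CMD are assumptions.
import socket
import subprocess
import time
import uuid
from typing import Optional

HOST, PORT = "127.0.0.1", 5060                     # assumed chan_sip address
RESTART_CMD = ["/etc/init.d/asterisk", "restart"]  # assumed restart command
MAX_FAILURES = 3                                   # consecutive misses before restart
INTERVAL = 30                                      # seconds between probes


def build_options(host: str, port: int, local_ip: str, local_port: int) -> bytes:
    """Build a minimal SIP OPTIONS request with the RFC 3261 mandatory headers."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 magic-cookie prefix
    lines = [
        f"OPTIONS sip:{host}:{port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:watchdog@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{host}:{port}>",
        f"Call-ID: {uuid.uuid4().hex}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(response: bytes) -> Optional[int]:
    """Return the status code from a SIP response's status line, or None."""
    first = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first.split(" ", 2)
    if len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1].isdigit():
        return int(parts[1])
    return None


def probe(host: str = HOST, port: int = PORT, timeout: float = 5.0) -> bool:
    """Return True if any SIP response arrives within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # recv() then only accepts replies from this peer
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(host, port, local_ip, local_port))
        try:
            data = sock.recv(65535)
        except OSError:  # timeout, or ICMP port unreachable
            return False
        return status_code(data) is not None


def main() -> None:
    failures = 0
    while True:
        failures = 0 if probe() else failures + 1
        if failures >= MAX_FAILURES:
            subprocess.run(RESTART_CMD, check=False)
            failures = 0
        time.sleep(INTERVAL)


if __name__ == "__main__":
    main()
```

Counting consecutive failures instead of restarting on the first missed reply avoids a restart caused by a single lost UDP packet.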

Thanks,

Marie Fischer

Comments:

By: Elazar Broad (ebroad) 2009-10-23 02:44:18

Are you by any chance using session timers (session-timers in sip.conf)?



By: Marie Fischer (fmarie) 2009-10-23 10:12:41

No, we have not configured session timers, so the defaults should be used.

By: Elazar Broad (ebroad) 2009-10-23 10:42:53

Can you see if you can replicate this issue with session-timers=refuse? Thanks!

By: Marie Fischer (fmarie) 2009-10-23 10:57:45

I added session-timers=refuse and will report back on the results.

By: Manuel Wenger (manuel_wenger) 2009-10-23 11:29:00

We have the exact same problem on a production system running 1.6.1.6, with SIP channels only and session-timers=refuse. The only command that still partly works is "core show channels": it shows the first 5-10 channels, fails to show the rest, and never shows the final summary lines (active channels/active calls/calls processed). Existing calls stay up, but no new calls are accepted.

Unfortunately, I don't have a "core show locks" output to attach yet. Because this is a production system, support personnel restarted the server as quickly as possible, but I'll try to capture one.

We have about 2000 registered peers, 300 of which have qualify=10000, and we use a DAHDI dummy timer source. Peak load is 50 simultaneous calls. We use res_odbc connected to MySQL and reload the configuration every 5 minutes with "dialplan reload" and "sip reload". The lockup does not happen at the exact moment of a reload, but randomly (on average once every 2-7 days).



By: Marie Fischer (fmarie) 2009-10-24 21:49:16

I have been able to replicate the issue twice with session-timers=refuse.

I am attaching two SIP logs from 24.10.2009 00:45 and 25.10.2009 00:43, as well as the "sip show channels" output from 24.10.2009 00:45 and the "core show locks" output from 25.10.2009 00:43.

From the SIP logs, it looks like the remote peer (cirpack) sends repeated CANCEL requests, to which Asterisk does not react. A bit later, Asterisk sends 183 Session Progress and 486 Busy Here/503 Service Unavailable, gets no answer, retransmits, and locks up.
In both cases there is also the warning "chan_sip.c: Unable to cancel schedule ID xxxxx.  This is probably a bug (chan_sip.c: send_response, line 3997)."

It seems that when the lockup occurs, "sip show channels" and "sip show peers" work fine, but "sip reload" and "module reload" return immediately without any output and have no effect.

I also checked the logs from before the upgrade to 1.6.2.0-rc2: we had the "Maximum retries exceeded on transmission" warnings on both 1.6.1.1 and 1.6.1.6 without chan_sip locking up. The "Unable to cancel schedule" warnings appeared only after the upgrade to 1.6.2.0.
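The before/after log comparison can be repeated mechanically. The hypothetical Python helper below counts, per log file, the lines containing the two warning texts quoted in this issue; the substring patterns come from those messages, and the log files are whatever paths the caller passes on the command line.

```python
# Hypothetical helper: count the two chan_sip warnings discussed in this issue
# in one or more Asterisk log files (paths given as command-line arguments).
import sys
from collections import Counter
from typing import Iterable

# Substrings taken from the warning messages quoted in this issue.
PATTERNS = {
    "max_retries": "Maximum retries exceeded on transmission",
    "unable_to_cancel": "Unable to cancel schedule ID",
}


def count_warnings(lines: Iterable[str]) -> Counter:
    """Count lines matching each pattern; every pattern starts at zero."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            c = count_warnings(f)
        print(f"{path}: {c['max_retries']} max-retries, "
              f"{c['unable_to_cancel']} unable-to-cancel")
```

Running it over logs from 1.6.1.x and from 1.6.2.0 side by side would show whether the "Unable to cancel schedule" warning is new with the upgrade.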