Summary: ASTERISK-00647: chan_sip hangs/deadlocks on retransmits
Reporter: axelm (axelm)
Date Opened: 2003-12-10 04:47:00.000-0600
Date Closed: 2004-09-25 02:49:15
Description: chan_sip seems to hang after the packet retransmit count has been exceeded. gdb output follows (thread 5 seems to be the culprit):

(gdb) info threads
 12 Thread 5062667 (LWP 6407)  0x400ffae2 in sigsuspend () from /lib/libc.so.6
 11 Thread 4898826 (LWP 6205)  0x40196b1e in select () from /lib/libc.so.6
 10 Thread 131081 (LWP 32630)  0x40196b1e in select () from /lib/libc.so.6
 9 Thread 114696 (LWP 32628)  0x40172f11 in nanosleep () from /lib/libc.so.6
 8 Thread 98311 (LWP 32627)  0x40172f11 in nanosleep () from /lib/libc.so.6
 7 Thread 81926 (LWP 32626)  0x40196b1e in select () from /lib/libc.so.6
 6 Thread 65541 (LWP 32625)  0x40196b1e in select () from /lib/libc.so.6
 5 Thread 49156 (LWP 32624)  0x400ffae2 in sigsuspend () from /lib/libc.so.6
 4 Thread 32771 (LWP 32623)  0x40196b1e in select () from /lib/libc.so.6
 3 Thread 16386 (LWP 32620)  0x40196b1e in select () from /lib/libc.so.6
 2 Thread 32769 (LWP 32618)  0x401952c0 in poll () from /lib/libc.so.6
* 1 Thread 16384 (LWP 32617)  0x40196b1e in select () from /lib/libc.so.6
(gdb) thread 5
[Switching to thread 5 (Thread 49156 (LWP 32624))]#0  0x400ffae2 in sigsuspend () from /lib/libc.so.6
(gdb) bt
#0  0x400ffae2 in sigsuspend () from /lib/libc.so.6
#1  0x40022f35 in __pthread_wait_for_restart_signal ()
  from /lib/libpthread.so.0
#2  0x40024790 in __pthread_alt_lock () from /lib/libpthread.so.0
#3  0x40021984 in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0x4032b0b4 in retrans_pkt (data=0x811b258) at chan_sip.c:451
#5  0x08051c85 in ast_sched_runq (con=0x80d6f18) at sched.c:355
#6  0x4033c97f in do_monitor (data=0x0) at chan_sip.c:5334
#7  0x40020d53 in pthread_start_thread () from /lib/libpthread.so.0


This happens regularly on our box (Debian Linux 2.4.23 SMP, Dell 1650, 2 CPUs).

Snippet from chan_sip.c:

       } else {
               ast_log(LOG_WARNING, "Maximum retries exceeded on call %s for seqno %d (%s)\n", pkt->owner->callid, pkt->seqno, pkt->resp ? "Response" : "Reques
               pkt->retransid = -1;
               while(pkt->owner->owner && ast_mutex_lock(&pkt->owner->owner->lock)) {
               if (pkt->owner->owner) {
                       /* XXX Potential deadlocK?? XXX */
                       ast_queue_hangup(pkt->owner->owner, 0);
               } else {
                       /* If no owner, destroy now */
                       pkt->owner->needdestroy = 1;
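The `while (... lock ...)` line in the excerpt reads like a deadlock-avoidance loop: the retransmit path already holds the SIP private lock and wants the channel lock too. The backtrace shows the monitor thread (`do_monitor` → `ast_sched_runq` → `retrans_pkt`) blocked inside `pthread_mutex_lock` at that point. Below is a minimal sketch of the common trylock/back-off pattern for this situation, written against plain POSIX threads rather than Asterisk's `ast_mutex` wrappers. It assumes (this is not taken from the report) that some other thread locks the channel first and the SIP private data second. `struct chan`, `struct call`, `give_up_on_call` and `run_demo` are hypothetical names, not chan_sip code.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins: "struct chan" plays the channel (pkt->owner->owner),
 * "struct call" the SIP private data (pkt->owner). Not real chan_sip types. */
struct chan {
    pthread_mutex_t lock;
    int hangups_queued;         /* counts hangup requests */
};

struct call {
    pthread_mutex_t lock;
    struct chan *owner;         /* NULL when no channel is attached */
    int needdestroy;
};

/* Must be called with c->lock held; returns with c->lock still held.
 * It never blocks on owner->lock while holding c->lock: if the channel
 * lock is busy, it drops c->lock, yields, and tries again. */
static void give_up_on_call(struct call *c)
{
    while (c->owner && pthread_mutex_trylock(&c->owner->lock) != 0) {
        pthread_mutex_unlock(&c->lock);
        sched_yield();
        pthread_mutex_lock(&c->lock);
    }
    if (c->owner) {
        c->owner->hangups_queued++;          /* stand-in for queuing a hangup */
        pthread_mutex_unlock(&c->owner->lock);
    } else {
        c->needdestroy = 1;                  /* no owner: mark for destruction */
    }
}

static struct chan demo_chan = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct call demo_call = { PTHREAD_MUTEX_INITIALIZER, &demo_chan, 0 };
static int demo_iterations;

/* Lock order call -> chan (chan only via trylock), like the retransmit path. */
static void *retransmit_side(void *arg)
{
    (void)arg;
    for (int i = 0; i < demo_iterations; i++) {
        pthread_mutex_lock(&demo_call.lock);
        give_up_on_call(&demo_call);
        pthread_mutex_unlock(&demo_call.lock);
    }
    return NULL;
}

/* Lock order chan -> call: the opposite order (assumed, for illustration). */
static void *channel_side(void *arg)
{
    (void)arg;
    for (int i = 0; i < demo_iterations; i++) {
        pthread_mutex_lock(&demo_chan.lock);
        pthread_mutex_lock(&demo_call.lock);
        pthread_mutex_unlock(&demo_call.lock);
        pthread_mutex_unlock(&demo_chan.lock);
    }
    return NULL;
}

/* Runs both lock orders concurrently and returns the number of hangups
 * queued. A blocking lock in give_up_on_call could deadlock here. */
int run_demo(int iterations)
{
    pthread_t a, b;
    int rc;

    demo_iterations = iterations;
    demo_chan.hangups_queued = 0;
    rc = pthread_create(&a, NULL, retransmit_side, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, channel_side, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    (void)rc;
    return demo_chan.hangups_queued;
}
```

Calling `run_demo(1000)` should return 1000, since every pass of the retransmit side finds an owner and queues one hangup. The design trades a little spinning for never waiting on the channel lock while holding the call lock. If that wait were a plain blocking lock, each thread could end up holding the lock the other one needs.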

Comments:

By: Brian West (bkw918) 2003-12-10 10:07:17.000-0600

Running latest CVS?

By: Mark Spencer (markster) 2003-12-10 16:20:53.000-0600

I believe this has already been fixed, but confirmation that it cannot be duplicated anymore would be helpful.

By: axelm (axelm) 2003-12-11 12:03:48.000-0600

Tried to upgrade today and ran into different behaviour with today's CVS version: Asterisk now fails to play sounds. The Echotest application itself works fine after the upgrade, but none of the .gsm files play (yes, it loads the GSM modules/format/codec; yes, it opens the files; no, no RTP packets are being sent). Any suggestions on how to debug this?

By: Brian West (bkw918) 2003-12-11 14:54:30.000-0600

From the description, it sounds like you have Grandstream phones. If not, what endpoints are you using?

By: axelm (axelm) 2003-12-12 03:40:41.000-0600

It does not depend on the endpoints: I tried a Cisco 7960, X-Lite and a Cisco AS5300. As I wrote above, _only_ announcements are not played. No RTP packets are sent from Asterisk during those announcements, so it looks like a decoding problem with the files.

On the other hand, the applications themselves seem to work. As I wrote above, the Echotest application _itself_ works fine: RTP packets are sent and I can hear myself. Same endpoints, same network, same installation.

So, a more precise question: how can I debug why Asterisk fails to decode and send media during playback?

By: Brian West (bkw918) 2003-12-14 11:32:06.000-0600

Sounds like you have other issues. I'm running the latest CVS with my 7960 and it works fine. Can you show me an example of what you are trying to do? Are you answering the channel first?

By: axelm (axelm) 2003-12-19 05:07:52.000-0600

Just updated to today's CVS; it works now.