Summary:ASTERISK-24800: Crash in __sip_reliable_xmit due to invalid thread ID being passed to pthread_kill
Reporter:JoshE (n8ideas)
Labels:
Date Opened:2015-02-17 08:56:35.000-0600
Date Closed:2015-02-24 16:15:39.000-0600
Versions:11.16.0 13.2.0
Frequency of
Environment:
Attachments:( 0) ASTERISK-24800-13.diff
( 1) sipxmit_crash.txt
Description:Crash observed in __sip_reliable_xmit with a large number of realtime peers attached to the database.
Comments:
By: JoshE (n8ideas) 2015-02-17 08:57:08.162-0600

Backtrace attached.

By: Matt Jordan (mjordan) 2015-02-17 11:00:34.982-0600

Asterisk crashed in a call to {{pthread_kill}}:

#0  0x00007fd8f1028740 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fd8901e4266 in __sip_reliable_xmit (p=0x7fd8e06f3b58, seqno=102, resp=0, data=0x7fd8e07ff250, fatal=1, sipmethod=3) at chan_sip.c:4232
       pkt = 0x7fd8e06cb7a0
       siptimer_a = 1000
       xmitres = 582
       respid = 2418556244
       __PRETTY_FUNCTION__ = "__sip_reliable_xmit"

Which occurs here:
	} else {
		/* This is odd, but since the retrans timer starts at 500ms and the do_monitor thread
		 * only wakes up every 1000ms by default, we have to poke the thread here to make
		 * sure it successfully detects this must be retransmitted in less time than
		 * it usually sleeps for. Otherwise it might not retransmit this packet for 1000ms. */
		if (monitor_thread != AST_PTHREADT_NULL) {
			pthread_kill(monitor_thread, SIGURG);

Do you have the core file still? If so, what is the value of {{monitor_thread}} in frame 1?

By: JoshE (n8ideas) 2015-02-17 12:56:34.533-0600

Here's the value:

(gdb) frame 1
#1  0x00007fd8901e4266 in __sip_reliable_xmit (p=0x7fd8e06f3b58, seqno=102, resp=0, data=0x7fd8e07ff250, fatal=1, sipmethod=3)
   at chan_sip.c:4232
4232 pthread_kill(monitor_thread, SIGURG);
(gdb) print monitor_thread
$1 = 18446744073709551614

By: JoshE (n8ideas) 2015-02-17 12:57:03.337-0600

Value of frame added.

By: Matt Jordan (mjordan) 2015-02-17 13:38:20.244-0600

That value is {{2^64 - 2}}. Interestingly, this is the value of {{AST_PTHREADT_STOP}}, which that code does not check for; it only tests against {{AST_PTHREADT_NULL}}. Passing that sentinel to {{pthread_kill}} as if it were a real thread ID would be a *bad* thing.
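
The arithmetic can be checked with a few lines of C. This is a minimal sketch, assuming a 64-bit Linux system where {{pthread_t}} is an unsigned integer type (as in glibc). The two macros are redefined locally on the assumption that they mirror Asterisk's sentinels, and {{thread_id_as_u64}} is a hypothetical helper that is not part of Asterisk.

```c
#include <pthread.h>
#include <stdint.h>

/* Assumption: these mirror Asterisk's sentinel thread IDs. They are
 * redefined here only so the sketch compiles on its own. */
#define AST_PTHREADT_NULL ((pthread_t) -1)
#define AST_PTHREADT_STOP ((pthread_t) -2)

/* Hypothetical helper: widen a thread ID to a 64-bit integer so it can be
 * compared with the number gdb printed. Only meaningful on platforms where
 * pthread_t is an integer type (e.g. glibc on 64-bit Linux). */
static uint64_t thread_id_as_u64(pthread_t t)
{
	return (uint64_t) t;
}
```

Under those assumptions, {{thread_id_as_u64(AST_PTHREADT_STOP)}} yields the same number gdb reported for {{monitor_thread}}, and that number differs from {{AST_PTHREADT_NULL}}, so the existing check lets it through.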

By: Matt Jordan (mjordan) 2015-02-17 13:39:50.280-0600

Attached is a patch that should prevent this issue from happening. I suspect that it would be pretty rare to catch it, as you'd have to have Asterisk stopping the {{monitor_thread}} when a scheduled item trips, but it's worth trying.
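
For readers without the attachment, the kind of guard discussed in this issue can be sketched as below. This is an illustration only, not the contents of the attached diff: the helper names {{monitor_thread_is_live}} and {{poke_monitor_thread}} are hypothetical, and the sentinel macros are redefined locally on the assumption that they match Asterisk's.

```c
#include <pthread.h>
#include <signal.h>

/* Assumption: these mirror Asterisk's sentinel thread IDs. */
#define AST_PTHREADT_NULL ((pthread_t) -1)
#define AST_PTHREADT_STOP ((pthread_t) -2)

/* Hypothetical helper: returns 1 only when the handle is neither sentinel.
 * Comparing pthread_t with != assumes an integer pthread_t (as in glibc);
 * strictly portable code would use pthread_equal(). */
static int monitor_thread_is_live(pthread_t t)
{
	return t != AST_PTHREADT_NULL && t != AST_PTHREADT_STOP;
}

/* Hypothetical wrapper: wake the monitor thread with SIGURG.
 * Returns 0 if the signal was sent, -1 if it was skipped or failed. */
static int poke_monitor_thread(pthread_t t)
{
	if (!monitor_thread_is_live(t)) {
		return -1; /* never hand a sentinel to pthread_kill() */
	}
	return pthread_kill(t, SIGURG) == 0 ? 0 : -1;
}
```

In this sketch the thread ID is passed in by value, so the check and the {{pthread_kill}} call see the same ID even if the global changes in between.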

By: Matt Jordan (mjordan) 2015-02-24 16:02:57.927-0600

So... I'm pretty sure I fixed this with this patch. As such, I'm going to go ahead and commit it, and close out this issue. If it turns out you experience this problem again with this patch, comment on the issue and I'll be happy to re-open it. Thanks!

By: JoshE (n8ideas) 2015-02-24 16:11:06.922-0600

I forgot to update the issue, but I've had this running in production for ~1 week without seeing a recurrence or other unintended side effects. I'd recommend committing.