Summary: | ASTERISK-24800: Crash in __sip_reliable_xmit due to invalid thread ID being passed to pthread_kill | ||
Reporter: | JoshE (n8ideas) | Labels: | |
Date Opened: | 2015-02-17 08:56:35.000-0600 | Date Closed: | 2015-02-24 16:15:39.000-0600 |
Priority: | Major | Regression? | |
Status: | Closed/Complete | Components: | Channels/chan_sip/General |
Versions: | 11.16.0 13.2.0 | Frequency of Occurrence | |
Related Issues: | |||
Environment: | Attachments: | ( 0) ASTERISK-24800-13.diff ( 1) sipxmit_crash.txt | |
Description: | Crash observed in __sip_reliable_xmit with a large number of realtime peers attached to the database. | ||
Comments: | By: JoshE (n8ideas) 2015-02-17 08:57:08.162-0600
Backtrace attached.

By: Matt Jordan (mjordan) 2015-02-17 11:00:34.982-0600
Asterisk crashed in a call to {{pthread_kill}}:
{noformat}
#0  0x00007fd8f1028740 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00007fd8901e4266 in __sip_reliable_xmit (p=0x7fd8e06f3b58, seqno=102, resp=0, data=0x7fd8e07ff250, fatal=1, sipmethod=3) at chan_sip.c:4232
        pkt = 0x7fd8e06cb7a0
        siptimer_a = 1000
        xmitres = 582
        respid = 2418556244
        __PRETTY_FUNCTION__ = "__sip_reliable_xmit"
{noformat}
Which occurs here:
{code}
    } else {
        /* This is odd, but since the retrans timer starts at 500ms and the do_monitor thread
         * only wakes up every 1000ms by default, we have to poke the thread here to make
         * sure it successfully detects this must be retransmitted in less time than
         * it usually sleeps for. Otherwise it might not retransmit this packet for 1000ms. */
        if (monitor_thread != AST_PTHREADT_NULL) {
            pthread_kill(monitor_thread, SIGURG);
        }
        return AST_SUCCESS;
    }
{code}
Do you have the core file still? If so, what is the value of {{monitor_thread}} in frame 1?

By: JoshE (n8ideas) 2015-02-17 12:56:34.533-0600
Here's the value:
{noformat}
(gdb) frame 1
#1  0x00007fd8901e4266 in __sip_reliable_xmit (p=0x7fd8e06f3b58, seqno=102, resp=0, data=0x7fd8e07ff250, fatal=1, sipmethod=3) at chan_sip.c:4232
4232                    pthread_kill(monitor_thread, SIGURG);
(gdb) print monitor_thread
$1 = 18446744073709551614
(gdb)
{noformat}

By: JoshE (n8ideas) 2015-02-17 12:57:03.337-0600
Value of frame added.

By: Matt Jordan (mjordan) 2015-02-17 13:38:20.244-0600
Which would be {{2^64 - 2}}. Interestingly, this is the value of {{AST_PTHREADT_STOP}}, which is not checked for in that code. Passing that as an actual pointer value to {{pthread_kill}} would be a *bad* thing.

By: Matt Jordan (mjordan) 2015-02-17 13:39:50.280-0600
Attached is a patch that should prevent this issue from happening.
I suspect that it would be pretty rare to catch it, as you'd have to have Asterisk stopping the {{monitor_thread}} when a scheduled item trips, but it's worth trying.

By: Matt Jordan (mjordan) 2015-02-24 16:02:57.927-0600
So... I'm pretty sure I fixed this with this patch. As such, I'm going to go ahead and commit it, and close out this issue. If it turns out you experience this problem again with this patch, comment on the issue and I'll be happy to re-open it. Thanks!

By: JoshE (n8ideas) 2015-02-24 16:11:06.922-0600
I forgot to update, but I've had this running in production for ~1 week without seeing a recurrence or other unintended side effects. I'd recommend committing. |
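The value gdb printed is what {{AST_PTHREADT_STOP}} looks like once stored in a 64-bit unsigned {{pthread_t}} (as on x86_64 glibc): {{(pthread_t) -2}} wraps to 2^64 - 2 = 18446744073709551614. Below is a minimal sketch of the kind of guard that keeps such a sentinel away from {{pthread_kill()}}; it is not the contents of the attached ASTERISK-24800-13.diff. It assumes the Asterisk sentinels are {{(pthread_t) -1}} ({{AST_PTHREADT_NULL}}) and {{(pthread_t) -2}} ({{AST_PTHREADT_STOP}}), reproduced under the hypothetical names {{SKETCH_PTHREADT_NULL}} and {{SKETCH_PTHREADT_STOP}}. The helpers {{monitor_thread_is_live()}} and {{poke_monitor()}} are likewise hypothetical.

```c
#include <pthread.h>
#include <signal.h>

/* Hypothetical stand-ins for Asterisk's AST_PTHREADT_NULL and
 * AST_PTHREADT_STOP; the -1 / -2 values are an assumption. Comparing them
 * with != only works where pthread_t is an integer type (e.g. glibc). */
#define SKETCH_PTHREADT_NULL ((pthread_t) -1)
#define SKETCH_PTHREADT_STOP ((pthread_t) -2)

/* Hypothetical helper: a thread ID is usable only if it is neither sentinel. */
static int monitor_thread_is_live(pthread_t t)
{
    return t != SKETCH_PTHREADT_NULL && t != SKETCH_PTHREADT_STOP;
}

/* Hypothetical helper modelled on the quoted chan_sip.c branch: wake the
 * monitor thread with SIGURG, but never hand a sentinel to pthread_kill().
 * Returns 1 if the signal was sent, 0 if the ID was a sentinel or
 * pthread_kill() reported an error. */
static int poke_monitor(pthread_t t)
{
    if (!monitor_thread_is_live(t)) {
        return 0;
    }
    return pthread_kill(t, SIGURG) == 0;
}
```

With this guard, {{poke_monitor(SKETCH_PTHREADT_STOP)}} returns 0 without calling {{pthread_kill()}}, while {{poke_monitor(pthread_self())}} signals the calling thread, which is harmless because the default action for SIGURG is to ignore it. Checking both sentinels covers the value seen in the core file; on its own it does not serialize the check with code that changes {{monitor_thread}} concurrently.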