Summary: | ASTERISK-24212: testsuite: Sporadic crash due to assert on stopping RTP engine | ||||
Reporter: | Matt Jordan (mjordan) | Labels: | |||
Date Opened: | 2014-08-12 09:36:58 | Date Closed: | 2014-09-02 13:18:13 | ||
Priority: | Major | Regression? | |||
Status: | Closed/Complete | Components: | Channels/chan_pjsip Resources/res_rtp_asterisk Tests/testsuite | ||
Versions: | Frequency of Occurrence | ||||
Related Issues: |
| ||||
Environment: | Attachments: | ( 0) backtrace_12706.txt ( 1) full.txt | |||
Description: | Periodically, when stopping an RTP instance, an assertion is triggered by the scheduler:
{noformat}
#4 0x00000000006c6761 in _ast_assert (con=0x20885f0, id=4, file=0x7fc5de39cbb1 "res_rtp_asterisk.c", line=4590, function=0x7fc5de39edf3 "ast_rtp_stop") at /srv/bamboo/xml-data/build-dir/AST-ATSF4-C664TE/asterisk/include/asterisk/utils.h:810
No locals.
#5 _ast_sched_del (con=0x20885f0, id=4, file=0x7fc5de39cbb1 "res_rtp_asterisk.c", line=4590, function=0x7fc5de39edf3 "ast_rtp_stop") at sched.c:489
buf = "s != NULL, id=4\000\322\002\000\000\001\000\000\000\060V\004\004\306\177\000\000\020V\004\004\306\177\000\000`\363\337\351\305\177\000\000xF\000\020\306\177\000\000P\364\337\351\305\177\000\000\330\004:\353\061\063\000\000\000\365\337\351\305\177\000\000\377\000\000\000\000\000\000\000\227\367\000\020\306\177\000\000\367\365\337", <incomplete sequence \351>
s = 0x0
tmp = {list = {next = 0x0}, id = 4, when = {tv_sec = 0, tv_usec = 0}, resched = 0, variable = 0, data = 0x0, callback = 0, __heap_index = 0}
last_id = 0x7fc6100b9390
__PRETTY_FUNCTION__ = "_ast_sched_del"
#6 0x00007fc5de38e815 in ast_rtp_stop (instance=0x7fc604045ab8) at res_rtp_asterisk.c:4590
rtp = 0x7fc60404a510
addr = {ss = {ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, len = 0}
__PRETTY_FUNCTION__ = "ast_rtp_stop"
#7 0x000000000068e7ec in ast_rtp_instance_stop (instance=0x7fc604045ab8) at rtp_engine.c:1037
No locals.
#8 0x00007fc5d8eb3bcf in stream_destroy (session_media=0x7fc610045568) at res_pjsip_sdp_rtp.c:1170
No locals.
#9 0x00007fc5eb39cc41 in session_media_dtor (obj=<value optimized out>) at res_pjsip_session.c:1038
session_media = 0x7fc610045568
{noformat}
Somehow, we appear to have a valid RTCP scheduler ID, but when deleting it we don't have anything in our scheduler context corresponding to it. | ||||
Comments: | By: Mark Michelson (mmichelson) 2014-08-21 18:31:00.501-0500

The cause of this is actually pretty simple. The session happens to be destroyed at the same time that a scheduled RTCP transmission is occurring. Since the scheduled RTCP transmission is currently not in the scheduler heap, the scheduler can't delete it. In the testsuite, since DO_CRASH is enabled, the triggered assertion results in a crash and a test failure.

Fixing this will be interesting. A good start would be to have the scheduler context track which task it is currently running so that that may be detected when attempting to delete a scheduler entry. At least with that, we can detect the circumstance and not fail an assertion.

What we then do when detecting that situation is a different story. I think the easiest thing to do would be to mark the scheduler entry in such a way that it does not enter back into the heap and return successful deletion of the scheduled entry. The locking in place in the scheduler should prevent race conditions, and the party that is deleting the scheduler entry should presumably also be unreffing/deleting the data that was attached to the scheduler entry. |
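
The approach proposed in the comment above can be sketched roughly as follows. This is a toy, single-threaded model, not the Asterisk scheduler: every name in it (`sched_ctx`, `sched_task`, `sched_add`, `sched_del`, `sched_run_next`, `delete_self_cb`, `del_request`) is hypothetical, a fixed array stands in for the heap, and the concurrent delete is simulated by having the callback delete its own entry. The real scheduler would do the same bookkeeping under its existing lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical callback type: a nonzero return asks to be rescheduled. */
typedef int (*sched_cb)(void *data);

struct sched_task {
    int id;
    sched_cb callback;
    void *data;
    bool queued;   /* waiting in the pending set (stand-in for the heap) */
    bool deleted;  /* deletion was requested while the callback was running */
};

struct sched_ctx {
    struct sched_task tasks[MAX_TASKS];
    struct sched_task *current;  /* task whose callback is executing, or NULL */
    int next_id;
};

static void sched_init(struct sched_ctx *ctx)
{
    for (size_t i = 0; i < MAX_TASKS; i++) {
        ctx->tasks[i] = (struct sched_task){ 0 };
    }
    ctx->current = NULL;
    ctx->next_id = 1;
}

/* Returns the new task id, or -1 if no slot is free. */
static int sched_add(struct sched_ctx *ctx, sched_cb cb, void *data)
{
    for (size_t i = 0; i < MAX_TASKS; i++) {
        struct sched_task *t = &ctx->tasks[i];
        if (!t->queued && t != ctx->current) {
            *t = (struct sched_task){ ctx->next_id++, cb, data, true, false };
            return t->id;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 if the id is neither queued nor running. */
static int sched_del(struct sched_ctx *ctx, int id)
{
    /* The task is mid-callback, so it is not in the pending set. Mark it
     * so it is not requeued, and report the deletion as successful. */
    if (ctx->current && ctx->current->id == id) {
        ctx->current->deleted = true;
        return 0;
    }
    for (size_t i = 0; i < MAX_TASKS; i++) {
        if (ctx->tasks[i].queued && ctx->tasks[i].id == id) {
            ctx->tasks[i].queued = false;
            return 0;
        }
    }
    return -1;
}

/* Runs one pending task, if any. Returns the number of tasks run (0 or 1). */
static int sched_run_next(struct sched_ctx *ctx)
{
    assert(ctx->current == NULL);  /* this toy model never nests runs */
    for (size_t i = 0; i < MAX_TASKS; i++) {
        struct sched_task *t = &ctx->tasks[i];
        if (!t->queued) {
            continue;
        }
        t->queued = false;   /* popped: no longer findable in the pending set */
        ctx->current = t;    /* remember what is running */
        int resched = t->callback(t->data);
        ctx->current = NULL;
        if (resched && !t->deleted) {
            t->queued = true;  /* requeue only if nobody deleted it meanwhile */
        }
        return 1;
    }
    return 0;
}

/* Demo helper: a callback that deletes its own entry while it is running. */
struct del_request {
    struct sched_ctx *ctx;
    int id;
    int result;
};

static int delete_self_cb(void *data)
{
    struct del_request *req = data;
    req->result = sched_del(req->ctx, req->id);
    return 1;  /* asks to be rescheduled; the deleted mark should override this */
}
```

Scheduling `delete_self_cb` and running it once exercises the situation from the backtrace: the delete arrives while the entry is out of the pending set, yet it reports success, and the entry is not requeued even though the callback asked for it. Ownership of the attached data stays with whoever requested the deletion, as the comment suggests.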