Summary: | ASTERISK-25295: res_pjsip crash - pjsip_uri_get_uri at /usr/include/pjsip/sip_uri.h | ||
Reporter: | Dmitriy Serov (Demon) | Labels: | |
Date Opened: | 2015-07-30 07:50:35 | Date Closed: | 2015-09-09 05:55:35 |
Priority: | Major | Regression? | No |
Status: | Closed/Complete | Components: | Resources/res_pjsip |
Versions: | 13.4.0 | Frequency of Occurrence | Constant |
Related Issues: | |||
Environment: | Attachments: | ( 0) 2015_08_10__20_38_07.backtrace-threads.txt ( 1) 2015_08_10__20_38_07.full.tail.txt ( 2) 2015_08_10__20_58_07.backtrace-threads.txt ( 3) 2015_08_10__20_58_07.full.tail.txt ( 4) 2015_09_05__15_12_07.backtrace-threads.txt ( 5) 2015_09_05__15_12_07.full.tail.txt ( 6) backtrace.2015-07-30-1.txt ( 7) backtrace.2015-07-30-2.txt ( 8) backtrace.2015-07-30-3.txt ( 9) backtrace.2015-07-30-5.txt (10) core.back-trace.txt (11) debuglog.txt (12) full.2015-07-30-1.txt (13) full.2015-07-30-2.txt (14) full.2015-07-30-3.txt (15) full.2015-07-30-5.txt | |
Description: | Using last git branch 13. Now is 13.5-rc.
Tired of fighting with deadlock when using chan_sip. Decided to convert all the devices and the gateways to res_pjsip. The result was crash very often. When asterisk boots. Sometimes asterisk successfully loaded and crash in a few minutes. In the configs, nothing has changed except transfer chan_sip devices to res_pjsip. | ||
Comments: | By: Asterisk Team (asteriskteam) 2015-07-30 07:50:36.230-0500 Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process]. By: Dmitriy Serov (Demon) 2015-07-30 07:54:57.256-0500 backtraces and full debug logs (level 8) for 4 crashes. By: Rusty Newton (rnewton) 2015-07-31 09:26:05.950-0500 I see so many optimized out values. Is there anyway you can recompile with "DONT_OPTIMIZE" and "BETTER_BACKTRACES" compiler flags? We generally want those on for any backtrace. DONT_OPTIMIZE really won't affect speed that much in a production instance. By: Rusty Newton (rnewton) 2015-07-31 09:27:25.196-0500 In addition if you can get a debug log that includes "pjsip set logger on" output so we can see what SIP packets were read right before the crash that would be helpful. By: Dmitriy Serov (Demon) 2015-08-10 06:57:25.902-0500 crash after 4 hours from start with PJSIP devices (instead of chan_sip). By: Dmitriy Serov (Demon) 2015-08-10 13:14:35.473-0500 I already accumulated a lot of files with the logs and backtraces. Crashes every 20 minutes to 2 hours. I guess I'll have to go back to chan_sip. By: Rusty Newton (rnewton) 2015-08-10 19:22:35.477-0500 Sorry you are having such a hard time with 13 between chan_sip and res_pjsip. Hopefully we'll be able to identify the source of these crashes with the latest data you provided. By: Rusty Newton (rnewton) 2015-08-18 14:52:41.913-0500 [~Demon] can you provide the exact PJPROJECT version being used, along with your OS environment information? Thanks! By: Mark Michelson (mmichelson) 2015-08-18 18:00:32.592-0500 It looks like there are a couple of different types of crashes that are occurring here. # A crash occurs when attempting to send a response to a REGISTER request with a strange Contact URI. This is what I see before the crash occurs: {noformat}[2015-07-30 12:18:20] VERBOSE[22319] res_pjsip_registrar.c: Added contact 'sip:wiseua@195.69.202.35:31382;rinstance=e770d0cd5972231a;X-PUSH-URI=' to AOR 'wiseua' with expiration of 7200 seconds{noformat} That URI has a stray '=' at the end of it. I think what is happening is that the attempt to parse the URI is failing, and we are ignoring the failure. This leads to creation of a Contact header with a NULL URI, which leads to the crash seen here. When I try to reproduce this crash using SIPp, I can't make it happen. Instead, I see this: {noformat}Added contact 'sip:wiseua@127.0.0.1:5061;rinstance=e770d0cd5972231a;X-PUSH-URI' to AOR 'sipp' with expiration of 3600 seconds{noformat} Notice how in my version, the stray '=' at the end is removed. However, my SIPp scenario has the '=' at the end of the Contact URI. Another similar crash has this: {noformat}[2015-08-10 20:36:18] VERBOSE[7386] res_pjsip_registrar.c: Added contact 'sip:16923@195.16.110.63:51677;app-id=s.notify.live.net;pn-type=wp;pn-tok= S' to AOR '16923' with expiration of 7200 seconds {noformat}. When I try to put this URI into a SIPp scenario, this fails to even reach the registrar because of a detected syntax error in PJSIP. So this means either that the version of PJSIP that you are using is different than the one I am using, and it therefore parses the URI differently, or it means that the Contact URI format is not what I was expecting it to be, and so I am not properly reproducing the problem. In either case, I believe the correct action to take here is to ensure that we do not ignore the return value when attempting to parse contact URIs. # A second crash appears to be due to attempting to use a task serializer that has already been destroyed. This appears to happen due to a SIP subscription being destroyed, then attempting to handle a 200 OK to an outgoing NOTIFY that we sent. There appears to be a reference counting issue here, so this is going to be more difficult to diagnose and fix. By: Dmitriy Serov (Demon) 2015-08-19 01:10:05.109-0500 cat /usr/lib/pkgconfig/libpjproject.pc {noformat} Name: libpjproject Description: Multimedia communication library URL: http://www.pjsip.org Version: 2.4 Libs: -L${libdir} -lpjsua2 -lstdc++ -lpjsua -lpjsip-ua -lpjsip-simple -lpjsip -lpjmedia-codec -lpjmedia -lpjmedia-videodev -lpjmedia-audiodev -lpjmedia -lpjnath -lpjlib-util -lilbccodec -lg7221codec -lsrtp -lgsm -lspeex -lspeexdsp -lpj -lssl -lcrypto -ldl -lz -luuid -lm -lrt -lpthread Cflags: -I${includedir} -I/usr/include -DPJ_AUTOCONF=1 -O2 -DNDEBUG -DPJ_IS_BIG_ENDIAN=0 -DPJ_IS_LITTLE_ENDIAN=1 -fPIC {noformat} At different times I have used both 2.4 and trunk. But these logs were obtained using 2.4. PJProject trunk was not compiled without Video at the time. {noformat} uname -a Linux talk37 3.13.6 #4 SMP Wed Aug 20 17:52:10 MSK 2014 x86_64 Intel(R) Core(TM) i5-2310 CPU @ 2.90GHz GenuineIntel GNU/Linux {noformat} By: Mark Michelson (mmichelson) 2015-08-19 09:25:15.442-0500 Thank you very much for the detailed answer. I suspect that there is something else about the URI that causes it to be mis-parsed. I'll try some more experiments and will report back when I have some sort of fix. By: Mark Michelson (mmichelson) 2015-08-19 16:04:17.757-0500 I have been unable to "trick" PJSIP into being able to accept a URI that looks like either of the ones that I quoted earlier. Unfortunately, none of the logs from when the registrar crash occurred have PJSIP logging enabled, so I cannot see what the REGISTER packet that caused the crash looked like. I am still looking into the second crash to try to find out how to fix it. By: Dmitriy Serov (Demon) 2015-08-20 04:19:59.237-0500 chan_sip: sip show peer 16923: Useragent : LinphoneWP8KeepAlive/ (belle-sip/1.3.2) Reg. Contact : sip:16923@195.16.110.23:29230;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/hk2/H2QAAADXKbJW5qha12tAYboqAjIAKWDWJ4KaBWmMhMLehaBrOeFgFqEFfsl18pRPQ-cGpykPK-N0PszqaXcfZzqJLqFnEqHFSg-NhfE0cOmS6KSTq7UxS41Q90fLyQxeWsI_Fog/d2luZG Length important? :) Sorry, can't provide a dump of REGISTER packet, because the device is offline. If I catch the moment when the owner it will connect, then... By: Mark Michelson (mmichelson) 2015-08-20 09:40:35.714-0500 That seemed to work properly for me: {noformat} Added contact 'sip:16923@127.0.0.1:5061;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/hk2/H2QAAADXKbJW5qha12tAYboqAjIAKWDWJ4KaBWmMhMLehaBrOeFgFqEFfsl18pRPQ-cGpykPK-N0PszqaXcfZzqJLqFnEqHFSg-NhfE0cOmS6KSTq7UxS41Q90fLyQxeWsI_Fog/d2luZG' to AOR 'sipp' with expiration of 7200 seconds {noformat} By: Kevin Harwell (kharwell) 2015-08-27 17:03:51.884-0500 I've been looking into this issue some and like Mark have been unable to reproduce the REGISTER problem. Does the contact uri for the following case actually contain the "=" at the end after "X-PUSH-URI"? {noformat} [2015-07-30 12:18:20] VERBOSE[22319] res_pjsip_registrar.c: Added contact 'sip:wiseua@195.69.202.35:31382;rinstance=e770d0cd5972231a;X-PUSH-URI= {noformat} I ask because when copying that line into another editor right after the "=" there is also a non printable character (you can also see it in the verbose log output when looking at the file). If those are added characters then it's likely that a buffer is not being properly NULL terminated or as Mark mentioned the buffer is being used improperly after a failure has occurred. [~Demon] any luck on getting a REGISTER packet? It would hopefully show us if there is anything odd in the incoming message. Also you asked: {quote} Length important? {quote} It looks like the max length for the contact URI is 256 By: Dmitriy Serov (Demon) 2015-08-28 07:19:34.931-0500 bq. Does the contact uri for the following case actually contain the "=" at the end after "X-PUSH-URI"? Yes. I don't have REGISTER packet dump (user is offline), but "sip show peer 16923" (now is chan_sip) displays: {noformat} Reg. Contact : sip:16923@87.228.126.146:61276;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/hk2/H2QAAADXKbJW5qha12tAYboqAjIAKWDWJ4KaBWmMhMLehaBrOeFgFqEFfsl18pRPQ-cGpykPK-N0PszqaXcfZzqJLqFnEqHFSg-NhfE0cOmS6KSTq7UxS41Q90fLyQxeWsI_Fog/d2lu {noformat} in logs i found such packet: {noformat} REGISTER sip:talk37.ru;transport=udp SIP/2.0 Via: SIP/2.0/UDP 192.168.1.66:50985;rport;branch=z9hG4bKPjmQXgOIC3W. Max-Forwards: 70 From: "ivmaster" <sip:ivmaster@talk37.ru>;tag=a0ZwQBBkHDataga30 To: "ivmaster" <sip:ivmaster@talk37.ru> Call-ID: dAe1tzVT1G CSeq: 18018 REGISTER User-Agent: Join 3.2.4(10958) Contact: "ivmaster" <sip:ivmaster@192.168.1.66:50985>;reg-id=1;+sip.instance="<urn:uuid:C427A9B9-ECA3-8383-4AE6-BA41B88A180F>" Expires: 900 Allow: PRACK, INVITE, ACK, BYE, CANCEL, UPDATE, INFO, SUBSCRIBE, NOTIFY, REFER, MESSAGE, OPTIONS, MESSAGE Content-Length: 0 {noformat} The users of LinphoneWP8KeepAlive/ (belle-sip/1.3.2) makes asterisk (chan_sip) to generate such packet: {noformat} OPTIONS sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0ilZ SIP/2.0 Via: SIP/2.0/UDP 85.142.148.80:5060;branch=z9hG4bK5eff2789;rport Max-Forwards: 70 From: "asterisk" <sip:asterisk@talk37.ru>;tag=as249c56dc To: <sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0il Contact: <sip:asterisk@85.142.148.80:5060> Call-ID: 61d6e12e28403b324a4af91e5a697de2@talk37.ru CSeq: 102 OPTIONS User-Agent: ruVoIP.net PBX Date: Fri, 28 Aug 2015 10:35:34 GMT Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE Supported: replaces, timer Content-Length: 0 {noformat} Users of winphone quite rare. I turned on sip debug and waiting "REGISTER" packet from. Questions: 1. What version of pjproject better now to use? 2. What configure params to use? I could make another attempt to switch devices on PJSIP with debug enabled and a full dump of the packets. By: Dmitriy Serov (Demon) 2015-08-28 08:01:32.098-0500 By the way, make attension to field "to" compare with request line: To: <sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0il OPTIONS sip:bondima@109.188.124.22;app-id=s.notify.live.net;pn-type=wp;pn-tok=/u/1/sin/H2QAAABQox0UYTJcaRqfCbZudHMkHygU_K-GcGBrFvf89_n6chf18lP4JeQHUKX-5yPDO49eNoA260qgDtBfGFAIXvnrJVLJ_G_7uzbBTGUlQHSGnImJFsAn-i3KYxyp8pF6LHg/d2luZG93c3Bob25lZGVmYXVsdA/ZhRmmVyMm0ilZ SIP/2.0 255 chars. Char "Z" disappeared and there is no ">" in "To" line. Chan_sip logs :) By: Kevin Harwell (kharwell) 2015-08-28 11:45:02.699-0500 {quote} Questions: 1. What version of pjproject better now to use? {quote} I'd recommend using the latest version of [pjproject|http://www.pjsip.org/] if possible, which is [currently|http://www.pjsip.org/download.htm] [2.4.5|http://www.pjsip.org/release/2.4.5/pjproject-2.4.5.tar.bz2]. {quote} 2. What configure params to use? {quote} See the following wiki page on how to configure it for Asterisk: [Building and Installing pjproject|https://wiki.asterisk.org/wiki/display/AST/Building+and+Installing+pjproject]. Depending on your system you might need to enable/disable certain features. Also, as a side note I was wrong earlier about the length of a contact. Apparently, there is another constraint when writing to the ast_db upon adding a contact: {noformat} db.c:318 ast_db_put: Family and key length must be less than 253 bytes {noformat} By: Joshua C. Colp (jcolp) 2015-08-29 10:45:59.054-0500 A fix for the second crash issue is now up for review. By: Dmitriy Serov (Demon) 2015-09-05 14:51:20.757-0500 new backtrace and debug logs with SIP debug. {noformat} REGISTER sip:talk37.ru;transport=UDP SIP/2.0 Via: SIP/2.0/UDP 100.104.148.1:27797;branch=z9hG4bK-d8754z-6cdfafec6cf606bc-1---d8754z-;rport Max-Forwards: 70 Contact: <sip:velikin@100.104.148.1:27797;rinstance=f676c97c42c7beec;transport=UDP;X-PUSH-URI=http://s.notify.live.net/u/1/db3/H2QAAADfWyZhFZDO0gf0n6K3sFKAvWRESxB7f0R0w8U4-dbUr-pxPenfrZ12qybyPFbi9LTtk-IYx3z3dyhTifqDI_CVcjK5GxllwyNcRkxN_mNAZoZIUNQ79EGVr_zvisatB_k/d2luZG93c3Bob25lZGVmYXVsdA/EW_BnIu3fUOH7FePoWYHNw/wQmdoDSphqYQ57yXZVXIQ7nLVW4> To: <sip:velikin@talk37.ru;transport=UDP> {noformat} By: Joshua C. Colp (jcolp) 2015-09-08 07:47:32.246-0500 Fixes are up at https://gerrit.asterisk.org/1212 and https://gerrit.asterisk.org/1213 for the Asterisk side of this but you will still need to change PJSIP to allow a larger URI size, as it defaults to 256. This can be done by modifying pjsip/include/pjsip/sip_config.h and changing PJSIP_MAX_URL_SIZE to be 1024. After doing so PJSIP *and* Asterisk has to be rebuilt. I've confirmed that doing this and having the above fixes allows things to work as expected. |