Summary: ASTERISK-25439: Segfault in find_entry () from /usr/lib/libpj.so.2 (dns_resolver, qualify_contact)
Reporter: Dmitriy Serov (Demon)
Labels:
Date Opened: 2015-10-01 04:08:26
Date Closed: 2018-10-09 17:08:01
Priority: Major
Regression? No
Status: Closed/Complete
Components: Resources/res_pjsip
Versions: 13.5.0
Frequency of Occurrence: Frequent
Related Issues:
Environment:
  # Package Information for pkg-config
  Name: libpjproject
  Description: Multimedia communication library
  URL: http://www.pjsip.org
  Version: 2.4.5
  Libs: -L${libdir} -lpjsua2 -lstdc++ -lpjsua -lpjsip-ua -lpjsip-simple -lpjsip -lpjmedia-codec -lpjmedia -lpjmedia-videodev -lpjmedia-audiodev -lpjmedia -lpjnath -lpjlib-util -lilbccodec -lg7221codec -lsrtp -lgsm -lspeex -lspeexdsp -lpj -lssl -lcrypto -luuid -lm -lrt -lpthread
  Cflags: -I${includedir} -I/usr/include -DPJ_AUTOCONF=1 -O2 -DNDEBUG -DPJ_IS_BIG_ENDIAN=0 -DPJ_IS_LITTLE_ENDIAN=1 -fPIC

Attachments:
  ( 0) 2015_09_30__20_02_07.backtrace-threads.txt
  ( 1) 2015_09_30__20_02_07.full.tail.txt
  ( 2) 2015_10_01__11_50_08.backtrace-threads.txt
  ( 3) 2015_10_01__11_50_08.full.tail.txt
  ( 4) 2015_10_01__13_14_07.backtrace-threads.txt
  ( 5) 2015_10_01__13_14_07.full.tail.txt
  ( 6) 2016_01_10__13_08_08.backtrace-threads.txt
  ( 7) 2016_01_10__13_08_08.full.tail.txt
  ( 8) 2016_01_10__22_20_01.backtrace-threads.txt
  ( 9) 2016_01_10__22_20_01.full.tail.txt
  (10) 2016_01_10__22_20_01.locks.txt
  (11) 2016_01_11__22_56_01.full.tail.txt
  (12) 2016_01_11__22_56_01.locks.txt
  (13) 2016_01_12__00_04_07.backtrace-threads.txt
  (14) 2016_01_12__13_41_01.locks.txt
  (15) 2016_01_12__13_42_01.full.tail.txt
  (16) 2016_01_12__13_42_01.locks.txt
  (17) 2016_01_12__15_43_01.full.tail.txt
  (18) 2016_01_12__15_43_01.locks.txt
  (19) 2016_01_13__20_18_07.backtrace-threads.txt
  (20) 2016_01_13__20_18_07.full.tail.txt
  (21) 2016_02_11__05_56_08.backtrace-threads.txt
Description:
Segfault in find_entry.
pjproject: 2.4.5, with PJSIP_MAX_URL_SIZE modified to 1024
Comments:

By: Asterisk Team (asteriskteam) 2015-10-01 04:08:28.655-0500
Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report. Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Dmitriy Serov (Demon) 2015-10-01 04:09:37.115-0500
Backtraces and log tails of two cases are attached.

By: Dmitriy Serov (Demon) 2015-10-01 06:14:38.305-0500
The same segfault stack, but judging by the log there were problems with the Internet connection. I am using a local BIND daemon with forwarders { 8.8.8.8; 8.8.4.4; }

By: Dmitriy Serov (Demon) 2016-01-10 08:01:32.699-0600
Asterisk segfaults every day :( Today's logs are attached.

By: Dmitriy Serov (Demon) 2016-01-10 13:37:26.842-0600
After some tests, it seems that the segfault is the result of a deadlock (file: 2016_01_10__22_20_01.locks.txt).
2016_01_10__22_20_01.backtrace-threads.txt
Tail of log: 2016_01_10__22_20_01.full.tail.txt
Maybe the reason is:
[2016-01-10 22:18:28] ERROR[367] netsock2.c: getaddrinfo("sip.ukrtel.net", "(null)", ...): System error
[2016-01-10 22:18:29] ERROR[367] netsock2.c: getaddrinfo("sip.ukrtel.net", "(null)", ...): System error

By: Dmitriy Serov (Demon) 2016-01-11 15:30:23.902-0600
Once again :(
2016_01_12__00_04_07.backtrace-threads.txt
In this case there are no locks. Just segmentation faults.

By: Dmitriy Serov (Demon) 2016-01-11 16:04:58.525-0600
Another recurring problem (it may be related): Asterisk periodically hangs.
2016_01_11__22_56_01.locks.txt
In the locks: __ast_bt_get_addresses res_pjsip/pjsip_options.c:391 qualify_contact()
In 2016_01_11__22_56_01.full.tail.txt:
before [2016-01-11 22:55:47], contacts are created and deleted;
after [2016-01-11 22:55:47], contacts are only deleted.
Asterisk hangs and one minute later is killed with -9.

By: George Joseph (gjoseph) 2016-01-11 17:26:42.349-0600
Possible cause... pjproject defines a maximum hostname size of 128 (which is way too short in my opinion). If it's passed a hostname longer than that, it segfaults. Can you try compiling pjproject with PJ_MAX_HOSTNAME set to something very large and see if the issue still happens? I'm going to put a check in Asterisk anyway to prevent us from sending the hostname if it's too long.

By: Dmitriy Serov (Demon) 2016-01-12 03:21:23.420-0600
pjproject was configured with:
./configure --prefix=/usr --enable-shared --disable-sound --disable-resample --disable-video --disable-opencore-amr --with-external-speex \
--with-external-srtp=/usr/src/programs/srtp --with-external-gsm CFLAGS="-O2 -DNDEBUG -DPJSIP_MAX_URL_SIZE=1024 -DPJ_MAX_HOSTNAME=1024"
Asterisk was rebuilt and restarted. Monitoring is ongoing :) If the segfault repeats, I will comment immediately.
I very much doubt that any hostname actually exceeded 128 characters. Is there an example of such a host in the log? In any case, I'll be glad if this change helps.

By: George Joseph (gjoseph) 2016-01-12 09:35:05.766-0600
The length was just something I ran across while looking at pjproject and thought it might be worth a try. Still looking.

By: George Joseph (gjoseph) 2016-01-12 10:46:07.235-0600
Do you have DETECT_DEADLOCKS turned on when you compile Asterisk? If not, can you try it? Maybe it'll give us more info.

By: Dmitriy Serov (Demon) 2016-01-12 15:08:24.117-0600
More lock files. Two hangs, at 13:41 and 15:43.

By: Dmitriy Serov (Demon) 2016-01-13 12:44:54.887-0600
Segfaults in the same place.
PJProject-2.4.5
./configure --prefix=/usr --enable-shared --disable-sound --disable-resample --disable-video --disable-opencore-amr --with-external-speex \
--with-external-srtp=/usr/src/programs/srtp --with-external-gsm CFLAGS="-O2 -DNDEBUG -DPJSIP_MAX_URL_SIZE=1024 -DPJ_MAX_HOSTNAME=1024"
2016_01_13__20_18_07.backtrace-threads.txt
2016_01_13__20_18_07.full.tail.txt

By: Dmitriy Serov (Demon) 2016-01-22 12:43:42.282-0600
I guess ASTERISK-25638 is related.

By: Dmitriy Serov (Demon) 2016-02-04 03:20:58.825-0600
Disabling the "response cache" in "pjproject-2.4.5/pjlib-util/src/pjlib-util/resolver.c" completely eliminated the problem. I'm sure that this cache in pjproject, and its ref_cnt, does not work correctly in a multithreaded environment under slightly higher load.

By: Dmitriy Serov (Demon) 2016-02-12 01:20:41.932-0600
2016_02_11__05_56_08.backtrace-threads.txt
A backtrace of the same find_entry in the hash code, but with a different stack. I guess this indicates multithreading issues in the hash code.

By: Sean Bright (seanbright) 2018-09-17 17:24:46.046-0500
[There have been many changes to {{resolver.c}} since 2.4.5 was released|https://trac.pjsip.org/repos/changeset?reponame=&new=5826%40pjproject%2Ftrunk%2Fpjlib-util%2Fsrc%2Fpjlib-util%2Fresolver.c&old=4649%40pjproject%2Ftrunk%2Fpjlib-util%2Fsrc%2Fpjlib-util%2Fresolver.c]. Is this still reproducible with the latest Asterisk 13 release with bundled PJSIP?

By: Dmitriy Serov (Demon) 2018-10-09 17:02:41.907-0500
I use 15.6.1. Segfaults in the resolver are no longer seen. I think the issue can be closed.

By: Richard Mudgett (rmudgett) 2018-10-09 17:08:01.317-0500
Closed per reporter. Reporter no longer using Asterisk 13 versions.
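
Editor's note: the hostname-length check that George Joseph mentions in the thread (refusing to pass a hostname to pjproject when it is too long) might look roughly like the sketch below. This is only an illustration under stated assumptions: `hostname_fits` and `SKETCH_MAX_HOSTNAME` are hypothetical names, not Asterisk or pjproject identifiers, and the limit of 128 is simply the figure quoted in the thread. The thread never confirmed that hostname length was the cause of this crash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 128 is the pjproject hostname limit quoted in the thread.
 * It is redefined here under a hypothetical name so that the sketch
 * builds without any pjproject headers. */
#define SKETCH_MAX_HOSTNAME 128

/* Hypothetical helper, not part of Asterisk or pjproject.
 * Returns 1 if host is non-NULL, non-empty, and fits into a buffer of
 * max_len bytes including the terminating NUL; returns 0 otherwise. */
int hostname_fits(const char *host, size_t max_len)
{
    assert(max_len > 0); /* a zero-sized buffer can hold nothing */

    if (host == NULL || host[0] == '\0') {
        return 0;
    }

    /* Strictly less than max_len leaves room for the NUL terminator. */
    return strlen(host) < max_len;
}
```

A caller would skip the resolution attempt (and probably log a warning) whenever `hostname_fits(host, SKETCH_MAX_HOSTNAME)` returns 0, instead of handing the name to the resolver.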