|Summary:||ASTERISK-21447: Asterisk crashes while connecting to TCP peers|
|Reporter:||Zohair Raza (zuhairraza)||Labels:|
|Date Opened:||2013-04-16 00:20:29||Date Closed:||2013-06-03 19:09:03|
|Status:||Closed/Complete||Components:||. I did not set the category correctly.|
|Environment:||CentOS linux 2.6.32-279.19.1.el6.x86_64||Attachments:||( 0) astlogs.txt|
( 1) astsipsett.txt
( 2) backtrace3.txt
( 3) backtrace-threads3.txt
( 4) core-show-locks3.txt
( 5) gdb.txt
I have run into this problem a couple of times. I have Asterisk running in both TCP and UDP modes, but the peers are configured as TCP. For some reason (maybe packet loss in the network), Asterisk gets out of sync with the phones and tries to connect to a port on which the phone is no longer registered.
The strange behavior is that after some time Asterisk gets killed. I am not sure how to overcome this. What I am thinking is to somehow force Asterisk not to connect to that peer if, for example, X attempts fail. I tried disabling qualify and the session timers originated by Asterisk, but the connection attempts did not stop.
I am attaching backtraces and the Asterisk log from before the restarts.
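To make the "give up after X failed tries" idea concrete, here is a minimal Python sketch. It is not Asterisk code and does not reflect how chan_sip actually works; `PeerBackoff`, `max_failures`, `forget_after` and `try_connect` are hypothetical names invented for illustration only.

```python
import socket
import time


class PeerBackoff:
    """Hypothetical helper: stop dialing an address after repeated failures."""

    def __init__(self, max_failures=3, forget_after=120.0):
        self.max_failures = max_failures  # the "X tries" from the report
        self.forget_after = forget_after  # seconds before a retry is allowed again
        self._failures = {}  # (host, port) -> (consecutive failures, time of last failure)

    def should_attempt(self, addr, now=None):
        now = time.monotonic() if now is None else now
        count, last = self._failures.get(addr, (0, 0.0))
        if count < self.max_failures:
            return True
        # Limit reached: allow a fresh attempt only after the quiet period.
        if now - last >= self.forget_after:
            self._failures.pop(addr, None)
            return True
        return False

    def record(self, addr, ok, now=None):
        now = time.monotonic() if now is None else now
        if ok:
            # A successful connection clears the failure history.
            self._failures.pop(addr, None)
        else:
            count, _ = self._failures.get(addr, (0, 0.0))
            self._failures[addr] = (count + 1, now)


def try_connect(backoff, addr, timeout=2.0):
    """Open a TCP connection unless the address is currently blocked."""
    if not backoff.should_attempt(addr):
        return None
    try:
        sock = socket.create_connection(addr, timeout=timeout)
    except OSError:
        backoff.record(addr, ok=False)
        return None
    backoff.record(addr, ok=True)
    return sock
```

With `max_failures=3`, a fourth call to `try_connect` for the same stale address returns `None` without touching the network until `forget_after` seconds have passed since the last failure.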
|Comments:||By: Michael L. Young (elguero) 2013-04-16 08:27:12.102-0500|
Have you tried a newer version of Asterisk? That version is quite old, and I see this in the changelog for Asterisk 1.8.20: "AST-2012-015: Resolve crashes due to large stack allocations when using TCP" (http://downloads.asterisk.org/pub/telephony/asterisk/ChangeLog-1.8-current).
By: Zohair Raza (zuhairraza) 2013-04-16 10:23:01.573-0500
One important thing I forgot to mention is that I am using this patch for Cisco phones.
I will try updating Asterisk on another box and see if it helps, but it would be great if someone could find out what exactly the problem in the code is,
because I think I will not be able to reproduce the issue on the new box.
The patch is for 126.96.36.199. Is there a quick way to convert it for the current 1.8 version, or will I have to add the lines manually?
By: Michael L. Young (elguero) 2013-04-16 11:11:49.075-0500
Zohair, that is what I am trying to point out to you... what I think the issue might be. If you can't reproduce the issue with the latest version of Asterisk 1.8, that means the issue has been fixed and we don't have to worry about it :)
In regards to your patch question, you can try applying the patch and see if it succeeds. If not, you can ask the author of the patch to post an updated patch to the issue.
By: Zohair Raza (zuhairraza) 2013-04-16 11:29:04.802-0500
I understand, but I think that once I know the cause, I will be able to reproduce the issue on the latest version.
As of now, I have put the phones that had those errors back in UDP mode, and I haven't seen any errors yet. I left one phone on TCP to see whether the problem starts, but it has not so far; the system has been up for the last 10 hours.
I will try updating anyway, but what if it happens in the future? This worries me, as you will also have noticed that the problem shows up when higher authorities are on a call :P and we are in trouble :)
By: Zohair Raza (zuhairraza) 2013-04-18 11:46:39.391-0500
I tried to simulate the problem on another system (a clone of the first), but Asterisk didn't crash, maybe because it only has two phones.
However, I still see these messages:
[2013-04-18 12:11:27] ERROR: tcptls.c:446 ast_tcptls_client_start: Unable to connect SIP socket to 188.8.131.52:5062: Connection refused
Later I installed the current Asterisk 11 release, but I am getting the same messages. When I disable qualify and restart Asterisk, these messages no longer appear every minute (qualifyfreq), but only a restart does the trick; a reload does not.
However, when it starts up, I see in the logs that Asterisk tries to connect to the ports the phone was previously connected from:
[2013-04-18 12:34:47] ERROR: tcptls.c:446 ast_tcptls_client_start: Unable to connect SIP socket to 184.108.40.206:5075: Connection refused
[2013-04-18 12:34:47] ERROR: tcptls.c:446 ast_tcptls_client_start: Unable to connect SIP socket to 220.127.116.11:5066: Connection refused
Is this expected? I think it should forget about the peer after some time, let's say 2 minutes.
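To see how often each stale address is being retried, the log can be tallied per IP:port. The following is a rough Python sketch that assumes the line format shown in the excerpts above; `count_refused` is a hypothetical helper name, and the regular expression is only a guess at what other Asterisk versions or logger settings might emit.

```python
import re
from collections import Counter

# Matches the "Connection refused" lines quoted in this issue.
REFUSED = re.compile(
    r"Unable to connect SIP socket to "
    r"(?P<host>[0-9.]+):(?P<port>\d+): Connection refused"
)


def count_refused(lines):
    """Return a Counter mapping (host, port) to the number of refused attempts."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[(match.group("host"), int(match.group("port")))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_refused.py < /path/to/asterisk/log
    for (host, port), n in count_refused(sys.stdin).most_common():
        print(f"{host}:{port} refused {n} time(s)")
```

A count that keeps growing for a port the phone has already abandoned would support the suspicion that the old contact is never being dropped.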
By: Michael L. Young (elguero) 2013-04-18 12:18:54.639-0500
Zohair, not crashing is a good thing. There have been memory leak fixes and buffer overflow fixes put into the latest code. There have also been bug fixes in the TCP/TLS code as well. Look at the changelog and the upgrade files for potential configuration changes. To me, it sounds like a configuration issue. The message is stating that whatever is at 18.104.22.168, 22.214.171.124 and 126.96.36.199 is refusing the connection. You need to find out why it is refusing the connection. Check that firewalls are not blocking the connections.
By: Zohair Raza (zuhairraza) 2013-04-21 03:50:19.867-0500
It may be a configuration issue, but those IPs belong to phones, and the phones don't have any firewall. In the trace, the IP:port is the location where the phone was last online. Somehow the sync failed, and Asterisk keeps trying to use the phone's old port, which according to the phone is no longer in use. By this time the phone had a TCP connection on a new port instead.
By: Michael L. Young (elguero) 2013-04-24 13:13:05.561-0500
Zohair, have you been able to find out any more information?
What would be good to find out is why the phone is changing the port without Asterisk knowing about it. Was Asterisk informed of the change and didn't apply it? Or did the phone change the port without sending that change to Asterisk?
Perhaps a PCAP can help plus a full debug log.
By: Rusty Newton (rnewton) 2013-05-17 17:19:30.755-0500
@Zohair, is the crash still occurring? Can you provide the requested debug?
By: Zohair Raza (zuhairraza) 2013-05-24 10:44:27.095-0500
Sorry for the late response.
I updated to Asterisk 11.3, and it hasn't crashed since.
What I found is that the phone is not behaving well with Asterisk. Those messages still appear, but Asterisk doesn't crash. I am not sure why Asterisk keeps trying to connect to the peer and why it does not simply forget about it after, let's say, 5 minutes or so.
I disabled keepalive and qualify, but it didn't help.
By: Rusty Newton (rnewton) 2013-06-03 19:02:27.758-0500
Since there is no crashing after the upgrade, I'm going to close this out. If you think there is an issue with the rapid connections you described, then you may want to ask others on asterisk-users and include a pastebin. If you talk it over with others and think it's a bug, then you can open a separate issue on the tracker.
If you do open a new issue, please attach a PCAP and Asterisk debug log for it as we asked for above.