|ASTERISK-03638: DNS Error prevents SIP module from functioning correctly.
|Perhaps I'm wrong, and hopefully I am (and hopefully this isn't already posted somewhere here), but I run into a problem with Asterisk when my primary DNS server is down. While loading, the SIP module does not register remotely or try another DNS server listed in /etc/resolv.conf. This kills the module, and for me no SIP functionality exists even after a few minutes of typing in reload. This includes SIP phones on the local network. When attempting to dial out I receive an error message stating that the channel is "busy". It seems all you have to do to trigger this bug is to have a bad nameserver on the first line of your resolv.conf file.
****** ADDITIONAL INFORMATION ******
1. Bad nameserver.
2. Hangs on "[chan_sip.so] => (Session Initiation Protocol (SIP))
== Parsing '/etc/asterisk/sip.conf': Found" for quite a while (a minute?).
3. Once reload is complete, sip features are completely unavailable for several minutes.
4. Notices such as the following occur:
*CLI> Mar 7 00:51:57 NOTICE: chan_sip.c:8405 sip_poke_noanswer: Peer 'broadvoice' is now UNREACHABLE! Last qualify: 0
5. Here is another notice:
Mar 7 00:52:45 NOTICE: chan_sip.c:4165 sip_reg_timeout: -- Registration for 'email@example.com@sip.broadvoice.com' timed out, trying again
6. The final and most confusing error:
Mar 7 00:53:01 WARNING: chan_sip.c:7212 handle_response: Got 200 OK on REGISTER that isn't a register
7. A possible remedy would be to mention in the debug output that the DNS server may be the cause.
8. A better remedy would be to find whatever is blocking the SIP functionality and recode it so that it tries every DNS server in resolv.conf instead of getting stuck on the first one.
|By: mitcheloc (mitcheloc) 2005-03-07 03:00:06.000-0600
As a side note, it's probably saying that it received a "200 OK" on a REGISTER that's not a register because something is blocking the sockets and delaying the processing of packets in Asterisk.
By: Kevin P. Fleming (kpfleming) 2005-03-07 15:27:04.000-0600
This is not a major bug, please re-read the bug posting guidelines. Since you already posted that you have a reasonable workaround, at best this is a 'minor' bug.
Second, chan_sip (and all the rest of Asterisk) does not even know about your /etc/resolv.conf file at all. Asterisk uses the library provided by your OS/distro to do DNS lookups, which may use files, NSS, databases or other forms of host resolution. Whether that library uses backup DNS servers properly or not is completely out of Asterisk's control.
I agree that chan_sip should not die completely because of a DNS lookup failure at startup, so we'll leave the bug open until that can be addressed.
By: mitcheloc (mitcheloc) 2005-03-07 17:45:41.000-0600
I read the guidelines page, but I missed the part about a reasonable workaround turning a major bug into a minor one. It seems to me that this does count as major in other regards, as it completely prevents Asterisk from operating in an expected manner.
Regarding the way it uses the library for DNS lookups, is it possible that it is not using it correctly? It seems strange that the library would block indefinitely. I don't know enough about Linux and C to make an educated guess, though, so I'll leave the rest of this issue up to everyone here.
Thank you for your fine work!
By: Joshua C. Colp (jcolp) 2005-03-08 17:39:46.000-0600
Unfortunately, it's the same with any application that does DNS lookups in Linux. For example, try pinging a nonexistent domain you haven't tried before and see what happens: it'll block until the DNS lookup times out. We do have ideas for a non-blocking DNS resolver, but nothing concrete as of yet.
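To see the blocking behaviour outside of Asterisk, here is a minimal sketch in C that times a single getaddrinfo() call. The helper name timed_lookup is made up for illustration and is not part of Asterisk; how long the call blocks depends entirely on the system resolver and its configuration.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper (not Asterisk code): resolve `host` with the
 * system resolver and report how long the calling thread was blocked.
 * Returns the getaddrinfo() result code (0 on success). */
int timed_lookup(const char *host, double *elapsed_ms)
{
    struct addrinfo *res = NULL;
    struct timespec start, end;
    int rc;

    assert(host != NULL && elapsed_ms != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    /* The caller cannot do anything else until this returns. */
    rc = getaddrinfo(host, NULL, NULL, &res);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (rc == 0)
        freeaddrinfo(res);

    *elapsed_ms = (end.tv_sec - start.tv_sec) * 1000.0 +
                  (end.tv_nsec - start.tv_nsec) / 1e6;
    return rc;
}
```

Calling timed_lookup() with a hostname while the first nameserver is unreachable should show the delay directly; a numeric address such as "127.0.0.1" returns without any DNS query.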
By: Brian West (bkw918) 2005-03-09 01:37:31.000-0600
Read it top to bottom and then you'll understand the resolver lib does this.
By: mitcheloc (mitcheloc) 2005-03-09 01:53:27.000-0600
The reopen button was the only one I could find so I could reply. I wanted to ask one last question.
How feasible would it be to use a separate thread (fork?) to resolve the names and then return the data to the main thread so it can be processed? That is, the main thread checks some sort of shared data for the IP addresses of hosts that have been resolved and then processes them, so that nothing is blocked. The main SIP functionality affects every phone and every call that comes in and out, and it really should not be blocked because of this.
Perhaps I'm making it a bigger deal than it is; you're welcome to make the final call on whether or not this is important.
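A rough sketch of that idea in C, assuming POSIX threads and getaddrinfo(). The names resolve_slot, resolve_start and resolve_poll are hypothetical and are not taken from Asterisk; this only shows the shared-data pattern, not how chan_sip would consume the result.

```c
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical shared slot: one hostname in, one resolved address out. */
struct resolve_slot {
    pthread_mutex_t lock;
    const char *hostname;          /* must stay valid until the worker ends */
    int done;                      /* 1 once the worker has finished */
    int error;                     /* getaddrinfo() return code */
    struct sockaddr_storage addr;  /* first result, valid if error == 0 */
    socklen_t addrlen;
};

/* Worker thread: the only place that blocks on the resolver. */
static void *resolve_worker(void *arg)
{
    struct resolve_slot *slot = arg;
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;

    rc = getaddrinfo(slot->hostname, NULL, &hints, &res);

    pthread_mutex_lock(&slot->lock);
    slot->error = rc;
    if (rc == 0 && res != NULL) {
        memcpy(&slot->addr, res->ai_addr, res->ai_addrlen);
        slot->addrlen = res->ai_addrlen;
    }
    slot->done = 1;
    pthread_mutex_unlock(&slot->lock);

    if (res != NULL)
        freeaddrinfo(res);
    return NULL;
}

/* Start a background lookup; returns the pthread_create() result. */
int resolve_start(struct resolve_slot *slot, const char *hostname,
                  pthread_t *tid)
{
    assert(slot != NULL && hostname != NULL && tid != NULL);
    pthread_mutex_init(&slot->lock, NULL);
    slot->hostname = hostname;
    slot->done = 0;
    slot->error = 0;
    slot->addrlen = 0;
    return pthread_create(tid, NULL, resolve_worker, slot);
}

/* Non-blocking check: 1 if the lookup has finished, 0 if still pending. */
int resolve_poll(struct resolve_slot *slot)
{
    int done;

    pthread_mutex_lock(&slot->lock);
    done = slot->done;
    pthread_mutex_unlock(&slot->lock);
    return done;
}
```

The main loop would call resolve_poll() on each pass and keep handling phones and calls while it returns 0, then read slot->error and slot->addr and pthread_join() the worker once it returns 1.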
By: Kevin P. Fleming (kpfleming) 2005-03-10 11:38:19.000-0600
We all talked about the possibility of doing this. The problem is that just pushing the DNS resolution into a separate thread does not help at all, because the thread that wants the resolved address still cannot continue with what it was doing until it gets a result.
Making DNS processing totally asynchronous (where the processes waiting for DNS results go to sleep and wake up when results are available) would be one option, but that is an enormous amount of work and greatly increases the complexity of the code.
Essentially, what we are saying is that if you are going to use DNS to resolve critical information in your Asterisk configuration, you need to do everything possible to ensure that the DNS lookups will not block for long periods of time.
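As one illustration, on systems where the resolver library reads /etc/resolv.conf (glibc's does), the options line documented in resolv.conf(5) can shorten how long a dead nameserver holds up a lookup. The addresses and values below are placeholders for illustration, not recommendations, and whether they apply at all depends on how your OS does host resolution.

```
# /etc/resolv.conf (placeholder addresses; adjust for your network)
nameserver 192.0.2.1
nameserver 192.0.2.2
# timeout:  seconds to wait on one nameserver before trying the next
# attempts: how many times the list of nameservers is tried
# rotate:   spread queries across the listed nameservers
options timeout:1 attempts:2 rotate
```

Running a local caching nameserver, or using IP addresses instead of hostnames for critical peers, are other ways to keep lookups from blocking for long.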