Summary: | ASTERISK-02880: endless loop due to ast_search_dns() taking too long | ||
Reporter: | gkempke (gkempke) | Labels: | |
Date Opened: | 2004-11-24 10:08:37.000-0600 | Date Closed: | 2011-06-07 14:00:19 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ( 0) dns_diff ( 1) dns_diff.txt | |
Description: | I have several "register" lines in my sip.conf. Each contains a host to be looked up during transmit_register(). On my system (Linux SuSE 9.1, PII 233MHz) ast_search_dns() takes about 10 seconds. Because the retransmit timeout for a register is 20 seconds, by the time all three registers have been sent, the first timeout strikes. This leads to an endless loop in ast_sched_runq(), because for the new registers ast_search_dns() needs to be called again.

****** ADDITIONAL INFORMATION ******

Suggestion (don't know if correct but works here):

--- dns.c 2004-06-22 22:11:15.000000000 +0200
+++ ../../asterisk/asterisk/dns.c 2004-11-24 17:50:51.863916752 +0100
@@ -169,15 +169,19 @@
 #endif
     char answer[MAX_SIZE];
     int res, ret = -1;
+    static int is_init = 0;
 #ifdef HAS_RES_NINIT
-    res_ninit(&dnsstate);
+    if (!is_init)
+        res_ninit(&dnsstate);
     res = res_nsearch(&dnsstate, dname, class, type, answer, sizeof(answer));
 #else
     ast_mutex_lock(&res_lock);
-    res_init();
+    if (!is_init)
+        res_init();
     res = res_search(dname, class, type, answer, sizeof(answer));
 #endif
+    is_init = 1;
     if (res > 0) {
         if ((res = dns_parse_answer(context, class, type, answer, res, callback)) < 0) {
             ast_log(LOG_WARNING, "Parse error\n");
@@ -190,12 +194,7 @@
         else
             ret = 1;
     }
-#ifdef HAS_RES_NINIT
-    res_nclose(&dnsstate);
-#else
-#ifndef __APPLE__
-    res_close();
-#endif
+#ifndef HAS_RES_NINIT
     ast_mutex_unlock(&res_lock);
 #endif
     return ret;
| ||
Comments: | By: Brian West (bkw918) 2004-11-24 10:15:54.000-0600

Actually you're barking up the WRONG tree here... dns.c isn't used in this case. I think if you disable SRV lookups you might solve the problem. But we use ast_gethostbyname, and unless you find us a non-blocking DNS resolver lib this can't really be fixed.

bkw

edited on: 11-24-04 17:05

By: Brian West (bkw918) 2004-11-24 10:20:55.000-0600

Yep, turn off SRV lookups if you have them on. That's the ONLY place where ast_search_dns would EVER be called on a register. If it's blocking during the ast_gethostbyname then maybe you need a faster box or a better DNS server. Granted, this whole blocking on gethostbyname has been known for a long time. We just don't have a free asynchronous resolver lib that we can use, so we have to live with it or write one from scratch.

By: gkempke (gkempke) 2004-11-24 10:25:59.000-0600

The problem is not the time it takes to look up the host... The problem is the time it takes to initialize the resolver, and that is done every time ast_search_dns is called. Why not leave the resolver initialized for subsequent calls (as I have done now)?

By: Brian West (bkw918) 2004-11-24 10:27:30.000-0600

But the only time this code would execute is if you have srvlookup on. Otherwise the code in question would be ast_gethostbyname. And attach your diff please.

bkw

By: gkempke (gkempke) 2004-11-24 10:53:48.000-0600

srvlookup was enabled by default (make samples). I've attached a diff. As I said, I don't know if it causes any ugly side effects, but it solves my problem.

By: Mark Spencer (markster) 2004-11-24 11:44:46.000-0600

The whole point of this code in here is to have the SRV lookups be reentrant and fast. Your code change would defeat that by using a single resolver state, not protected by a mutex, for all lookups.

By: Mark Spencer (markster) 2004-11-24 12:18:31.000-0600

Did turning off the SRV lookup make the problem go away?

By: gkempke (gkempke) 2004-11-25 02:26:29.000-0600

Disabling srvlookups does indeed fix the problem. Nonetheless the bug should be fixed, I think. If reentrance is an issue here then maybe a sanity check in ast_sched_runq() would be a better solution (like breaking out of the loop after the loop has run for more than 1 second, for example)?

Gunnar

By: Mark Spencer (markster) 2004-11-25 13:17:03.000-0600

As initially suspected, this is a configuration issue, not a bug. There's no reasonable way to work around that kind of problem. |