Summary: | ASTERISK-01217: CVS head and 1.0 block every 30 minutes ( since >=22feb) | ||
Reporter: | zoa (zoa) | Labels: | |
Date Opened: | 2004-03-16 03:21:02.000-0600 | Date Closed: | 2004-09-25 02:53:51 |
Priority: | Blocker | Regression? | No |
Status: | Closed/Complete | Components: | Core/General |
Versions: | Frequency of Occurrence | ||
Related Issues: | |||
Environment: | Attachments: | ||
Description: | When I update from CVS, one of my Asterisk servers blocks on average once every 30 minutes. I cannot attach gdb: it gives me no output and just hangs. Sometimes I can run show uptime, but stop now or restart now returns No such command 'STOP NOW' (type 'help' for help). The same goes for other commands: some work, some don't. I cannot stop Asterisk, not even with killall -9; the box requires a soft reboot.

So far I have been able to narrow it down to a date between 19 Feb and 22 Feb. All CVS versions since 22 Feb crash; the ones before 19 Feb, and 19 Feb itself, look stable. I'm trying to narrow it down even more, but this takes a while as I need to try this during coffee breaks :)

This server is on 2.4.18 (exploitable, I know). I test the other CVS versions with updated asterisk, libpri and zaptel. I only use iax2, sip and a TE410p on the server. Even when the server is idling (only poke messages / registration messages), the deadlock seems to happen after a while.

****** ADDITIONAL INFORMATION ******

Do you guys see something in the changelogs that might cause this problem?

2004-02-22 23:15 citats
  * apps/: app_queue.c (v1-0_stable.1), app_queue.c (1.47): Fixed gramatical error in app_queue.c (bug ASTERISK-1077)
2004-02-22 22:43 citats
  * asterisk.c (v1-0_stable.1), asterisk.c (1.52): Fix restarting when not called from the main console (bug ASTERISK-824 and ASTERISK-858)
2004-02-22 21:47 citats
  * pbx.c (v1-0_stable.2), pbx.c (1.99): Fix ast_add_extension2 updating ast_exten correctly in certain cases where extensions.conf is not ordered numerically by priority (bug ASTERISK-1059)
2004-02-22 17:09 markster
  * channels/chan_zap.c (v1-0_stable.5): Small but important pri_fixup addition (bug ASTERISK-50, thangs steveu!)
2004-02-22 15:41 markster
  * channels/chan_zap.c (1.177): Small but important fix for channel relocation (bug ASTERISK-50)
2004-02-22 00:31 citats
  * apps/: app_dial.c (v1-0_stable.2), app_dial.c (1.55): Fix another typo in the app_dial description
2004-02-22 00:25 citats
  * pbx.c (1.98), doc/README.variables (1.16): Add ${LANGUAGE} channel variable (bug ASTERISK-1072)
2004-02-20 16:24 markster
  * channels/chan_zap.c (v1-0_stable.4): Be sure to lock both slave and master while performing unlinkage
2004-02-20 16:23 markster
  * channels/chan_zap.c (1.176): Properly lock slave and master in zt_unlink (bug ASTERISK-1002)
2004-02-20 15:01 markster
  * channel.c (v1-0_stable.3): Minor reordering for bug ASTERISK-975
2004-02-20 15:00 markster
  * channel.c (1.76): Fix minor ordering issue (bug ASTERISK-975)
2004-02-20 12:44 markster
  * say.c (v1-0_stable.1): Fix for Norwegian support
2004-02-20 12:43 markster
  * say.c (1.16): Add support for Norwegian numbers (bug ASTERISK-1061)
2004-02-20 10:40 markster
  * channels/chan_sip.c, contrib/scripts/sip-friends.sql (v1-0_stable.[7,1]): Fix some SIP friends issues (bug ASTERISK-1057 & ASTERISK-1046)
2004-02-20 10:39 markster
  * channels/chan_sip.c (1.299), contrib/scripts/sip-friends.sql (1.2): Improve SIP friends support (should address bugs ASTERISK-1057 & ASTERISK-1046)
2004-02-19 18:17 markster
  * logger.c (1.28): Initialize queue logger
2004-02-19 15:08 markster
  * channel.c (v1-0_stable.2): Only unlock clone lock *after* both fixups are complete
2004-02-19 15:07 markster
  * channel.c (1.75): Don't free clone lock until after *both* fixups have taken place | ||
Comments: | By: Brian West (bkw918) 2004-03-16 11:56:55.000-0600
Works fine from here. The question is, how are you calling gdb? And can you make it drop a core file so you can check it?

By: zoa (zoa) 2004-03-17 04:02:05.000-0600
hmmz.. I was able to narrow it down to a patch made to channel.c on 19 or 20 Feb.

Index: asterisk/channel.c
diff -c asterisk/channel.c:1.74 asterisk/channel.c:1.75
*** asterisk/channel.c:1.74 Wed Feb 4 16:18:16 2004
--- asterisk/channel.c Thu Feb 19 13:07:01 2004
***************
*** 2089,2107 ****
      /* Context, extension, priority, app data, jump table, remain the same */
      /* pvt switches. pbx stays the same, as does next */
  
-     /* Now, at this point, the "clone" channel is totally F'd up. We mark it as
-        a zombie so nothing tries to touch it. If it's already been marked as a
-        zombie, then free it now (since it already is considered invalid). */
-     if (clone->zombie) {
-         ast_log(LOG_DEBUG, "Destroying clone '%s'\n", clone->name);
-         ast_mutex_unlock(&clone->lock);
-         ast_channel_free(clone);
-         manager_event(EVENT_FLAG_CALL, "Hangup", "Channel: %s\r\n", zombn);
-     } else {
-         ast_log(LOG_DEBUG, "Released clone lock on '%s'\n", clone->name);
-         clone->zombie=1;
-         ast_mutex_unlock(&clone->lock);
-     }
  
      /* Set the write format */
      ast_set_write_format(original, wformat);
--- 2089,2094 ----
***************
*** 2122,2127 ****
--- 2109,2129 ----
      } else
          ast_log(LOG_WARNING, "Driver '%s' does not have a fixup routine (for %s)! Bad things may happen.\n",
              original->type, original->name);
+ 
+     /* Now, at this point, the "clone" channel is totally F'd up. We mark it as
+        a zombie so nothing tries to touch it. If it's already been marked as a
+        zombie, then free it now (since it already is considered invalid). */
+     if (clone->zombie) {
+         ast_log(LOG_DEBUG, "Destroying clone '%s'\n", clone->name);
+         ast_mutex_unlock(&clone->lock);
+         ast_channel_free(clone);
+         manager_event(EVENT_FLAG_CALL, "Hangup", "Channel: %s\r\n", zombn);
+     } else {
+         ast_log(LOG_DEBUG, "Released clone lock on '%s'\n", clone->name);
+         clone->zombie=1;
+         ast_mutex_unlock(&clone->lock);
+     }
+ 
      /* Signal any blocker */
      if (original->blocking)
          pthread_kill(original->blocker, SIGURG);

edited on: 03-17-04 03:34

By: Tilghman Lesher (tilghman) 2004-03-17 10:43:59.000-0600
If you cannot kill a process with kill -9, that points to a hardware failure of some kind (or a kernel bug).

By: zoa (zoa) 2004-03-17 10:56:47.000-0600
kram says it could be because part of Asterisk is running in kernel mode (with the kernel modules etc).

By: zoa (zoa) 2004-03-17 12:09:19.000-0600
hmmz, the version from 19 February also hangs, not after 30 minutes but after a day or so...

By: Mark Spencer (markster) 2004-03-18 01:01:59.000-0600
Can you still not attach to it?

By: zoa (zoa) 2004-03-18 03:20:59.000-0600
Don't think I can, will double check when it hangs again... (probably this afternoon)

By: zoa (zoa) 2004-03-19 04:52:26.000-0600
It still doesn't hang...

By: zoa (zoa) 2004-03-19 11:03:31.000-0600
Whatever this is or was, it doesn't seem to happen with the latest CVS. /me very very happy.

By: zoa (zoa) 2004-03-19 11:03:36.000-0600
Whatever this is or was, it doesn't seem to happen with the latest CVS. /me very very happy. |
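
For readers following the channel.c 1.74 to 1.75 diff quoted in the comments: as far as the patch shows, the only change is where the clone is marked as a zombie and its lock released, which moves from before the fixup calls to after them. The sketch below is a minimal, hypothetical illustration of that ordering using plain pthreads. The names struct chan, fixup(), masquerade() and demo_masquerade() are invented for this example and are not Asterisk APIs; it does not attempt to reproduce the hang reported here.

```c
/* Hypothetical sketch, NOT Asterisk code: struct chan, fixup(),
 * masquerade() and demo_masquerade() are invented names that only
 * mimic the lock ordering introduced by channel.c 1.75. */
#include <assert.h>
#include <pthread.h>

struct chan {
    pthread_mutex_t lock;
    int zombie;      /* set once nothing should touch the channel again */
    int fixups_done; /* counts completed fixup calls */
};

/* Stand-in for a channel driver's fixup routine. */
void fixup(struct chan *c)
{
    c->fixups_done++;
}

/* Holds the clone lock across both fixups and releases it only
 * afterwards (the 1.75 ordering). Returns 0 on success, -1 on error. */
int masquerade(struct chan *original, struct chan *clone)
{
    if (pthread_mutex_lock(&clone->lock) != 0)
        return -1;

    fixup(clone);    /* first fixup */
    fixup(original); /* second fixup */

    /* Only now mark the clone as a zombie and release its lock. */
    clone->zombie = 1;
    pthread_mutex_unlock(&clone->lock);
    return 0;
}

/* Small single-threaded driver: returns 1 if both fixups ran, the
 * clone was marked as a zombie, and its lock is free again. */
int demo_masquerade(void)
{
    struct chan original = { .zombie = 0, .fixups_done = 0 };
    struct chan clone = { .zombie = 0, .fixups_done = 0 };
    int ok = 0;

    pthread_mutex_init(&original.lock, NULL);
    pthread_mutex_init(&clone.lock, NULL);

    if (masquerade(&original, &clone) == 0 &&
        pthread_mutex_trylock(&clone.lock) == 0) {
        /* trylock succeeded, so masquerade() released the lock. */
        pthread_mutex_unlock(&clone.lock);
        ok = clone.zombie == 1 &&
             clone.fixups_done == 1 &&
             original.fixups_done == 1;
    }

    pthread_mutex_destroy(&clone.lock);
    pthread_mutex_destroy(&original.lock);
    return ok;
}
```

Calling demo_masquerade() from a test program built with -pthread exercises the ordering once. Holding a lock for longer, as 1.75 does, closes the window in which another thread could touch the clone between fixups, but it also lengthens the time other threads may wait on that lock, which is one reason a change like this is a natural suspect when a deadlock appears.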