Summary: ASTERISK-01217: CVS head and 1.0 block every 30 minutes ( since >=22feb)
Reporter: zoa (zoa)
Labels:
Date Opened: 2004-03-16 03:21:02.000-0600
Date Closed: 2004-09-25 02:53:51
Priority: Blocker
Regression? No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
Description: After I update from CVS, one of my Asterisk servers hangs on average once every 30 minutes.

I cannot attach gdb: it doesn't give me any output, it just hangs.

Sometimes I can run show uptime, but stop now or restart now returns No such command 'STOP NOW' (type 'help' for help).
(The same happens with other commands: some work, some don't.)


I cannot stop Asterisk, not even with killall -9; the box requires a soft reboot.

So far I have been able to narrow it down to a date between 19 Feb and 22 Feb.

All CVS versions since 22 Feb hang; the ones before 19 Feb, and 19 Feb itself, look stable.

I'm trying to narrow it down even more, but this takes a while as I need to try this during coffee breaks :)

This server is on 2.4.18 (exploitable, I know).
I test the other CVS versions with updated asterisk, libpri and zaptel.

I only use IAX2, SIP and a TE410P on the server. Even when the server is idling (only poke messages / registration messages), the deadlock seems to happen after a while.


****** ADDITIONAL INFORMATION ******

Do you guys see anything in the changelogs that might cause this problem?


2004-02-22 23:15  citats

* apps/: app_queue.c (v1-0_stable.1), app_queue.c (1.47): Fixed
gramatical error in app_queue.c (bug ASTERISK-1077)

2004-02-22 22:43  citats

* asterisk.c (v1-0_stable.1), asterisk.c (1.52): Fix restarting
when not called from the main console (bug ASTERISK-824 and ASTERISK-858)

2004-02-22 21:47  citats

* pbx.c (v1-0_stable.2), pbx.c (1.99): Fix ast_add_extension2
updating ast_exten correctly in certain cases where extensions.conf
is not ordered numerically by priority (bug ASTERISK-1059)

2004-02-22 17:09  markster

* channels/chan_zap.c (v1-0_stable.5): Small but important
pri_fixup addition (bug ASTERISK-50, thangs steveu!)

2004-02-22 15:41  markster

* channels/chan_zap.c (1.177): Small but important fix for channel
relocation (bug ASTERISK-50)

2004-02-22 00:31  citats

* apps/: app_dial.c (v1-0_stable.2), app_dial.c (1.55): Fix another
typo in the app_dial description

2004-02-22 00:25  citats

* pbx.c (1.98), doc/README.variables (1.16): Add ${LANGUAGE}
channel variable (bug ASTERISK-1072)

2004-02-20 16:24  markster

* channels/chan_zap.c (v1-0_stable.4): Be sure to lock both slave
and master while performing unlinkage

2004-02-20 16:23  markster

* channels/chan_zap.c (1.176): Properly lock slave and master in
zt_unlink (bug ASTERISK-1002)

2004-02-20 15:01  markster

* channel.c (v1-0_stable.3): Minor reordering for bug ASTERISK-975

2004-02-20 15:00  markster

* channel.c (1.76): Fix minor ordering issue (bug ASTERISK-975)

2004-02-20 12:44  markster

* say.c (v1-0_stable.1): Fix for Norwegian support

2004-02-20 12:43  markster

* say.c (1.16): Add support for Norwegian numbers (bug ASTERISK-1061)

2004-02-20 10:40  markster

* channels/chan_sip.c, contrib/scripts/sip-friends.sql
(v1-0_stable.[7,1]): Fix some SIP friends issues (bug ASTERISK-1057 &
ASTERISK-1046)

2004-02-20 10:39  markster

* channels/chan_sip.c (1.299), contrib/scripts/sip-friends.sql
(1.2): Improve SIP friends support (should address bugs ASTERISK-1057 &
ASTERISK-1046)

2004-02-19 18:17  markster

* logger.c (1.28): Initialize queue logger

2004-02-19 15:08  markster

* channel.c (v1-0_stable.2): Only unlock clone lock *after* both
fixups are complete

2004-02-19 15:07  markster

* channel.c (1.75): Don't free clone lock until after *both* fixups
have taken place

Comments:

By: Brian West (bkw918) 2004-03-16 11:56:55.000-0600

Works fine from here. The question is: how are you calling gdb? And can you make it drop a core file so you can check it?
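
On the core file question: one way to make a process able to write a core is to raise its core-size soft limit to the hard limit at startup. The following is only a minimal sketch, assuming a POSIX system; `enable_core_dumps` is a hypothetical helper name and is not part of Asterisk.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (not an Asterisk function): raise the soft
 * RLIMIT_CORE limit to the hard limit so the kernel is allowed to
 * write a core file for this process.
 * Returns 0 on success, -1 on failure.
 */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

A core is then written only when the process is terminated by a signal whose default action dumps core, such as SIGSEGV or SIGABRT. From a shell, `ulimit -c unlimited` before starting the daemon usually has the same effect.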

By: zoa (zoa) 2004-03-17 04:02:05.000-0600

hmmz..

I was able to narrow it down to a patch made to channel.c on 19 or 20 Feb.

Index: asterisk/channel.c
diff -c asterisk/channel.c:1.74 asterisk/channel.c:1.75
*** asterisk/channel.c:1.74     Wed Feb  4 16:18:16 2004
--- asterisk/channel.c  Thu Feb 19 13:07:01 2004
***************
*** 2089,2107 ****
       /* Context, extension, priority, app data, jump table,  remain the same */
       /* pvt switches.  pbx stays the same, as does next */

-       /* Now, at this point, the "clone" channel is totally F'd up.  We mark it as
-          a zombie so nothing tries to touch it.  If it's already been marked as a
-          zombie, then free it now (since it already is considered invalid). */
-       if (clone->zombie) {
-               ast_log(LOG_DEBUG, "Destroying clone '%s'\n", clone->name);
-               ast_mutex_unlock(&clone->lock);
-               ast_channel_free(clone);
-               manager_event(EVENT_FLAG_CALL, "Hangup", "Channel: %s\r\n", zombn);
-       } else {
-               ast_log(LOG_DEBUG, "Released clone lock on '%s'\n", clone->name);
-               clone->zombie=1;
-               ast_mutex_unlock(&clone->lock);
-       }
       /* Set the write format */
       ast_set_write_format(original, wformat);

--- 2089,2094 ----
***************
*** 2122,2127 ****
--- 2109,2129 ----
       } else
               ast_log(LOG_WARNING, "Driver '%s' does not have a fixup routine (for %s)!  Bad things may happen.\n",
                       original->type, original->name);
+
+       /* Now, at this point, the "clone" channel is totally F'd up.  We mark it as
+          a zombie so nothing tries to touch it.  If it's already been marked as a
+          zombie, then free it now (since it already is considered invalid). */
+       if (clone->zombie) {
+               ast_log(LOG_DEBUG, "Destroying clone '%s'\n", clone->name);
+               ast_mutex_unlock(&clone->lock);
+               ast_channel_free(clone);
+               manager_event(EVENT_FLAG_CALL, "Hangup", "Channel: %s\r\n", zombn);
+       } else {
+               ast_log(LOG_DEBUG, "Released clone lock on '%s'\n", clone->name);
+               clone->zombie=1;
+               ast_mutex_unlock(&clone->lock);
+       }
+
       /* Signal any blocker */
       if (original->blocking)
               pthread_kill(original->blocker, SIGURG);

edited on: 03-17-04 03:34
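
The diff above moves the clone unlock so that it happens only after both fixup steps have run. A minimal sketch of that ordering follows, using plain pthreads rather than Asterisk's ast_mutex wrappers. The names `demo_channel`, `demo_fixup` and `demo_masquerade` are hypothetical and do not appear in channel.c.

```c
#include <pthread.h>

/* Hypothetical stand-in for a channel; this is not struct ast_channel. */
struct demo_channel {
    pthread_mutex_t lock;
    int zombie;       /* set once nothing else should touch the channel */
    int fixups_done;  /* number of fixup steps that have run */
};

/* A fixup step touches the clone, so the caller must hold clone->lock. */
static void demo_fixup(struct demo_channel *clone)
{
    clone->fixups_done++;
}

/*
 * Runs both fixup steps and marks the clone as a zombie, releasing the
 * lock only at the very end. Returns how many fixups had completed at
 * the moment the clone was marked.
 */
int demo_masquerade(struct demo_channel *clone)
{
    int seen;

    pthread_mutex_lock(&clone->lock);
    demo_fixup(clone);                    /* first fixup */
    demo_fixup(clone);                    /* second fixup */
    clone->zombie = 1;                    /* mark only after both fixups */
    seen = clone->fixups_done;
    pthread_mutex_unlock(&clone->lock);   /* unlock last */
    return seen;
}
```

Holding the lock across both steps means no other thread that takes the same lock can observe the clone between the first and second fixup.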

By: Tilghman Lesher (tilghman) 2004-03-17 10:43:59.000-0600

If you cannot kill a process with a kill -9, that points to a hardware failure of some kind (or a kernel bug).

By: zoa (zoa) 2004-03-17 10:56:47.000-0600

kram says it could be because part of asterisk is running in kernel mode (with the kernel modules etc).
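
A process that does not react to kill -9 is often stuck in uninterruptible sleep (state D) inside a system call or driver, which would fit either explanation above. One way to check is to read the state field from /proc/<pid>/stat. This is a minimal, Linux-only sketch; `proc_state`, `own_state` and `is_uninterruptible` are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the one-letter process state from /proc/<pid>/stat
 * (for example 'R' running, 'S' sleeping, 'D' uninterruptible sleep),
 * or '?' if the file cannot be read or parsed.
 */
char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    char *p;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (!fp)
        return '?';
    if (!fgets(buf, sizeof(buf), fp)) {
        fclose(fp);
        return '?';
    }
    fclose(fp);

    /* The command name is wrapped in parentheses and may itself contain
       ')', so locate the last ')' and read the field right after it. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* State of the calling process itself. */
char own_state(void)
{
    return proc_state(getpid());
}

/* Nonzero if the state letter means uninterruptible sleep. */
int is_uninterruptible(char state)
{
    return state == 'D';
}
```

Calling `proc_state` with the PID of the hung asterisk process and getting 'D' would suggest it is blocked inside the kernel rather than in user-space code.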

By: zoa (zoa) 2004-03-17 12:09:19.000-0600

hmmz, the version from 19 February also hangs, not after 30 minutes but after a day or so...

By: Mark Spencer (markster) 2004-03-18 01:01:59.000-0600

Can you still not attach to it?

By: zoa (zoa) 2004-03-18 03:20:59.000-0600

I don't think I can; will double check when it hangs again... (probably this afternoon)

By: zoa (zoa) 2004-03-19 04:52:26.000-0600

it still doesn't hang...

By: zoa (zoa) 2004-03-19 11:03:31.000-0600

Whatever this is or was, it doesn't seem to happen with the latest CVS.

/me very very happy.
