Summary: ASTERISK-11077: Asterisk dies on chan_zap reload
Reporter: yema (yem)
Labels:
Date Opened: 2007-12-18 12:10:44.000-0600
Date Closed: 2008-06-03 19:27:34
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Channels/chan_zap
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 11594-tm.diff
( 1) gdb-200712181446.txt
( 2) stackfix2.patch
( 3) stackfix3.patch
( 4) stackfix4.patch
( 5) stackfix5.patch
( 6) stack-increase.patch
Description:
Asterisk dies when I do: module reload chan_zap.so.
Here is the output I get on the console:

local*CLI> module reload chan_zap.so
local*CLI> [Dec 18 12:59:20] WARNING[1024]: chan_zap.c:11129 process_zap: Ignoring switchtype
[Dec 18 12:59:20] WARNING[1024]: chan_zap.c:11129 process_zap: Ignoring resetinterval
[Dec 18 12:59:20] WARNING[1024]: chan_zap.c:11129 process_zap: Ignoring overlapdial
[Dec 18 12:59:20] WARNING[1024]: chan_zap.c:11129 process_zap: Ignoring priindication
[Dec 18 12:59:20] WARNING[1024]: chan_zap.c:11129 process_zap: Ignoring facilityenable
[Dec 18 12:59:20] WARNING[1024]: chan_zap.c:11129 process_zap: Ignoring signalling
/usr/sbin/safe_asterisk: line 117:   954 Segmentation fault      nice -n $PRIORITY ${ASTSBINDIR}/asterisk -f ${CLIARGS} ${ASTARGS}
Asterisk ended with exit status 139
Asterisk exited on signal 11.

Disconnected from Asterisk server
local:~# Automatically restarting Asterisk.


****** ADDITIONAL INFORMATION ******

zaptel-1.4.7.1 (tried with zaptel-1.4.7 as well)
libpri-1.4.3
asterisk-1.4.15
asterisk-addons-1.4.5
wanpipe-3.2.1 (A102D card)

The PRI is up and running, no problems whatsoever before I try and reload chan_zap:
local*CLI> zap show status
Description                              Alarms     IRQ        bpviol     CRC4
wanpipe1 card 0                          OK         0          0          0
local*CLI>

local*CLI> pri show span 1
Primary D-channel: 24
Status: Provisioned, Up, Active
Switchtype: National ISDN
Type: CPE
Window Length: 0/7
Sentrej: 0
SolicitFbit: 0
Retrans: 0
Busy: 0
Overlap Dial: 0
T200 Timer: 1000
T203 Timer: 10000
T305 Timer: 30000
T308 Timer: 4000
T309 Timer: -1
T313 Timer: 4000
N200 Counter: 3

Doesn't happen with any other channels.
Comments:

By: Jason Parker (jparker) 2007-12-18 12:11:36.000-0600

Please upload a backtrace, per the bug guidelines.

By: yema (yem) 2007-12-18 12:37:14.000-0600

I never get a core when this happens

By: Jason Parker (jparker) 2007-12-18 12:40:18.000-0600

The bug guidelines point to a document (doc/backtrace.txt) that explains how to force Asterisk to produce a core file.
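Editorial note: a missing core file after a segmentation fault is often due to the process's core-size resource limit rather than to the crash itself. doc/backtrace.txt remains the authoritative procedure for Asterisk; the C sketch below only illustrates the underlying POSIX mechanism. The function name `enable_core_dumps` is hypothetical and is not taken from the Asterisk source.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit up to the
 * hard limit so that a fatal signal such as SIGSEGV is allowed to
 * leave a core file behind. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* An unprivileged process may always raise its soft limit
     * as far as its current hard limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

If the hard limit itself is 0, this call succeeds but still yields no core; in that case the limit has to be raised by whatever starts the process.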

By: yema (yem) 2007-12-18 13:47:01.000-0600

The attachment contains "bt", "bt full" and "thread apply all bt".

By: Jason Parker (jparker) 2007-12-18 14:09:07.000-0600

Can you provide a backtrace from an unoptimized build?  You'll need to rebuild after running menuselect and enabling the DONT_OPTIMIZE option.

By: yema (yem) 2007-12-18 14:34:32.000-0600

Well, oddly enough, after recompiling with DONT_OPTIMIZE, it doesn't crash:

local*CLI> core set verbose 10
Verbosity was 0 and is now 10
local*CLI> core set debug 10
Core debug was 0 and is now 10
local*CLI> module reload cha
chan_agent.so  chan_iax2.so   chan_sip.so    chan_zap.so
local*CLI> module reload chan_zap.so
local*CLI> [Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring switchtype
[Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring overlapdial
[Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring priindication
[Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring signalling
   -- Reloading module 'chan_zap.so' (Zapata Telephony)
 == Parsing '/etc/asterisk/zapata.conf': Found
[Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring switchtype
[Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring overlapdial
[Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring priindication
[Dec 18 15:33:27] WARNING[17635]: chan_zap.c:11129 process_zap: Ignoring signalling
   -- Reconfigured channel 1, ISDN PRI signalling
   -- Reconfigured channel 2, ISDN PRI signalling
   -- Reconfigured channel 3, ISDN PRI signalling
   -- Reconfigured channel 4, ISDN PRI signalling
   -- Reconfigured channel 5, ISDN PRI signalling
   -- Reconfigured channel 6, ISDN PRI signalling
   -- Reconfigured channel 7, ISDN PRI signalling
   -- Reconfigured channel 8, ISDN PRI signalling
   -- Reconfigured channel 9, ISDN PRI signalling
   -- Reconfigured channel 10, ISDN PRI signalling
   -- Reconfigured channel 11, ISDN PRI signalling
   -- Reconfigured channel 12, ISDN PRI signalling
   -- Reconfigured channel 13, ISDN PRI signalling
   -- Reconfigured channel 14, ISDN PRI signalling
   -- Reconfigured channel 15, ISDN PRI signalling
   -- Reconfigured channel 16, ISDN PRI signalling
   -- Reconfigured channel 17, ISDN PRI signalling
   -- Reconfigured channel 18, ISDN PRI signalling
   -- Reconfigured channel 19, ISDN PRI signalling
   -- Reconfigured channel 20, ISDN PRI signalling
   -- Reconfigured channel 21, ISDN PRI signalling
   -- Reconfigured channel 22, ISDN PRI signalling
   -- Reconfigured channel 23, ISDN PRI signalling
 == Parsing '/etc/asterisk/users.conf': Found
local*CLI>

By: yema (yem) 2007-12-19 12:14:30.000-0600

Reproduced with 1.4.16 as well.
This is consistent: when compiled with DONT_OPTIMIZE, there is no crash on zap reload.

By: Jason Parker (jparker) 2007-12-19 12:19:59.000-0600

Try this small patch.

It looks like the timezone stuff might be causing this.  There was something similar fixed for chan_unistim recently.

By: yema (yem) 2007-12-19 13:05:36.000-0600

Just recompiled 1.4.16 with the patch.
Same result:
if DONT_OPTIMIZE is NOT set, "module reload chan_zap.so" crashes.
if DONT_OPTIMIZE IS set, chan_zap.so reloads normally.

By: yema (yem) 2007-12-28 13:38:36.000-0600

Can someone confirm this, or is it specific to my build?

By: Gregory Hinton Nietsky (irroot) 2007-12-31 10:12:05.000-0600

Hi there, please send an unoptimized GDB backtrace and perhaps your zapata.conf,
and someone will at least be able to see what is going on.

By: yema (yem) 2007-12-31 10:25:20.000-0600

I already attached my "bt", "bt full" and "thread apply all bt" (gdb-200712181446.txt)

my zapata.conf:
[trunkgroups]

[channels]
language=en
context=pri
switchtype=national
overlapdial=yes
priindication=outofband
signalling=pri_cpe
usecallerid=yes
cidsignalling=bell
cidstart=ring
hidecallerid=no
callwaiting=yes
restrictcid=no
usecallingpres=yes
callwaitingcallerid=yes
threewaycalling=yes
transfer=yes
canpark=yes
cancallforward=yes
callreturn=yes
echocancel=yes
echocancelwhenbridged=no
relaxdtmf=yes
rxgain=0.0
txgain=0.0
group=1
callgroup=1
pickupgroup=1
immediate=no
callerid=asreceived
faxdetect=both
mohinterpret=default
mohsuggest=default
channel => 1-23

By: Gregory Hinton Nietsky (irroot) 2007-12-31 10:39:13.000-0600

I hear you, but it needs to be unoptimized. Go to the source directory, run "make menuselect", make sure DONT_OPTIMIZE is on under compiler flags, then rebuild, reproduce the crash and get the backtrace. The optimized version is pretty useless.

By: yema (yem) 2007-12-31 11:00:42.000-0600

The problem is that if I use DONT_OPTIMIZE, it doesn't crash!

By: Gregory Hinton Nietsky (irroot) 2008-01-01 06:53:20.000-0600

Hmm, OK, I missed the earlier post. What version of the compiler are you using? If this is the case, I'd suggest it is a buggy compiler. I've had problems with early GCC 4 releases; I upgraded and the problems went away.

By: yema (yem) 2008-01-03 08:17:43.000-0600

Yes, my current gcc is the default one from Debian Etch (gcc 4.1.2).
I'll see if I can downgrade to gcc 3 and rebuild.

By: yema (yem) 2008-01-04 11:08:21.000-0600

Well, the compiler doesn't seem to make a difference. What I did notice is that when the LOW_MEMORY flag is used, toggling DONT_OPTIMIZE makes the difference between a crash and no crash. If the LOW_MEMORY flag is not used, no crash occurs (tested on 1.4.17).

By: Jason Parker (jparker) 2008-01-18 15:30:44.000-0600

Does the backtrace change at all with the above patch applied?

By: jmls (jmls) 2008-02-17 12:49:15.000-0600

yem, please see Qwell's comment. Did this work for you ?

By: yema (yem) 2008-02-27 10:55:39.000-0600

No, no change.
I guess it has to do with compiler optimizations.
If LOW_MEMORY is not used, no crash occurs, with or without optimizations.
If LOW_MEMORY is used, setting DONT_OPTIMIZE prevents the crash; if DONT_OPTIMIZE is not set, the crash occurs.
Has anyone been able to reproduce this?

By: Jason Parker (jparker) 2008-02-27 14:45:39.000-0600

I'd really still like to see a new backtrace here, with the above patch applied.

By: Mark Michelson (mmichelson) 2008-04-10 15:32:27

The patch here won't (okay, shouldn't) make a difference since ast_localtime memsets the tm struct to all 0's prior to acting on it. The issue lies elsewhere, assuming that the issue still happens. I will attempt to reproduce it myself and track it down.
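Editorial note: the point about zeroing the tm struct can be pictured with the sketch below. It is not the Asterisk implementation of ast_localtime; `zeroing_localtime` is a hypothetical stand-in that assumes only what the comment states, namely that the output struct is cleared before it is filled in. With a wrapper of this shape, whatever the caller left in the struct beforehand cannot influence the result, which is why pre-initializing it in the caller would not be expected to change behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <time.h>

/* Hypothetical wrapper (not Asterisk's ast_localtime): clear the
 * caller's struct tm, then let localtime_r fill it in. Any garbage
 * the caller left in *out is gone before the conversion starts. */
struct tm *zeroing_localtime(const time_t *t, struct tm *out)
{
    memset(out, 0, sizeof(*out));
    return localtime_r(t, out);
}
```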

By: Mark Michelson (mmichelson) 2008-04-11 09:12:33

I attempted this using many combinations of compiler flags (I even found that the combination of LOW_MEMORY and DEBUG_THREADS would not build properly) and could not reproduce this crash. I tried using the same zapata.conf used here, and it made no difference. One difference between the two setups is that I am not using a PRI card but am instead using an analog card.

By: Jeff Peeler (jpeeler) 2008-05-14 15:33:19

I've tried duplicating the problem using a TE420 without success. Yem, are you still having problems?

By: yema (yem) 2008-05-14 16:04:39

Yes. I haven't "played" with it in a little while, but I still had the problem with 1.4.17.
Note that the real difference seems to appear only when I enable the LOW_MEMORY option.
When LOW_MEMORY is enabled, disabling DONT_OPTIMIZE (optimized build) makes the reload crash, and enabling it (unoptimized build) makes the reload succeed.
I tested with a Sangoma A101 T1 card, but I could probably try with a Digium TE110P that I have around if you think it would make a difference. Thanks.

By: Jeff Peeler (jpeeler) 2008-05-14 16:48:41

No, I don't think changing the card will make a difference. But perhaps upgrading will?

By: yema (yem) 2008-05-14 18:49:45

You mean upgrading to the latest zaptel/libpri/asterisk/ .... ?
I can certainly do that and see what happens.

By: Jeff Peeler (jpeeler) 2008-05-14 20:25:13

Yes, that is what I mean. If it crashes again, please outline the exact method you used to reproduce the problem. Does it crash consistently? Can you reload chan_zap as soon as Asterisk is loaded and have the crash occur? Do you need to have anything connected to the card?

By: yema (yem) 2008-05-15 15:33:50

Ok, I installed:
zaptel-1.4.10.1
libpri-1.4.4
asterisk-1.4.19.2
asterisk-addons-1.4.6
wanpipe-3.3.9

I compiled with LOW_MEMORY and can reproduce the problem consistently. I don't need anything attached on the PRI card. As soon as asterisk is started I can get it to crash by doing a "reload chan_zap.so".
If I add the compile option DONT_OPTIMIZE, then it doesn't crash on reload. If I don't compile with LOW_MEMORY, then it's fine as well.
Here is the output:
voip*CLI> core show version
Asterisk 1.4.19.2 built by root @ voip on a i686 running Linux on 2008-05-15 20:13:18 UTC
voip*CLI> core show uptime
System uptime: 38 seconds
voip*CLI> module reload chan_zap.so
[May 15 16:45:52] WARNING[12257]: chan_zap.c:11231 process_zap: Ignoring switchtype
[May 15 16:45:52] WARNING[12257]: chan_zap.c:11231 process_zap: Ignoring overlapdial
[May 15 16:45:52] WARNING[12257]: chan_zap.c:11231 process_zap: Ignoring priindication
[May 15 16:45:52] WARNING[12257]: chan_zap.c:11231 process_zap: Ignoring signalling
voip*CLI>
Disconnected from Asterisk server
voip:/usr/local/src/wanpipe# /usr/sbin/safe_asterisk: line 117: 12235 Segmentation fault      (core dumped) nice -n $PRIORITY ${ASTSBINDIR}/asterisk -f ${CLIARGS} ${ASTARGS}
Asterisk ended with exit status 139
Asterisk exited on signal 11.
Automatically restarting Asterisk.

By: Jeff Peeler (jpeeler) 2008-05-19 19:03:24

I'm still investigating this, but try this stack-increase.patch. It fixed the problem for me. I'm confused though why the problem isn't present in trunk as well.
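Editorial note: the contents of stack-increase.patch are not reproduced in this thread, so the following C sketch only illustrates the general mechanism, assuming the patch enlarged the stack given to a thread. Each POSIX thread gets a fixed-size stack chosen at creation time, and every local buffer in the call chain counts against it. The names `run_with_stack` and `fill_local_buffer` and the sizes used are hypothetical, not Asterisk's.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: keeps a 32 KiB buffer in its own stack frame,
 * the kind of local allocation that counts against the thread's stack. */
void *fill_local_buffer(void *arg)
{
    volatile char buf[32 * 1024];
    size_t i;

    for (i = 0; i < sizeof(buf); i++) {
        buf[i] = (char)(i & 0x7f);
    }
    *(size_t *)arg = sizeof(buf);   /* report how much stack the buffer took */
    return NULL;
}

/* Run fn(arg) on a new thread whose stack is stack_bytes large and
 * wait for it to finish. Returns 0 on success or a pthread error number. */
int run_with_stack(size_t stack_bytes, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) {
        rc = pthread_create(&tid, &attr, fn, arg);
    }
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        return rc;
    }
    return pthread_join(tid, NULL);
}
```

Calling `run_with_stack(256 * 1024, fill_local_buffer, &used)` runs the worker with plenty of headroom. If the stack were smaller than the sum of the frames in the call chain, the thread would fault, and the process would typically die with a segmentation fault like the one reported here.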

By: yema (yem) 2008-05-21 13:36:12

Ok, just tried with the stack patch ... and it works now ... no crash.
Thanks !

By: Jeff Peeler (jpeeler) 2008-05-21 13:53:29

I'd like you to test my final patch as increasing the stack size isn't an acceptable solution. Should be ready soon.

By: yema (yem) 2008-05-21 15:42:39

Ok, no problem.
Does that mean you could consistently reproduce the problem ?

By: Jeff Peeler (jpeeler) 2008-05-21 18:03:30

Yes, I could consistently reproduce the problem. Revert the previous patch and try the new stackfix2.patch. Thanks for testing.

By: yema (yem) 2008-05-22 08:57:59

I reverted the previous patch and applied the new patch, but I could crash it consistently by reloading chan_zap.so again.

By: Jeff Peeler (jpeeler) 2008-05-22 11:39:38

This makes things tough since the problem is fixed here. I've added three more patches for you to try (sorry, this type of a bug is trial and error by nature). If none of these patches work, I'd like to request SSH access if that is possible. If you're interested in going that route, please send me an encrypted email to my user name @digium.com with the login credentials.

By: yema (yem) 2008-05-22 13:09:31

Ok, I'll try and do all the tests today.
Note that the system doesn't require LOW_MEMORY ... it has plenty of RAM (1GB).

By: yema (yem) 2008-05-22 15:55:28

Ok, same result with all 4 patches.
Expect a direct email from me with a similar subject.
Thanks.

By: Digium Subversion (svnbot) 2008-06-03 17:09:01

Repository: asterisk
Revision: 120173

U   branches/1.4/main/config.c

------------------------------------------------------------------------
r120173 | jpeeler | 2008-06-03 17:09:00 -0500 (Tue, 03 Jun 2008) | 6 lines

(closes issue ASTERISK-11077)
Reported by: yem
Tested by: yem

This change decreases the buffer size allocated on the stack substantially in config_text_file_load when LOW_MEMORY is turned on. This change combined with the fix from revision 117462 (making mkintf not copy the zt_chan_conf structure) was enough to prevent the crash.

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=120173
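Editorial note: the commit message names two ways of reducing stack use: a smaller stack buffer when LOW_MEMORY is defined, and not copying a large configuration structure. The C sketch below illustrates both ideas in isolation. It is not the code from r120173 or r117462: the buffer sizes, `struct chan_conf`, and all function names are hypothetical, and it assumes the copy in question was a by-value copy of the struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes; the real values in main/config.c are not
 * quoted in this thread. */
#ifdef LOW_MEMORY
#define LINE_BUF_SIZE 512
#else
#define LINE_BUF_SIZE 8192
#endif

/* Hypothetical stand-in for a large per-channel configuration struct. */
struct chan_conf {
    char context[80];
    char other_settings[16 * 1024];
};

/* The line buffer lives in this function's stack frame, so a smaller
 * LINE_BUF_SIZE directly shrinks the frame. */
size_t line_buffer_size(void)
{
    char buf[LINE_BUF_SIZE];

    buf[0] = '\0';
    return sizeof(buf);
}

/* By value: the caller must materialize a full copy of the struct
 * (over 16 KiB here) for the duration of the call. */
size_t context_len_by_value(struct chan_conf conf)
{
    return strlen(conf.context);
}

/* By pointer: only an address is passed; no copy of the struct is made. */
size_t context_len_by_pointer(const struct chan_conf *conf)
{
    return strlen(conf->context);
}
```

Both variants return the same result; the difference is only in how much stack each call consumes, which matters when the thread's stack is small.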

By: Digium Subversion (svnbot) 2008-06-03 17:10:29

Repository: asterisk
Revision: 120174

_U  trunk/
U   trunk/main/config.c

------------------------------------------------------------------------
r120174 | jpeeler | 2008-06-03 17:10:29 -0500 (Tue, 03 Jun 2008) | 14 lines

Merged revisions 120173 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r120173 | jpeeler | 2008-06-03 17:15:33 -0500 (Tue, 03 Jun 2008) | 6 lines

(closes issue ASTERISK-11077)
Reported by: yem
Tested by: yem

This change decreases the buffer size allocated on the stack substantially in config_text_file_load when LOW_MEMORY is turned on. This change combined with the fix from revision 117462 (making mkintf not copy the zt_chan_conf structure) was enough to prevent the crash.

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=120174

By: Digium Subversion (svnbot) 2008-06-03 17:11:45

Repository: asterisk
Revision: 120178

U   branches/1.6.0/main/config.c

------------------------------------------------------------------------
r120178 | jpeeler | 2008-06-03 17:11:44 -0500 (Tue, 03 Jun 2008) | 22 lines

Merged revisions 120174 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r120174 | jpeeler | 2008-06-03 17:17:07 -0500 (Tue, 03 Jun 2008) | 14 lines

Merged revisions 120173 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r120173 | jpeeler | 2008-06-03 17:15:33 -0500 (Tue, 03 Jun 2008) | 6 lines

(closes issue ASTERISK-11077)
Reported by: yem
Tested by: yem

This change decreases the buffer size allocated on the stack substantially in config_text_file_load when LOW_MEMORY is turned on. This change combined with the fix from revision 117462 (making mkintf not copy the zt_chan_conf structure) was enough to prevent the crash.

........

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=120178

By: Digium Subversion (svnbot) 2008-06-03 19:27:34

Repository: asterisk
Revision: 120279

_U  team/seanbright/resolve-shadow-warnings/
U   team/seanbright/resolve-shadow-warnings/CHANGES
U   team/seanbright/resolve-shadow-warnings/Makefile
U   team/seanbright/resolve-shadow-warnings/apps/app_queue.c
U   team/seanbright/resolve-shadow-warnings/channels/chan_iax2.c
D   team/seanbright/resolve-shadow-warnings/configs/pbx_realtime.conf
U   team/seanbright/resolve-shadow-warnings/funcs/func_channel.c
U   team/seanbright/resolve-shadow-warnings/include/asterisk/options.h
U   team/seanbright/resolve-shadow-warnings/main/asterisk.c
U   team/seanbright/resolve-shadow-warnings/main/config.c
U   team/seanbright/resolve-shadow-warnings/main/pbx.c
U   team/seanbright/resolve-shadow-warnings/pbx/pbx_loopback.c
U   team/seanbright/resolve-shadow-warnings/pbx/pbx_realtime.c
U   team/seanbright/resolve-shadow-warnings/res/res_agi.c

------------------------------------------------------------------------
r120279 | seanbright | 2008-06-03 19:27:31 -0500 (Tue, 03 Jun 2008) | 90 lines

Merged revisions 120166,120169,120171,120174,120227,120230 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

................
r120166 | mmichelson | 2008-06-03 17:22:52 -0400 (Tue, 03 Jun 2008) | 13 lines

Adding two new queue log events. The ADDMEMBER event is logged when
a dynamic realtime queue member is added to the queue, and the
REMOVEMEMBER event is logged when a dynamic realtime member is
removed. Since no calling channel is associated with these events
the string "REALTIME" is placed where the channel's unique id is
normally placed.

(closes issue ASTERISK-12128)
Reported by: atis
Patches:
     queue_log_rt_members.patch uploaded by atis (license 242)


................
r120169 | russell | 2008-06-03 17:35:11 -0400 (Tue, 03 Jun 2008) | 12 lines

Merged revisions 120168 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r120168 | russell | 2008-06-03 16:34:55 -0500 (Tue, 03 Jun 2008) | 4 lines

Fix another place where peer->callno could change at a very bad time, and also
fix a place where a peer was used after the reference was released.
(inspired by rev 120001)

........

................
r120171 | tilghman | 2008-06-03 18:05:16 -0400 (Tue, 03 Jun 2008) | 5 lines

Move compatibility options into asterisk.conf, default them to on for upgrades,
and off for new installations.  This includes the translation from pipes to commas
for pbx_realtime and the EXEC command for AGI, as well as the change to the Set
application not to support multiple variables at once.

................
r120174 | jpeeler | 2008-06-03 18:17:07 -0400 (Tue, 03 Jun 2008) | 14 lines

Merged revisions 120173 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r120173 | jpeeler | 2008-06-03 17:15:33 -0500 (Tue, 03 Jun 2008) | 6 lines

(closes issue ASTERISK-11077)
Reported by: yem
Tested by: yem

This change decreases the buffer size allocated on the stack substantially in config_text_file_load when LOW_MEMORY is turned on. This change combined with the fix from revision 117462 (making mkintf not copy the zt_chan_conf structure) was enough to prevent the crash.

........

................
r120227 | tilghman | 2008-06-03 18:42:03 -0400 (Tue, 03 Jun 2008) | 16 lines

Merged revisions 120226 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r120226 | tilghman | 2008-06-03 17:41:04 -0500 (Tue, 03 Jun 2008) | 8 lines

Due to incorrect use of the AST_LIST_INSERT_HEAD() macro the loopback switch
cannot perform any translation on the extension number before searching for it
in the target context.
(closes issue ASTERISK-11875)
Reported by: chappell
Patches:
      pbx_loopback.c.diff uploaded by chappell (license 8)

........

................
r120230 | tilghman | 2008-06-03 19:17:33 -0400 (Tue, 03 Jun 2008) | 7 lines

Add a function, CHANNELS(), which retrieves a list of all active channels.
(closes issue ASTERISK-10844)
Reported by: rain
Patches:
      func_channel-channel_list_function.diff uploaded by rain (license 327)
      (with some additional changes by me, mostly to meet coding guidelines)

................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=120279