ASTERISK-10713: chan_zap causing reset on E1 and eventually crashed asterisk

Summary: ASTERISK-10713: chan_zap causing reset on E1 and eventually crashed asterisk

Reporter: MartinB (freon1) Labels:

Date Opened: 2007-11-07 21:03:35.000-0600 Date Closed: 2008-08-04 15:30:32

Priority: Critical Regression? No

Status: Closed/Complete Components: Channels/chan_zap

Versions: Frequency of Occurrence:

Related Issues:

Environment: Attachments: ( 0) backtrace.txt

Description: I have a TE410 and a TC400B installed under Fedora 7 (kernel 2.6.23.1-10) with asterisk-1.4.13, zaptel-1.4.6 and libpri-1.4.2. Two E1s are connected to the TE410 card, and all my calls go from IP to PSTN. When there are more than 40-45 calls (with fewer than 40 I haven't noticed any problems), after a while I start getting errors similar to:
[Nov 6 17:32:18] ERROR[32610] chan_zap.c: !! Got reject for frame 30, retransmitting frame 30 now, updating n_r!
The number of errors increases, and all of a sudden one of the E1s resets and all the calls on that E1 drop. The problem repeats, and at some point the other E1 resets; which E1 resets is pretty random. Then, after several of these errors, Asterisk dumps core. I have tested the hardware with the help of Digium support and have eliminated it as the cause. I also tested the E1s and they are clean. To the best of my ability, I have also ruled out a conflict with any unrelated modules. I have the full debug log (crash at 18:29:37) and the core dump, both as tar.gz archives. I am available to do further testing with live calls if more data is needed or if there is a patch to test. Thanks

Comments: By: MartinB (freon1) 2007-11-08 05:18:28.000-0600

I tried to upload the core and log files but was not able to. It may be due to their size: the core dump and the log file are each about 1.5M.
By: Jared Smith (jsmith) 2007-11-08 08:19:31.000-0600

I've seen this problem on T1s as well, on every version of Asterisk 1.4/Zaptel 1.4/libpri 1.4. I've provided traces to MattF at various times over the past 9 months, but still nobody has been able to track down the problem.

Here are the few little nuggets of information I've been able to find in my own work:

1) It seems to be related to outbound calls, and not inbound. (I can't prove this definitively, but my experience suggests it's a large number of outbound calls that cause this to happen.)

2) I too see the "got reject" messages before the crashes

3) I too have tested the hardware, and even had Digium replace the hardware. I too have had the T1 lines tested extensively by the telco. (In fact, I've been able to reproduce this on different T1s from different telcos, so it's definitely an Asterisk problem and not a telco problem.)

4) In my case, not only does it cause the T1 in question to reset, but it ends up resetting the entire PRI trunk group -- this could be something my telcos do, but it's very annoying

If you want to email me your logs to jsmith(at)digium(dot)com, I'd be happy to pass them along to MattF and see if he can find anything useful in them.
By: Matthew Fredrickson (mattf) 2007-11-10 15:44:36.000-0600

Can you try using the hardhdlc option in zaptel.conf for your dchan and see how it performs?
By: Matthew Fredrickson (mattf) 2007-11-10 15:45:31.000-0600

I have seen symptoms like this, but typically it is correlated to the CPU being overloaded on the system.
By: Matthew Fredrickson (mattf) 2007-11-10 15:48:23.000-0600

Can you get in contact with me directly via AIM or IRC? You can find me on AIM as MatthewFredricks or on irc.freenode.net as Cresl1n.
By: Tilghman Lesher (tilghman) 2007-11-11 10:41:33.000-0600

Please upload the core backtrace, not the core file. You can get details on how to create the backtrace at doc/backtrace.txt.
By: MartinB (freon1) 2007-11-12 03:41:38.000-0600

I uploaded the backtrace. A couple of things: I loaded 1.2 with the same config and don't see any crash, and all the lines are working. The "crash" and not being able to run more than 40 lines on zap may not be directly related, as I see the E1 resetting a few times prior to the crash, but that is just a guess. I doubt the CPU is being overloaded, as the problem doesn't happen on 1.2 and I am using a TC400 card for voice compression on a dual-processor Dell server. Matt, I will look for you on AIM.
By: Tilghman Lesher (tilghman) 2007-11-12 06:54:31.000-0600

Newp. This is a classic case of memory corruption, and the procedure right now is to follow the details in doc/valgrind.txt to get us the necessary debugging to diagnose and solve this issue.
By: Frederic LE FOLL (flefoll) 2007-12-21 02:38:10.000-0600

Did you try to activate T309 in zapata.conf ("pritimer => t309,6000" for instance)? This maintains active calls when the loss of layer 2 is shorter than T309. If you don't enable T309, you get chan_zap's default behaviour, which hangs up all calls.

I ask because I see that ast_hangup() is involved in your backtrace. This won't fix the layer 2 reject problem or the memory corruption, but:
- it may allow you to keep your active calls if the layer 2 problem is transitory,
- it may avoid triggering the memory corruption problem if it is related to the general hangup of calls.
This is just a suggestion. Whether it will be effective depends on which problem comes first: memory corruption, layer 2 error, ...?
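
The T309 behaviour described in this comment can be sketched as a small decision function. This is only an illustration of the rule as stated here, not libpri code: the function name and parameters are hypothetical, and both durations are assumed to be in milliseconds.

```python
from typing import Optional


def calls_survive_outage(outage_ms: int, t309_ms: Optional[int]) -> bool:
    """Return True if active calls would be kept across a layer 2 outage.

    Hypothetical helper: t309_ms=None models T309 not being enabled.
    """
    if t309_ms is None:
        # Without T309, the comment above says all calls are hung up
        # as soon as layer 2 is lost.
        return False
    # With T309 enabled, calls are kept only if the link comes back
    # before the timer expires.
    return outage_ms < t309_ms
```

With the example value from the comment, `calls_survive_outage(2000, 6000)` keeps the calls, while a 7-second outage or a disabled timer does not.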
By: Tilghman Lesher (tilghman) 2008-02-04 11:41:36.000-0600

We have fixed a major memory corruption issue, and the fix is in current SVN 1.4, which will become 1.4.18 later today. Please test this release, as I believe it will resolve the underlying memory corruption issue.
By: jmls (jmls) 2008-02-17 12:59:51.000-0600

freon1, did you test the latest release as suggested? Or can we close this?
By: MartinB (freon1) 2008-02-18 22:06:44.000-0600

Yes it still crashes. I have identified what crashes and what doesn't to the best I could:

asterisk-1.2.24 Doesn't crash
asterisk-1.2.26.2 Crashes but very infrequently

asterisk-1.4.xx, all the way to the current release of 1.4.18, crashes several times a day and never supports more than 40 channels. These versions also reset the E1s many times a day, so calls get dropped.

Because I am sending live traffic, doing much testing is nearly impossible. Some of the debugging I had to do didn't let more than a couple of calls go through, so I could never reach the 40 to 50 calls needed to trigger the crash. So I started using 1.2.24. I am willing to open the system to you so someone with expertise can look and find out exactly what is happening. The best time to do this is in the mornings: I have other routes that I can reroute my traffic to, and when needed I can send traffic to the box and have you look at what is happening. I am on MSN Messenger as mbintampa@hotmail.com and on Yahoo as newnetbldr; please feel free to message me if this works for you. Thanks.
By: Digium Subversion (svnbot) 2008-02-29 17:30:55.000-0600

Repository: asterisk
Revision: 105409

U branches/1.4/main/autoservice.c

------------------------------------------------------------------------
r105409 | russell | 2008-02-29 17:30:48 -0600 (Fri, 29 Feb 2008) | 23 lines

Fix a major bug in autoservice. There was a race condition in the handling of
the list of channels in autoservice. The problem was that it was possible for
a channel to get removed from autoservice and destroyed, while the autoservice
was still messing with the channel. This led to memory corruption, and caused
crashes. This explains multiple backtraces I have seen that have references
to autoservice, but due to the nature of the issue (memory corruption), could
cause crashes in a number of areas.

(fixes the crash in BE-386)
(closes issue ASTERISK-11165)
(closes issue ASTERISK-11391)

The following issues could be related. If you are the reporter of one of these,
please update to include this fix and try again.

(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11545)
(potentially fixes issue ASTERISK-11058)
(potentially fixes issue ASTERISK-11453)
(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11437)
(potentially fixes issue ASTERISK-11259)

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105409
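
The race described in this commit message can be illustrated with a minimal sketch. This is not the r105409 patch and not Asterisk code: `Channel` and `AutoService` are hypothetical stand-ins. The idea shown is one general way to avoid such a race: after removing a channel from the list, the caller waits until the service thread has taken a fresh snapshot of the list, so the thread can no longer be holding a reference to the channel when it is destroyed.

```python
import threading
import time


class Channel:
    """Hypothetical stand-in for a channel object."""

    def __init__(self, name):
        self.name = name
        self.destroyed = False

    def service(self):
        # Touching a destroyed channel models the memory corruption case.
        if self.destroyed:
            raise RuntimeError("serviced destroyed channel " + self.name)

    def destroy(self):
        self.destroyed = True


class AutoService:
    """Hypothetical service thread that polls a shared channel list."""

    def __init__(self):
        self._cond = threading.Condition()
        self._channels = []
        self._generation = 0    # bumped each time a new snapshot is taken
        self._running = True
        self._exited = False
        self.errors = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def start(self, chan):
        with self._cond:
            self._channels.append(chan)

    def stop(self, chan):
        """Remove chan; return only once the thread can no longer touch it."""
        with self._cond:
            self._channels.remove(chan)
            seen = self._generation
            # The thread may still be working on an older snapshot that
            # contains chan, so wait for it to take a new one (or to exit).
            while not self._exited and self._generation == seen:
                self._cond.wait()

    def shutdown(self):
        with self._cond:
            self._running = False
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                if not self._running:
                    self._exited = True
                    self._cond.notify_all()
                    return
                snapshot = list(self._channels)
                self._generation += 1
                self._cond.notify_all()
            for chan in snapshot:
                try:
                    chan.service()
                except RuntimeError as exc:
                    self.errors.append(str(exc))
            time.sleep(0.001)
```

In this sketch, calling `chan.destroy()` right after `stop(chan)` returns is safe. Skipping the wait in `stop()` would reintroduce the window in which the thread services a channel that has already been destroyed.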
By: Digium Subversion (svnbot) 2008-02-29 17:33:01.000-0600

Repository: asterisk
Revision: 105410

_U trunk/
U trunk/main/autoservice.c

------------------------------------------------------------------------
r105410 | russell | 2008-02-29 17:33:00 -0600 (Fri, 29 Feb 2008) | 31 lines

Merged revisions 105409 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r105409 | russell | 2008-02-29 17:34:32 -0600 (Fri, 29 Feb 2008) | 23 lines

Fix a major bug in autoservice. There was a race condition in the handling of
the list of channels in autoservice. The problem was that it was possible for
a channel to get removed from autoservice and destroyed, while the autoservice
was still messing with the channel. This led to memory corruption, and caused
crashes. This explains multiple backtraces I have seen that have references
to autoservice, but due to the nature of the issue (memory corruption), could
cause crashes in a number of areas.

(fixes the crash in BE-386)
(closes issue ASTERISK-11165)
(closes issue ASTERISK-11391)

The following issues could be related. If you are the reporter of one of these,
please update to include this fix and try again.

(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11545)
(potentially fixes issue ASTERISK-11058)
(potentially fixes issue ASTERISK-11453)
(potentially fixes issue ASTERISK-10713)
(potentially fixes issue ASTERISK-11437)
(potentially fixes issue ASTERISK-11259)

........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=105410
By: Joshua C. Colp (jcolp) 2008-03-19 14:49:15

Per Russell's messages, how are things now?
By: MartinB (freon1) 2008-03-20 22:22:59

I still have problems. I get errors similar to the ones listed below when there are more than 40 calls, and eventually the system crashes. With 1.4.19.rc3 I haven't seen the crash in 3 hours yet, but I do see the errors. By the way, my problem seems very similar to

http://bugs.digium.com/view.php?id=11791

More than one E1's worth of zap calls causes problems and eventually a core dump/crash. Could it be in the zap driver?

The E1s are fine, as they work without problems with asterisk 1.2.24.

Thanks

[Mar 20 22:42:58] ERROR[2808]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 34, retransmitting frame 34 now, updating n_r!
[Mar 20 22:42:58] ERROR[2808]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 34, retransmitting frame 35 now, updating n_r!
[Mar 20 22:42:58] ERROR[2808]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 34, retransmitting frame 36 now, updating n_r!

[Mar 20 22:42:57] ERROR[2808]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 28, retransmitting frame 28 now, updating n_r!
[Mar 20 22:42:57] ERROR[2808]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 28, retransmitting frame 29 now, updating n_r!

[Mar 20 23:15:41] ERROR[2809]: chan_zap.c:8249 zt_pri_error: !! Not good - head of queue has not been transmitted yet
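
To see how these errors cluster over time, the "Got reject" lines can be counted per timestamp and thread ID. The following is a minimal sketch whose pattern is derived only from the sample lines quoted in this thread; other Asterisk versions or logger settings may format lines differently. The function name `count_rejects` is hypothetical.

```python
import re
from collections import Counter

# Matches both line shapes quoted in this thread:
#   [Nov 6 17:32:18] ERROR[32610] chan_zap.c: !! Got reject for frame 30, ...
#   [Mar 20 22:42:58] ERROR[2808]: chan_zap.c:8249 zt_pri_error: !! Got reject ...
REJECT_RE = re.compile(
    r"^\[(?P<ts>[A-Z][a-z]{2}\s+\d+ \d{2}:\d{2}:\d{2})\]\s+"
    r"ERROR\[(?P<tid>\d+)\]:?\s+"
    r"chan_zap\.c(?::\d+ zt_pri_error)?: "
    r"!! Got reject for frame (?P<rejected>\d+), "
    r"retransmitting frame (?P<resent>\d+) now"
)


def count_rejects(lines):
    """Count 'Got reject' errors per (timestamp, thread id) pair."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.match(line.strip())
        if match:
            counts[(match.group("ts"), int(match.group("tid")))] += 1
    return counts
```

Passing an open log file to `count_rejects` and printing `counts.most_common(10)` would show the seconds with the most rejected frames, which could then be compared with call volume or CPU load at those times.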
By: Matthew Fredrickson (mattf) 2008-03-21 09:26:51

I think it's possible that you may need to get an updated version of the TE410P. In any case, it's best to handle this type of problem through technical support. Please call them and notify them of my suggestion.

Also, did setting T309 make a difference for you? That will help your calls to stay up when the data link fluctuates.
By: MartinB (freon1) 2008-03-23 14:23:54

Matt, before filing this report I started with tech support; they tested the hardware and said it was a software issue. Also, when I run Asterisk 1.2 I don't get any of the chan_zap errors at all, so I don't know whether it could be hardware related. I changed the T309 value to 10000, but then it seemed to be holding up sockets: at some point it wouldn't take any calls and showed all the calls as hung, so I set it back to the default.

With version 1.4.19.rc3 I haven't seen any crashes (i.e. core dumps), but I see quite a bit of the chan_zap errors (which I don't see under 1.2.26.2), and eventually I get a kernel error in the syslog (I think this is due to the TC400B card/firmware) and no voice is heard. Then all channels hang and no calls go through; restarting Asterisk doesn't fix the problem, but rebooting the server does. The core dump crashes may have been fixed by the two memory leak fixes, or the whole thing may not be lasting long enough to reach a crash.

I am at the point of almost giving up on Digium hardware in combination with Asterisk 1.4, and I will test with 1.6 when it is released. Someone told me Sangoma hardware performs better and doesn't have this problem. I have no idea where I stand with this whole thing; I am just happy I found a version of 1.2 that works fairly stably, except for a few resets, for now.

If you really want to figure out what is going on, I can set it up for you to log in and debug while real calls are going through, as long as we can limit the time, since the calls are live. Thanks

Some sample chan_zap errors:

q931.c:3751 q931_dl_indication: link is DOWN
q931.c:3757 q931_dl_indication: activate T309 for call 32802 on channel 21
q931.c:3757 q931_dl_indication: activate T309 for call 32817 on channel 20
q931.c:3757 q931_dl_indication: activate T309 for call 32830 on channel 22
q931.c:3757 q931_dl_indication: activate T309 for call 32833 on channel 24
q931.c:3757 q931_dl_indication: activate T309 for call 32844 on channel 26
q931.c:3757 q931_dl_indication: activate T309 for call 32852 on channel 29
q931.c:3757 q931_dl_indication: activate T309 for call 32855 on channel 30
== Primary D-Channel on span 4 down
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got I-frame while link state 2
q931.c:3772 q931_dl_indication: link is UP
q931.c:3776 q931_dl_indication: cancel T309 for call 32802 on channel 21
q931.c:3776 q931_dl_indication: cancel T309 for call 32817 on channel 20
q931.c:3776 q931_dl_indication: cancel T309 for call 32830 on channel 22
q931.c:3776 q931_dl_indication: cancel T309 for call 32833 on channel 24
q931.c:3776 q931_dl_indication: cancel T309 for call 32844 on channel 26
q931.c:3776 q931_dl_indication: cancel T309 for call 32852 on channel 29
q931.c:3776 q931_dl_indication: cancel T309 for call 32855 on channel 30
== Primary D-Channel on span 4 up
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 0, retransmitting frame 0 now, updating n_r!
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 0, retransmitting frame 1 now, updating n_r!
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 0, retransmitting frame 2 now, updating n_r!
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 0, retransmitting frame 3 now, updating n_r!
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 0, retransmitting frame 4 now, updating n_r!
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 0, retransmitting frame 5 now, updating n_r!
[Mar 22 16:12:39] ERROR[2782]: chan_zap.c:8249 zt_pri_error: !! Got reject for frame 0, retransmitting frame 6 now, updating n_r!

Kernel errors before the whole thing goes really bad:

Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: Oops: 0002 [#1]
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: SMP
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: CPU: 1
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: EIP: 0060:[<f894c448>] Not tainted VLI
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: EFLAGS: 00010286 (2.6.23.15-80.fc7 #1)
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: EIP is at zt_tc_ioctl+0x247/0x313 [zttranscode]
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: eax: 00000000 ebx: f78b2064 ecx: c0044a5d edx: 00000002
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: esi: f64f40c0 edi: 00000002 ebp: f79c6920 esp: f668398c
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: Process asterisk (pid: 4271, ti=f6683000 task=f65eac20 task.ti=f6683000)
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: Stack: 00000000 f6f47900 f64f40c0 f78b2064 00000001 00000019 00700000 00701000
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: f66839f8 00000025 08100073 f6683a11 00000000 00000000 f66839c4 c201a1d0
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: f65ea63c c201a180 00000004 c2044a11 f7aec144 b77beae8 f6ea8cc0 f8a34d88
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
-- Hungup 'Zap/83-1'l Trace:
== Spawn extension (macro-expdialh6, s, 5) exited non-zero on 'IAX2/centgw112-52' in macro 'expdialh6'
== Spawn extension (macro-expdialh6, s, 5) exited non-zero on 'IAX2/centgw112-52'
-- Hungup 'IAX2/centgw112-52'
ele_hond_245*CLI>
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<f8a34d88>] zt_chan_ioctl+0x652/0x673 [zaptel]
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<f8a35f13>] zt_ioctl+0x116a/0x13ad [zaptel]
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c044173e>] getnstimeofday+0x30/0xbe
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c041cb3e>] lapic_next_event+0xc/0x10
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c0443646>] clockevents_program_event+0xb5/0xbc
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c0425d43>] enqueue_entity+0x2dd/0x307
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c0425797>] __check_preempt_curr_fair+0x55/0x86
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c0425d43>] enqueue_entity+0x2dd/0x307
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c0425a3f>] dequeue_entity+0xa4/0xcb
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c0425da8>] task_tick_fair+0x3b/0x60
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c044173e>] getnstimeofday+0x30/0xbe
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c041cb3e>] lapic_next_event+0xc/0x10
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c0443646>] clockevents_program_event+0xb5/0xbc
Message from syslogd@ at Sat Mar 22 16:10:47 2008 ...
ele_hond_245 kernel: [<c044437a>] tick_program_event+0x33/0x52
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0440426>] hrtimer_interrupt+0x192/0x1bc
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c04445de>] tick_sched_timer+0x0/0xbb
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0431ccc>] irq_exit+0x53/0x6b
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c041d05a>] smp_apic_timer_interrupt+0x71/0x7d
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c061d3bc>] _read_lock_bh+0x8/0x17
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0405c2c>] apic_timer_interrupt+0x28/0x30
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c061007b>] xfrm_send_policy_notify+0x3d7/0x4fd
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c043d55c>] remove_wait_queue+0x16/0x22
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c048bafa>] free_poll_entry+0xe/0x16
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c048bb1a>] poll_freewait+0x18/0x4c
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c048be52>] do_sys_poll+0x304/0x329
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c048c77b>] __pollwait+0x0/0xac
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0427c5b>] default_wake_function+0x0/0xc
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0427c5b>] default_wake_function+0x0/0xc
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c044173e>] getnstimeofday+0x30/0xbe
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c041cb3e>] lapic_next_event+0xc/0x10
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0443646>] clockevents_program_event+0xb5/0xbc
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c044437a>] tick_program_event+0x33/0x52
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0440426>] hrtimer_interrupt+0x192/0x1bc
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c04445de>] tick_sched_timer+0x0/0xbb
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0431ccc>] irq_exit+0x53/0x6b
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c041d05a>] smp_apic_timer_interrupt+0x71/0x7d
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0425095>] update_stats_wait_end+0xd3/0xfe
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c061d679>] __reacquire_kernel_lock+0x2f/0x4b
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c04656d7>] __rmqueue+0x5e/0xac
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c041c55f>] apic_wait_icr_idle+0xe/0x15
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0425d43>] enqueue_entity+0x2dd/0x307
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c041ac3d>] native_smp_send_reschedule+0x5f/0x64
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0425797>] __check_preempt_curr_fair+0x55/0x86
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0425733>] resched_task+0x55/0x58
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c042ab43>] check_preempt_curr_fair+0x6b/0x71
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c0427c51>] try_to_wake_up+0x2ef/0x2f9
Message from syslogd@ at Sat Mar 22 16:10:48 2008 ...
ele_hond_245 kernel: [<c04f5928>] copy_to_user+0x34/0x48
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<f8a2ef7c>] zt_chan_read+0x1e0/0x209 [zaptel]
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c04256b4>] update_curr+0x13d/0x167
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c046a9d6>] vma_prio_tree_insert+0x17/0x2a
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c0471173>] vma_link+0xa5/0xc3
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c0425095>] update_stats_wait_end+0xd3/0xfe
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c04041be>] __switch_to+0xcb/0x149
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c048b38d>] do_ioctl+0x4d/0x63
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c048b5da>] vfs_ioctl+0x237/0x249
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
ele_hond_245 kernel: [<c048b638>] sys_ioctl+0x4c/0x64
Message from syslogd@ at Sat Mar 22 16:10:49 2008 ...
By: Sergio Serrano (srsergio) 2008-04-03 10:55:05

Hi all,

We have noticed this problem on PCI Express and PCI cards with asterisk 1.4.18.1, 1.4.18 and 1.2.24. We have tried T309 and hardhdlc, but nothing changes. We have compiled 1.4.18.1 with autoservice.c revision 105410, but the problem persists. Is it only an E1 problem? Our systems are HP ML115, HP DL360 and Dell PE1950 with TE122p, TE410P and TE121B cards, with PRIs from different telcos, running Debian and Fedora Linux with 2.6 kernels.
By: Jeff Peeler (jpeeler) 2008-06-30 18:21:54

Just wanted to check and see if this problem is still occurring. If so, a new backtrace against the latest release using an unoptimized build would be very helpful. Also a dmesg output if the kernel is oopsing.

By: Jeff Peeler (jpeeler) 2008-07-01 12:45:46

I haven't been able to get Asterisk to crash yet, but I do see the massive number of rejected frame error messages. When the errors start occurring, the CPU usage spikes, in some cases all the way to 100 percent.
By: ibercom (ibercom) 2008-07-27 14:25:33

I see a lot of rejected frame error messages when there are more than 14-16 calls. My system never crashes. I have a TE410P card with 3 PRIs in service, asterisk 1.4.21.1 and zaptel 1.4.11.

== Spawn extension (macro-call-int, s, 5) exited non-zero on 'Zap/32-1' in macro 'call-int'
== Spawn extension (macro-call-int, s, 5) exited non-zero on 'Zap/32-1'
-- Hungup 'Zap/32-1'
[Jul 23 11:47:01] ERROR[514]: chan_zap.c:8250 zt_pri_error: !! Got reject for frame 120, retransmitting frame 120 now, updating n_r!
[Jul 23 11:47:01] ERROR[514]: chan_zap.c:8250 zt_pri_error: !! Got reject for frame 120, retransmitting frame 121 now, updating n_r!

Normally the errors take place when finalizing the call (see above) and sometimes at the beginning.
By: Jeff Peeler (jpeeler) 2008-08-01 10:44:23

ibercom: Can you try seeing if you still get the rejected frame messages with echo cancellation turned off? Also report back if you were using software or hardware cancellation previously (assuming you were at all).
By: Jeff Peeler (jpeeler) 2008-08-04 15:29:55

After lots of investigation, it has been determined that the rejected frames (or retransmissions for SS7) are caused by IRQ misses occurring in DAHDI and depend on the system load and echo cancellation. This behavior is expected, so I'm closing this bug, as the other described issues were already fixed.