[Home]

Summary:ASTERISK-13113: Zaptel/DAHDi freezes E1 PRI channels when recieving fax.
Reporter:Vadim Sherbakov (vinsik)Labels:
Date Opened:2008-11-24 07:23:53.000-0600Date Closed:2009-06-25 16:25:02
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_dahdi
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) dahdi_hang_gdb_threads_08-12-18.txt
Description:Asterisk locks up when trying to send fax from E1 PRI to SIP ATA device. (Outbound works)
CLI stops responding when you try to access DAHDi or channel related info.
Also 2M link is dropped down when this lock occurs.

Only asterisk restart helps.

This is 'core show locks' output:

=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3079187344 (pri_dchannel         started at [ 9697] chan_dahdi.c start_pri())
=== ---> Lock #0 (chan_dahdi.c): MUTEX 8678 pri_dchannel &pri->lock 0xb7940e24 (1)
=== ---> Waiting for Lock #1 (chan_dahdi.c): MUTEX 9424 pri_dchannel &pri->pvts[chanpos]->lock 0x821b6f8 (1)
=== --- ---> Locked Here: chan_dahdi.c line 4753 (dahdi_read)
=== -------------------------------------------------------------------
===
=== Thread ID: 3069733776 (autoservice_run      started at [  228] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (channel.c): MUTEX 1682 ast_waitfor_nandfds &c[x]->lock 0x8281548 (1)
=== --- ---> Locked Here: channel.c line 1978 (__ast_read)
=== -------------------------------------------------------------------
===
=== Thread ID: 3069979536 (ss_thread            started at [ 9060] chan_dahdi.c pri_dchannel())
=== ---> Lock #0 (channel.c): MUTEX 1978 __ast_read &chan->lock 0x8281548 (1)
=== ---> Lock #1 (chan_dahdi.c): MUTEX 4753 dahdi_read &p->lock 0x821b6f8 (1)
=== ---> Lock #2 (pbx.c): RDLOCK 6127 ast_rdlock_contexts &conlock 0x81bde40 (1)
=== -------------------------------------------------------------------

Have tried asterisk-1.4.23-rc1 and DAHDi (linux/tools) 2.1.0-rc4.

I can provide more information if needed.
Comments:By: Vadim Sherbakov (vinsik) 2008-11-24 07:34:57.000-0600

Most important thing.

Versions i have tried:
asterisk-1.4.20.1, asterisk-1.4.22 and asterisk-1.4.23-rc1.
Also, zaptel-1.4.9, zaptel-1.4.11, DAHDi 2.0.0 (linux/tools)

All of these result in the same problem.

But older relases:
asterisk-1.2.26, asterisk-1.2.30.2 with zaptel 1.2.27 work fine.

By: kreijnen (kreijnen) 2008-11-24 14:44:24.000-0600

Same thing happening here with Asterisk 1.4.21.2 Zaptel 1.4.12.1 but only with option faxdetect enabled in zapata.conf, either faxdetect=both or incoming or outgoing.

Faxdetect disabled does work.

Faxes are being received with RxFax/SpanDSP.

Other installation with Asterisk 1.4.21.2 Zaptel 1.4.11 does work like a charm, no matter what option is set for faxdetect.

By: Vadim Sherbakov (vinsik) 2008-11-24 15:36:58.000-0600

kreijnen, Thanx, i will try without faxdetect.

Btw.
- Do you have a context [fax] in your dialplan when faxdetect is enabled?
- I seem to have the wrong idea of what faxdetect is suppose to do. Is it's only function to send faxes to context [fax] or maybe do some echo cancelation disabing, etc..?

By: kreijnen (kreijnen) 2008-11-24 15:40:07.000-0600

In my opinion this option does both.

Because when I have no context for fax then it only disables echo cancellation.

By: kreijnen (kreijnen) 2008-11-24 15:50:30.000-0600

Here is the output from Asterisk 1.4.22 Dahdi 2.0

Also hangs.


   -- Accepting call from '' to '455268146' on channel 0/1, span 3
[Nov 24 22:43:20] DEBUG[3823]: chan_dahdi.c:1485 dahdi_enable_ec: Enabled echo cancellation on channel 63
[Nov 24 22:43:20] DEBUG[3836]: pbx.c:1842 pbx_extension_helper: Launching 'Set'
   -- Executing [455268146@mor_pstn_local:1] Set("Zap/63-1", "CDR(ACCOUNTCODE)=3") in new stack
[Nov 24 22:43:20] DEBUG[3836]: pbx.c:1842 pbx_extension_helper: Launching 'Goto'
   -- Executing [455268146@mor_pstn_local:2] Goto("Zap/63-1", "mor_pstn1|455268146|1") in new stack
   -- Goto (mor_pstn1,455268146,1)
[Nov 24 22:43:20] DEBUG[3836]: pbx.c:1842 pbx_extension_helper: Launching 'Set'
   -- Executing [455268146@mor_pstn1:1] Set("Zap/63-1", "CDR(ACCOUNTCODE)=3") in new stack
[Nov 24 22:43:20] DEBUG[3836]: pbx.c:1842 pbx_extension_helper: Launching 'Goto'
   -- Executing [455268146@mor_pstn1:2] Goto("Zap/63-1", "mor|31455268146|1") in new stack
   -- Goto (mor,31455268146,1)
[Nov 24 22:43:20] DEBUG[3836]: pbx.c:1691 pbx_substitute_variables_helper_full: Function result is ''
   -- Executing AGI("Zap/63-1", "mor-recordings.php||31455268146")
   -- Launched AGI Script /var/lib/asterisk/agi-bin/mor-recordings.php
 ==  mor-recordings.php||31455268146: Starting Recording script for MOR
 ==  mor-recordings.php||31455268146: Global CID Name: , User CID Name:  
 ==  mor-recordings.php||31455268146: Src: , dst: 31455268146, src device_id: , dst device_id: 195,
 ==  mor-recordings.php||31455268146: This call will not be recorded
   -- AGI Script mor-recordings.php completed, returning 0
[Nov 24 22:43:20] DEBUG[3836]: pbx.c:1691 pbx_substitute_variables_helper_full: Function result is ''
[Nov 24 22:43:20] DEBUG[3836]: pbx.c:1759 pbx_substitute_variables_helper_full: Expression result is '0'
   -- Executing GotoIf("Zap/63-1", "0?3:5")
   -- Goto (mor,31455268146,5)
   -- Executing mor("Zap/63-1", "31455268146")
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:3328 mor_get_variables: Tell balance: 0, tell time: 0, tell time mod.: 1.000000, tell r.time when left: 0 s, tell r.time every: 60 s, card_id: 0, callback: 0
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:452 mor_authorize: Date: 2008-11-24, time: 22:43:20
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:503 mor_authorize: Src: , Dst: 31455268146, Acc/dev_id: 3, CID:
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:2728 mor_get_user_by_acc: User's data retrieved: id: 4, lcr_id: 2, balance: 0.000000, frozen_balance: 0.000000, credit: -1.000000, blocked: 0, lcr_order: price, tariff_id: 2, postpaid: 1, cut: , add: , day type: WD, exchange rate: 1.000000, ani: 0, tell balance: 0, time: 0, when left: 60s, every: 60s, user currency: USD, def.curr: EUR, tariff type: user
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:546 mor_authorize: Caller type: Local
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:558 mor_authorize: Localized destination: 31455268146
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:2767 mor_check_did: Checking DID...
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:2878 mor_check_did: DID status: active, Provider rate 0.000000, con. fee: 0.000000, inc: 1, min time: 0, prov.id: 3, Inc. rate 0.000000, con. fee: 0.000000, inc: 1, min time: 0, Owner. rate 0.000000, con. fee: 0.000000, inc: 1, min time: 0, language: en, dialplan id: 0
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:2884 mor_check_did: DID DEVICE id: 195, type = FAX, extension: 1013, is trunk? 0, name: 1013, USER id: 36, postpaid: 1, balance: -1222.362305, frz_balance: 0.000000, credit: -1.000000, callflows: 0
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:728 mor_authorize: Timeout: 36000, user_id: 4, callertype: Local, anitype: , postpaid: 1
[Nov 24 22:43:20] NOTICE[3836]: app_mor.c:1681 mor_exec: Dial string using DID: 'Local/1013@mor/n||L(36000000:60000:60000)'
   -- Limit Data for this call:
      > timelimit      = 36000000
      > play_warning   = 60000
      > play_to_caller = yes
      > play_to_callee = no
      > warning_freq   = 60000
      > start_sound    = (null)
      > warning_sound  = timeleft
      > end_sound      = (null)
[Nov 24 22:43:20] DEBUG[3836]: rtp.c:1585 ast_rtp_make_compatible: Channel 'Local/1013@mor-b205,1' has no RTP, not doing anything
[Nov 24 22:43:20] DEBUG[3836]: channel.c:3338 ast_channel_inherit_variables: Copying soft-transferable variable MOR_CALL_FROM_DID.
[Nov 24 22:43:20] DEBUG[3836]: channel.c:3351 ast_channel_inherit_variables: Not copying variable AGISTATUS.
[Nov 24 22:43:20] DEBUG[3836]: channel.c:3351 ast_channel_inherit_variables: Not copying variable mor_cid_name.
[Nov 24 22:43:20] DEBUG[3836]: channel.c:3351 ast_channel_inherit_variables: Not copying variable CALLEDTON.
[Nov 24 22:43:20] DEBUG[3836]: channel.c:3351 ast_channel_inherit_variables: Not copying variable ANI2.
[Nov 24 22:43:20] DEBUG[3836]: channel.c:3351 ast_channel_inherit_variables: Not copying variable TRANSFERCAPABILITY.
   -- Called 1013@mor/n
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1691 pbx_substitute_variables_helper_full: Function result is '0'
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1759 pbx_substitute_variables_helper_full: Expression result is '0'
   -- Executing GotoIf("Local/1013@mor-b205,2", "0?2:4")
   -- Goto (mor,1013,4)
   -- Executing Set("Local/1013@mor-b205,2", "CALLED_TO=1013")
   -- Executing Set("Local/1013@mor-b205,2", "MOR_FAX_ID=195")
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1691 pbx_substitute_variables_helper_full: Function result is ''
   -- Executing Set("Local/1013@mor-b205,2", "FAXSENDER=")
   -- Executing Goto("Local/1013@mor-b205,2", "mor_fax2email|1013|1")
   -- Goto (mor_fax2email,1013,1)
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1842 pbx_extension_helper: Launching 'Answer'
   -- Executing [1013@mor_fax2email:1] Answer("Local/1013@mor-b205,2", "") in new stack
[Nov 24 22:43:20] DEBUG[3838]: devicestate.c:302 ast_device_state_changed_literal: Notification of state change to be queued on device/channel Local/1013@mor
[Nov 24 22:43:20] DEBUG[3836]: devicestate.c:302 ast_device_state_changed_literal: Notification of state change to be queued on device/channel Local/1013@mor
   -- Local/1013@mor-b205,1 answered Zap/63-1
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1842 pbx_extension_helper: Launching 'StopPlayTones'
   -- Executing [1013@mor_fax2email:2] StopPlayTones("Local/1013@mor-b205,2", "") in new stack
[Nov 24 22:43:20] DEBUG[3804]: chan_local.c:145 local_devicestate: Checking if extension 1013@mor exists (devicestate)
[Nov 24 22:43:20] DEBUG[3836]: rtp.c:1502 ast_rtp_early_bridge: Channel 'Zap/63-1' has no RTP, not doing anything
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1842 pbx_extension_helper: Launching 'Playback'
   -- Executing [1013@mor_fax2email:3] Playback("Local/1013@mor-b205,2", "beep") in new stack
[Nov 24 22:43:20] DEBUG[3836]: devicestate.c:302 ast_device_state_changed_literal: Notification of state change to be queued on device/channel Zap/63
[Nov 24 22:43:20] DEBUG[3836]: chan_dahdi.c:5154 dahdi_indicate: Requested indication 20 on channel Zap/63-1
[Nov 24 22:43:20] DEBUG[3838]: channel.c:2843 set_format: Set channel Local/1013@mor-b205,2 to write format gsm
[Nov 24 22:43:20] DEBUG[3804]: devicestate.c:287 do_state_change: Changing state for Local/1013@mor - state 2 (In use)
[Nov 24 22:43:20] DEBUG[3804]: chan_local.c:145 local_devicestate: Checking if extension 1013@mor exists (devicestate)
[Nov 24 22:43:20] DEBUG[3838]: channel.c:1832 ast_settimeout: Scheduling timer at 160 sample intervals
   -- <Local/1013@mor-b205,2> Playing 'beep' (language 'nl')
[Nov 24 22:43:20] DEBUG[3820]: app_queue.c:661 handle_statechange: Device 'Local/1013@mor' changed to state '2' (In use) but we don't care because they're not a member of any queue.
[Nov 24 22:43:20] DEBUG[3804]: devicestate.c:287 do_state_change: Changing state for Local/1013@mor - state 2 (In use)
[Nov 24 22:43:20] DEBUG[3804]: devicestate.c:287 do_state_change: Changing state for Zap/63 - state 2 (In use)
[Nov 24 22:43:20] DEBUG[3820]: app_queue.c:661 handle_statechange: Device 'Local/1013@mor' changed to state '2' (In use) but we don't care because they're not a member of any queue.
[Nov 24 22:43:20] DEBUG[3820]: app_queue.c:661 handle_statechange: Device 'Zap/63' changed to state '2' (In use) but we don't care because they're not a member of any queue.
[Nov 24 22:43:20] DEBUG[3838]: channel.c:1832 ast_settimeout: Scheduling timer at 0 sample intervals
[Nov 24 22:43:20] DEBUG[3838]: channel.c:1832 ast_settimeout: Scheduling timer at 0 sample intervals
[Nov 24 22:43:20] DEBUG[3838]: channel.c:2843 set_format: Set channel Local/1013@mor-b205,2 to write format alaw
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1691 pbx_substitute_variables_helper_full: Function result is '1227563000.5'
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1842 pbx_extension_helper: Launching 'Set'
   -- Executing [1013@mor_fax2email:4] Set("Local/1013@mor-b205,2", "FAXFILE=1227563000.5") in new stack
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1842 pbx_extension_helper: Launching 'Set'
   -- Executing [1013@mor_fax2email:5] Set("Local/1013@mor-b205,2", "FAX_DISABLE_V17=1") in new stack
[Nov 24 22:43:20] DEBUG[3838]: pbx.c:1842 pbx_extension_helper: Launching 'RxFax'
   -- Executing [1013@mor_fax2email:6] RxFax("Local/1013@mor-b205,2", "/var/spool/asterisk/faxes/1227563000.5.tif|ecm") in new stack
[Nov 24 22:43:20] DEBUG[3838]: channel.c:2843 set_format: Set channel Local/1013@mor-b205,2 to read format slin
[Nov 24 22:43:20] DEBUG[3838]: channel.c:2843 set_format: Set channel Local/1013@mor-b205,2 to write format slin
[Nov 24 22:43:20] DEBUG[3838]: /usr/src/agx-ast-addons/app_rxfax.c:384 rxfax_exec: Enabling ECM mode for app_rxfax
[Nov 24 22:43:21] DEBUG[3836]: chan_dahdi.c:3706 dahdi_handle_dtmfup: DTMF digit: f on Zap/63-1
[Nov 24 22:43:21] DEBUG[3806]: chan_sip.c:2174 __sip_autodestruct: Auto destroying SIP dialog '0zsm6q@london.domain.com'
[Nov 24 22:43:21] DEBUG[3806]: chan_sip.c:3434 sip_destroy: Destroying SIP dialog 0zsm6q@london.domain.com
Really destroying SIP dialog '0zsm6q@london.domain.com' Method: OPTIONS
[Nov 24 22:43:47] DEBUG[3828]: channel.c:1088 channel_find_locked: Avoiding deadlock for channel '0x82d4660'

By: Vadim Sherbakov (vinsik) 2008-11-24 16:01:30.000-0600

kreijnen, maybe that is causing the problem, i mean not having extension or context called [fax]. I have to test this tomorrow. Thank you for all your input and for all the fish. :-)

By: kreijnen (kreijnen) 2008-11-24 16:03:03.000-0600

We have tried both...

Same result.

By: Russell Bryant (russell) 2008-12-11 15:39:59.000-0600

Can you get a gdb backtrace of all threads when it locks up like this?

# gdb asterisk `pidof asterisk`
(gdb) thread apply all bt

By: Vadim Sherbakov (vinsik) 2008-12-18 01:45:41.000-0600

russell, done.
Tested with asterisk-1.4.23-rc1 and dahdi-complete 2.1.0
other end fax adapter is AUDIOCODES and i am using 4-port PRI E1 card.

By: Vadim Sherbakov (vinsik) 2008-12-18 02:26:18.000-0600

damn, i forgot. Do you need DONT_OPTIMIZE enabled when compiling?

By: Dwayne Hubbard (dhubbard) 2009-01-19 17:54:58.000-0600

vinsik,
 Would you mind providing a link to the source location of your app_rxfax and app_txfax modules ?

By: Vadim Sherbakov (vinsik) 2009-01-20 02:15:28.000-0600

Im not using app_rxfax and app_txfax modules.
DAHDi/asterisk hangs because of this Audiocodes fax-adapter.

But i compiled them though, and the package is agx-addons for asterisk.
http://sourceforge.net/projects/agx-ast-addons/

By: Vadim Sherbakov (vinsik) 2009-04-06 07:14:37

What is the status on this? Does anybody have ideas how to fix this?

I did test this by adding [fax] context, and it worked by going to
fax-context. But i need it to work in the same context.

Something goes wrong while waiting for non-existing [fax] context?!

By: Arnd Schmitter (arnd) 2009-05-25 04:59:01

i've have the same problem, but i know when the problem started.

I'm running 1.4.18.1. As it has proven stable for me.

The Problem has come, after I tried to upgrade to 1.4.19. So I downgraded back.

Later I've encountered the bug 12470

So I backported the Patch which resolved 12470 into 1.4.18. After that, the Fax-Channellock-Problem was back.

By: Leif Madsen (lmadsen) 2009-05-26 09:44:04

arnd: are you saying that you don't have this issue with 1.4.18, but you do have it with 1.4.19, and you only get this issue in 1.4.18 with the backport of the commit from 12470?

Perhaps that is the issue that introduced the issue?

vinsik: any chance you could try with 1.4.18 and see if the issue is not present there? If so, then perhaps the issue is between 1.4.18 and 1.4.19?

By: Arnd Schmitter (arnd) 2009-05-26 10:06:31

Yes, under vanilla 1.4.18, I don't encounter this Problem.

Under 1.4.19 it happens instantly.
To reproduce:
I made a Faxcall (in my examples always inbound Zap=>iaxmodem/hylafax). During the call I enter "core show channels" in a cli session.
The cli shows then the IAX Channel, but hangs before it shows the Zap Channel.

BTW:
I haven't really used the feature faxdetect in zapata.conf. But from my development days, this option was set, as I played with it.
I don't have a fax extension. Maybe that's somehow related.

By: Vadim Sherbakov (vinsik) 2009-05-27 05:46:18

Judging by backtrace, i would say that its waiting for the fax extension.
or some instance to create it, but not sure.

Testing now is hard, because im running a live system.
(Currently running 1.2.30.1 with zaptel 1.2).

But i will try to do it some night and report.

It could be great, if one of the coders could check
why autoservice thread is not rebuilding its list.

Looking at backtrace, it seems to hang here:
@asterisk 1.4.23-rc1/main/autoservice.c:287

285:        /* Wait while autoservice thread rebuilds its list. */
286:        while (chan_list_state == as_chan_list_state) {
287:                usleep(1000);
288:        }

By: Vadim Sherbakov (vinsik) 2009-05-27 06:10:05

im going to try with my edit of autoservice.c:

/* Wait while autoservice thread rebuilds its list. */
int k=3;
while (chan_list_state == as_chan_list_state) {
usleep(1000);
if(--k==0) break;
}

so there is no infinite loop.

BTW. This part of the code appeard for the first time in 1.4.19. (1.4.18.1 had no such func in autoservice.c)



By: Dmitry Andrianov (dimas) 2009-05-27 06:12:03

Well it does not rebuild the list because autoservice_run most likely stuck in ast_waitfor_n:

=== Thread ID: 3069733776 (autoservice_run started at [ 228] autoservice.c ast_autoservice_start())
=== ---> Waiting for Lock #0 (channel.c): MUTEX 1682 ast_waitfor_nandfds &c[x]->lock 0x8281548 (1)
=== --- ---> Locked Here: channel.c line 1978 (__ast_read)

I do not think skipping list rebuild will help - because the channel is definitely in some bad state (kind of deadlock) so can not be read.



By: Dmitry Andrianov (dimas) 2009-05-27 06:40:37

Well... Honestly I do not know how exactly async_goto works but my imagination creates following picture for me:

1. for some reason autoservice is started on the channel
2. while in autoservice channels is bening read by autoservice thread
3. On one of reads, chan_dahdi detects CNG fax tone
4. chan_dahdi requests ast_async_goto for the channel to jump into fax extension
5. What happens next i not quite understand :) but I believe that channel starts being read by another pbx thread simultaneously with autoservice_run. And that causes problem and does not allow autoservice_run to continue. (and rebuild the list which in turn blocks autoservice_stop from the first thread)

I think if you disable fax detection it should work ok.



By: Vadim Sherbakov (vinsik) 2009-05-27 06:44:41

You are right, if i disable fax detection, * does not hang.

But, without fax detection the * does not disable echo cancellation,
thus faxes do not come through anyway. So i kind of need fax detection.

And i do not want to disable echo cancellation all together because
of faxes. :)

Besides, i want to support * crew to build a firm and reliable software.
So lets fix this one :)



By: Dmitry Andrianov (dimas) 2009-05-27 07:24:23

i'm not saying this one should not be fixed :) I'm just proposing a workaround you can use.

Anyway, looking at the chan_dahdi code I see NO code what does anything beside jumping into "fax" extension when fax is detected. So I do not think Asterisk actually disables echo cancellation there.

Are you sure your problem is not something from ZAP-346 ?

By: David Vossel (dvossel) 2009-06-25 16:25:01

This was fixed by r166380 in 1.4. A channel lock was held while trying to run autoservice.  Updating to 1.4.23 or higher should fix this, it just happened to be that 1.4.23-rc1 did not have this fix included.