[Home]

Summary:ASTERISK-15927: Playback deadlock?
Reporter:robinfood (robinfood)Labels:
Date Opened:2010-04-08 12:24:04Date Closed:2010-12-10 11:07:53.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Applications/app_playback
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) dialout.txt
( 1) issue_0017151_full_log.8
Description:My issue is very close to the issue id 0014412.

I'm using the Local channel to make a dialout calls.
I'm using the Playback application to play a gsm-file.
Sometimes (approximately 1 from 10 Playback calls) Playback application stops to play after 1-5 seconds of a voice. Next-priority extensions does not start. MixMonitor records a silence from * and a voice from a called party.

****** ADDITIONAL INFORMATION ******

Here is an example of a .call file:
Channel: Local/199@inbound/n
CallerID: "199" <199>
Context: dialtest
Extension: s
Priority: 1

Here is the part of the extensions.conf file:
[inbound]
exten => 199,1,Dial(SIP/alc/199,,T)

[dialtest]
exten => s,1,Playback(/etc/asterisk/sounds/dialtest/2)
exten => s,n,Hangup()

Here is an example of a verbose logging for a successful call:
   -- Attempting call on Local/199@inbound/n for s@dialtest:1 (Retry 1)
   -- Executing [199@inbound:1] Dial("Local/199@inbound-d1cb;2", "SIP/alc/199,,T") in new stack
 == Using SIP RTP CoS mark 5
   -- Called alc/199
   -- SIP/alc-000168e5 is ringing
   -- SIP/alc-000168e5 is making progress passing it to Local/199@inbound-d1cb;2
   -- SIP/alc-000168e5 answered Local/199@inbound-d1cb;2
      > Channel Local/199@inbound-d1cb;1 was answered.
   -- Executing [s@dialtest:1] Playback("Local/199@inbound-d1cb;1", "/etc/asterisk/sounds/dialtest/2") in new stack
   -- <Local/199@inbound-d1cb;1> Playing '/etc/asterisk/sounds/dialtest/2.gsm' (language 'en')
   -- Executing [s@dialtest:2] Hangup("Local/199@inbound-d1cb;1", "") in new stack
 == Spawn extension (dialtest, s, 2) exited non-zero on 'Local/199@inbound-d1cb;1'
[Apr  8 17:42:03] NOTICE[18540]: pbx_spool.c:349 attempt_thread: Call completed to Local/199@inbound/n
 == Spawn extension (inbound, 199, 1) exited non-zero on 'Local/199@inbound-d1cb;2'

Here is an example of a verbose logging for an unsuccesful call:
   -- Attempting call on Local/199@inbound/n for s@dialtest:1 (Retry 1)
   -- Executing [199@inbound:1] Dial("Local/199@inbound-bf97;2", "SIP/alc/199,,T") in new stack
 == Using SIP RTP CoS mark 5
   -- Called alc/199
   -- SIP/alc-0001691c is ringing
   -- SIP/alc-0001691c is making progress passing it to Local/199@inbound-bf97;2
   -- SIP/alc-0001691c answered Local/199@inbound-bf97;2
      > Channel Local/199@inbound-bf97;1 was answered.
   -- Executing [s@dialtest:1] Playback("Local/199@inbound-bf97;1", "/etc/asterisk/sounds/dialtest/2") in new stack
   -- <Local/199@inbound-bf97;1> Playing '/etc/asterisk/sounds/dialtest/2.gsm' (language 'en')
 == Spawn extension (inbound, 199, 1) exited non-zero on 'Local/199@inbound-bf97;2'
 == Spawn extension (dialtest, s, 1) exited non-zero on 'Local/199@inbound-bf97;1'
[Apr  8 17:46:50] NOTICE[23198]: pbx_spool.c:349 attempt_thread: Call completed to Local/199@inbound/n

Here is a channel status while Playback is playing a silence:
core show channel Local/199@inbound-bf97;1
-- General --
          Name: Local/199@inbound-bf97;1
          Type: Local
      UniqueID: 1270737938.103987
     Caller ID: 199
Caller ID Name: 199
   DNID Digits: (N/A)
      Language: en
         State: Up (6)
         Rings: 0
 NativeFormats: 0x40 (slin)
   WriteFormat: 0x2 (gsm)
    ReadFormat: 0x40 (slin)
WriteTranscode: Yes
 ReadTranscode: No
1st File Descriptor: -1
     Frames in: 2801
    Frames out: 25
Time to Hangup: 0
  Elapsed Time: 0h0m57s
 Direct Bridge: <none>
Indirect Bridge: <none>
--   PBX   --
       Context: dialtest
     Extension: s
      Priority: 1
    Call Group: 0
  Pickup Group: 0
   Application: Playback
          Data: /etc/asterisk/sounds/dialtest/2
   Blocking in: ast_waitfor_nandfds
 CDR Variables:
level 1: clid="199" <199>
level 1: src=199
level 1: dst=s
level 1: dcontext=dialtest
level 1: channel=Local/199@inbound-bf97;1
level 1: lastapp=Playback
level 1: lastdata=/etc/asterisk/sounds/dialtest/2
level 1: start=2010-04-08 17:45:38
level 1: duration=56
level 1: billsec=0
level 1: disposition=NO ANSWER
level 1: amaflags=DOCUMENTATION
level 1: uniqueid=1270737938.103987

Comments:By: Paul Belanger (pabelanger) 2010-04-08 14:19:39

Downgrading Severity.

By: Leif Madsen (lmadsen) 2010-04-12 09:47:58

Debugging deadlocks:

Please select DEBUG_THREADS and DONT_OPTIMIZE in the Compiler Flags section of menuselect. Recompile and install Asterisk (i.e. make install)

This will then give you the console command:

core show locks

When the symptoms of the deadlock present themselves again, please provide output of the deadlock via:

# asterisk -rx "core show locks" | tee /tmp/core-show-locks.txt

# gdb -se "asterisk" <pid of asterisk> | tee /tmp/backtrace.txt

gdb> bt
gdb> bt full
gdb> thread apply all bt

Then attach the core-show-locks.txt and backtrace.txt files to this issue. Thanks!

By: robinfood (robinfood) 2010-04-13 16:14:03

Thank you.

I'm attached the output of the "core show locks" command and the output of the gdb.

Looks strange that "core show locks" command shows nothing - maybe I did something wrong, or maybe there was no lock? But the playback was stopped about ten minutes and there was no progress until the called party hangup.

By: Paul Belanger (pabelanger) 2010-04-13 16:47:02

I don't see anything of interest with the log you provided.  What we really need is a full debug log (not console output) of the issue.

http://svn.digium.com/svn/asterisk/trunk/doc/HOWTO_collect_debug_information.txt

By: robinfood (robinfood) 2010-04-13 18:57:31

Thank you.

I'm attached the full debug log.
There was an one little difference between this and previous problems: now there was no voice after starting a playback - only a silence (maybe there was a voice, but is was very short in this case, less than one second).

Sorry, but there are some other calls in the log, which may not affect this issue. I leaved them in the full log because I may be wrong, and these other calls really may affect this issue. The problematic playback was started at 01:22:25.

Sorry again, but I replaced all IP-addresses and all phone numbers by different placeholders (i.e. 'aXX.bXX.cXX.dXX' instead of real IPs, and 'phoneX' instead of real phone numbers - only the phone number 790 left unchanged). IP of my asterisk was replaced by a01.b01.c01.d01, and IP of the 'alc' peer was replaced by 'a02.b02.c02.d02'

By: Leif Madsen (lmadsen) 2010-04-15 12:16:32

Well this is looking less and less like a deadlock. If you hang up the channel, does the channel go away in memory?

By: robinfood (robinfood) 2010-04-15 12:44:39

How can I check is the channel go away or still remain in the memory?

I think it goes away, because on the called party hangup there is the log messages about hanging up both parties: at 01:23:39 there is the next messages in the last attached log:
[Apr 14 01:23:39] DEBUG[16523] channel.c: Hanging up channel 'Local/790@inbound-98b8;2'
[Apr 14 01:23:39] DEBUG[16522] channel.c: Hanging up channel 'Local/790@inbound-98b8;1'

Maybe I used incorrect word to describe this issue, and this is really not a deadlock, but how should I call it to be correct?

By: Leif Madsen (lmadsen) 2010-04-15 13:00:16

'core show channels'

I'd also attach your dialplan and a way of reproducing this issue. You may also want to simplify the scenario down to the point where you can't reproduce the issue, and then add one additional element to determine which part is actually causing the problem.

I have a feeling there might be something going on with the Local channel. Perhaps the Playback() is in the wrong spot and the channel executing playback is being optimized out of the call (just throwing out a guess right now).

By: robinfood (robinfood) 2010-04-15 16:38:53

There is no channel in the memory after hangup.

This appears to be not dialout-specific - dialin has the same problem. Moreover, the problem remains after switching from SIP to H.323, so it's not SIP-specific.

Here is the simplified dialplan to reproduce the issue:
[Alcatel]
exten => 791,1,Dial(Local/791@dialtest2)
exten => 791,n,Hangup()
[dialtest2]
exten => 791,1,Playback(/etc/asterisk/sounds/dialtest/2)
exten => 791,n,Hangup()
The way to reproduce: dial from the Alcatel context to the 791 extension several times. One of them will stuck (now it was on 6th retry).

I agree that there may be something wrong with the Local channel - replacing Dial(Local...) with Goto(dialtest2...) resolves problem (or at least reduces its repeat rate - I tried 35 calls with Goto and all 35 calls succeeded).
----------------
Before hangup there was one thing which may be interesting: channel Local/791@dialtest2-88c9;1 was in state 'Up', but channel Local/791@dialtest2-88c9;2 was in state 'Ringing (5)'.

On the other hand, when I encountered this issue for the first time, there were seven sequential Playback calls in the one context - and I remember that
Playback always stuck soon (1-5 seconds) after starting play, even if it was not first Playback in sequence.

By: Leif Madsen (lmadsen) 2010-04-15 18:07:20

Add /n to the end of your Dial() like so:

Dial(Local/791@dialtest2/n)

See if that helps.

Also check out this file:  http://svn.asterisk.org/svn/asterisk/branches/1.6.2/doc/tex/localchannel.tex

By: robinfood (robinfood) 2010-04-16 06:52:39

I tried to use /n and got the same problem.
/nj - does not help too.

I also noticed that one of channels is always in the state 'Ringing (5)', even if there is no problem.

By: Leif Madsen (lmadsen) 2010-04-16 11:31:27

Is this a regression? Does this happen with earlier versions as well?

By: eagvoz (eagvoz) 2010-07-01 09:01:40

I've the same behaivor with asterisk 1.6.2.9.

If we have this context.

[incoming-call]
exten => s,1,NoOp(A new call had been recived)
exten => s,n,PlayBack(/var/lib/sounds/averylargefile.gsm);
exten => s,n,Hangup()

All is working perfect but if we put a Dial Local the playback stops and asterisk don't continue to next priority.

[incoming-call]
exten => s,1,NoOp(A new call han been recived)
exten => s,n,Dial(Local/anotherContext@s/n)
exten => s,n,Hangup()

[anotherContext]
exten => s,1,NoOp(I need this context for billing purposes)
exten => s,n,PlayBack(/var/lib/sounds/averylargefile.gsm)
exten => s,n,Hangup

----------

I've 40 servers with asterisk. All servers have the same versions, but recently I bought 2 new servers where I put the same version of OS and asterisk that I had in my other servers. This problem only happens in those new machines.

By: Tilghman Lesher (tilghman) 2010-12-06 15:32:03.000-0600

This issue appears to be related to a res_timing_pthread issue which was resolved in July by the release of 1.6.2.11.  If you test that version (or greater), can you confirm that you can no longer reproduce this issue?