Summary: | ASTERISK-16711: [patch] Exceptionally long queue length queuing to XXXXX | ||||
Reporter: | Alec Davis (alecdavis) | Labels: | |||
Date Opened: | 2010-09-22 06:18:50 | Date Closed: | 2012-05-04 11:31:19 | ||
Priority: | Major | Regression? | No | ||
Status: | Closed/Complete | Components: | Core/PBX | ||
Versions: | Frequency of Occurrence | ||||
Related Issues: |
| ||||
Environment: | Attachments: | ( 0) expectionally-us.txt ( 1) fix-issue_18020-trunk.diff ( 2) fix-issue_18020-v1.4.patch ( 3) fix-issue_18028-v1.8.1.1.patch ( 4) Longqueue.txt ( 5) putty1.log ( 6) putty2.log | |||
Description: | Although this is under test conditions, this can go on for ever - seems like 100's per second. But the state of code at the moment, allows you to hangup, and it clears. ****** ADDITIONAL INFORMATION ****** Using the following dialplan: and dialling 10030 to create 30 looped localchan calls. [echo-test] exten => s,1,NoOp(${CALLERID(num):-4}) exten => s,n,Answer() exten => s,n,Playback(echo-test) exten => s,n,Echo() exten => s,n,Playback(goodbye) exten => s,n,HangUp() [phones] exten => 10000,1,Goto(echo-test,s,1) exten => _1XXXX,1,Set(i=${MATH(${EXTEN}-1,int)}) exten => _1XXXX,n,Dial(Local/${i}@phones) | ||||
Comments: | By: Alec Davis (alecdavis) 2010-09-22 06:22:15 attached file Longqueue.txt is a 2 second call to 10020. By: Stefan Schmidt (schmidts) 2010-09-22 08:02:52 it looks like there is a maquerading timing issue with this. could you try this with asterisk -vvvvgT started, cause normally you should see a better time description on this (not only sec, also ms) and it would be clearer to see what happens when. By: Alec Davis (alecdavis) 2010-09-23 01:50:08 Log file exceptionally-us.txt with microsecond resolution attached. To get microsecond resolution required a change in logger.conf [general] ; ; Customize the display of debug message time stamps ; this example is the ISO 8601 date format (yyyy-mm-dd HH:MM:SS) ; ; see strftime(3) Linux manual for format specifiers. Note that there is also ; a fractional second parameter which may be used in this field. Use %1q ; for tenths, %2q for hundredths, etc. ; ;dateformat=%F %T ; ISO 8601 date format ;dateformat=%F %T.%3q ; with milliseconds dateformat=%F %T.%6q ; with microseconds By: Stefan Schmidt (schmidts) 2010-09-23 03:00:42 sorry was my fault i mean you should start it with console -cvvvvgT so the time would also be there on the verbose. your log only shows the microseconds after the problem starts. i have tested this by myself on trunk and with 20 or 50 local loops i didnt see a problem. with 300 i got a deadlock. with 1000 after 372 local calls i get a chanunavail. By: Stefan Schmidt (schmidts) 2010-09-23 03:39:22 but this was only sip, i see you have this with dahdi, maybe thats the point why i didnt see this queue length message. By: Alec Davis (alecdavis) 2010-09-24 21:56:08 That deadlock, although under 'testing' conditions, potentially could happen at any time, it's just a matter of time. We need to at least get it documented. For the record, my machine isn't a core2duo: Intel(R) Pentium(R) 4 CPU 2.60GHz (hyper threading enabled) Memtotal: 2068264 When were getting the "Exceptionally long queue", the CPU is at 188% Then the messages stop, seems like it goes back to normal, no deadlocks, no high CPU. But no audio channel! Also many open channels: where they would have normally been optimized out. note: we are not using the '/n' switch. asterix*CLI> core show channels Channel Location State Application(Data) Local/10013@phones-7 (None) Up AppDial((Outgoing Line)) Local/10013@phones-7 10013@phones:2 Up Dial(Local/10012@phones) Local/10005@phones-0 (None) Up AppDial((Outgoing Line)) Local/10005@phones-0 10005@phones:2 Up Dial(Local/10004@phones) Local/10028@phones-2 (None) Up AppDial((Outgoing Line)) Local/10028@phones-2 10028@phones:2 Up Dial(Local/10027@phones) Local/10012@phones-a (None) Up AppDial((Outgoing Line)) Local/10012@phones-a 10012@phones:2 Up Dial(Local/10011@phones) Local/10027@phones-2 10027@phones:2 Up Dial(Local/10026@phones) Local/10027@phones-2 (None) Up AppDial((Outgoing Line)) Local/10033@phones-8 10033@phones:2 Up Dial(Local/10032@phones) Local/10033@phones-8 (None) Up AppDial((Outgoing Line)) Local/10009@phones-1 (None) Up AppDial((Outgoing Line)) Local/10009@phones-1 10009@phones:2 Up Dial(Local/10008@phones) Local/10006@phones-0 (None) Up AppDial((Outgoing Line)) Local/10006@phones-0 10006@phones:2 Up Dial(Local/10005@phones) Local/10018@phones-a (None) Up AppDial((Outgoing Line)) Local/10018@phones-a 10018@phones:2 Up Dial(Local/10017@phones) Local/10032@phones-1 (None) Up AppDial((Outgoing Line)) Local/10032@phones-1 10032@phones:2 Up Dial(Local/10031@phones) Local/10022@phones-6 (None) Up AppDial((Outgoing Line)) Local/10022@phones-6 10022@phones:2 Up Dial(Local/10021@phones) Local/10026@phones-9 (None) Up AppDial((Outgoing Line)) Local/10026@phones-9 10026@phones:2 Up Dial(Local/10025@phones) Local/10040@phones-4 10040@phones:2 Up Dial(Local/10039@phones) Local/10040@phones-4 (None) Up AppDial((Outgoing Line)) Local/10001@phones-8 (None) Up AppDial((Outgoing Line)) Local/10001@phones-8 10001@phones:2 Up Dial(Local/10000@phones) Local/10046@phones-e (None) Up AppDial((Outgoing Line)) Local/10046@phones-e 10046@phones:2 Up Dial(Local/10045@phones) Local/10000@phones-0 s@echo-test:4 Up Echo() Local/10008@phones-4 10008@phones:2 Up Dial(Local/10007@phones) Local/10000@phones-0 (None) Up AppDial((Outgoing Line)) Local/10008@phones-4 (None) Up AppDial((Outgoing Line)) Local/10020@phones-3 10020@phones:2 Up Dial(Local/10019@phones) Local/10020@phones-3 (None) Up AppDial((Outgoing Line)) Local/10047@phones-c 10047@phones:2 Up Dial(Local/10046@phones) Local/10047@phones-c (None) Up AppDial((Outgoing Line)) Local/10045@phones-4 10045@phones:2 Up Dial(Local/10044@phones) Local/10045@phones-4 (None) Up AppDial((Outgoing Line)) Local/10023@phones-8 (None) Up AppDial((Outgoing Line)) Local/10023@phones-8 10023@phones:2 Up Dial(Local/10022@phones) Local/10019@phones-1 10019@phones:2 Up Dial(Local/10018@phones) Local/10019@phones-1 (None) Up AppDial((Outgoing Line)) SIP/bt100black-00000 10050@phones:2 Up Dial(Local/10049@phones) Local/10011@phones-8 (None) Up AppDial((Outgoing Line)) Local/10011@phones-8 10011@phones:2 Up Dial(Local/10010@phones) Local/10016@phones-1 (None) Up AppDial((Outgoing Line)) Local/10029@phones-4 (None) Up AppDial((Outgoing Line)) Local/10016@phones-1 10016@phones:2 Up Dial(Local/10015@phones) Local/10029@phones-4 10029@phones:2 Up Dial(Local/10028@phones) Local/10010@phones-6 10010@phones:2 Up Dial(Local/10009@phones) Local/10010@phones-6 (None) Up AppDial((Outgoing Line)) Local/10025@phones-b (None) Up AppDial((Outgoing Line)) Local/10025@phones-b 10025@phones:2 Up Dial(Local/10024@phones) Local/10007@phones-f (None) Up AppDial((Outgoing Line)) Local/10007@phones-f 10007@phones:2 Up Dial(Local/10006@phones) Local/10017@phones-7 10017@phones:2 Up Dial(Local/10016@phones) Local/10017@phones-7 (None) Up AppDial((Outgoing Line)) Local/10015@phones-e (None) Up AppDial((Outgoing Line)) Local/10015@phones-e 10015@phones:2 Up Dial(Local/10014@phones) Local/10037@phones-4 (None) Up AppDial((Outgoing Line)) Local/10037@phones-4 10037@phones:2 Up Dial(Local/10036@phones) Local/10042@phones-3 (None) Up AppDial((Outgoing Line)) Local/10042@phones-3 10042@phones:2 Up Dial(Local/10041@phones) Local/10035@phones-a (None) Up AppDial((Outgoing Line)) Local/10035@phones-a 10035@phones:2 Up Dial(Local/10034@phones) Local/10036@phones-a (None) Up AppDial((Outgoing Line)) Local/10030@phones-3 (None) Up AppDial((Outgoing Line)) Local/10036@phones-a 10036@phones:2 Up Dial(Local/10035@phones) Local/10030@phones-3 10030@phones:2 Up Dial(Local/10029@phones) Local/10044@phones-e 10044@phones:2 Up Dial(Local/10043@phones) Local/10044@phones-e (None) Up AppDial((Outgoing Line)) Local/10021@phones-6 10021@phones:2 Up Dial(Local/10020@phones) Local/10021@phones-6 (None) Up AppDial((Outgoing Line)) Local/10024@phones-9 10024@phones:2 Up Dial(Local/10023@phones) Local/10024@phones-9 (None) Up AppDial((Outgoing Line)) Local/10049@phones-9 10049@phones:2 Up Dial(Local/10048@phones) Local/10049@phones-9 (None) Up AppDial((Outgoing Line)) Local/10039@phones-1 10039@phones:2 Up Dial(Local/10038@phones) Local/10039@phones-1 (None) Up AppDial((Outgoing Line)) Local/10038@phones-e (None) Up AppDial((Outgoing Line)) Local/10038@phones-e 10038@phones:2 Up Dial(Local/10037@phones) Local/10031@phones-e 10031@phones:2 Up Dial(Local/10030@phones) Local/10031@phones-e (None) Up AppDial((Outgoing Line)) Local/10014@phones-2 (None) Up AppDial((Outgoing Line)) Local/10014@phones-2 10014@phones:2 Up Dial(Local/10013@phones) Local/10048@phones-f 10048@phones:2 Up Dial(Local/10047@phones) Local/10048@phones-f (None) Up AppDial((Outgoing Line)) Local/10043@phones-3 10043@phones:2 Up Dial(Local/10042@phones) Local/10043@phones-3 (None) Up AppDial((Outgoing Line)) Local/10034@phones-7 (None) Up AppDial((Outgoing Line)) Local/10034@phones-7 10034@phones:2 Up Dial(Local/10033@phones) Local/10041@phones-2 10041@phones:2 Up Dial(Local/10040@phones) Local/10041@phones-2 (None) Up AppDial((Outgoing Line)) 95 active channels 48 active calls 56 calls processed asterix*CLI> core show locks ======================================================================= === Currently Held Locks ============================================== ======================================================================= === === <pending> <lock#> (<file>): <lock type> <line num> <function> <lock name> <lock addr> (times locked) === ======================================================================= asterix*CLI> Then, hang up, they all clear down. == Spawn extension (phones, 10050, 2) exited non-zero on 'SIP/bt100black-00000001' == Spawn extension (phones, 10049, 2) exited non-zero on 'Local/10049@phones-97ad;2' == Spawn extension (phones, 10047, 2) exited non-zero on 'Local/10047@phones-c5e1;2' == Spawn extension (phones, 10046, 2) exited non-zero on 'Local/10046@phones-e980;2' == Spawn extension (phones, 10045, 2) exited non-zero on 'Local/10045@phones-4580;2' == Spawn extension (phones, 10044, 2) exited non-zero on 'Local/10044@phones-e526;2' == Spawn extension (phones, 10043, 2) exited non-zero on 'Local/10043@phones-3d9a;2' == Spawn extension (phones, 10048, 2) exited non-zero on 'Local/10048@phones-f59b;2' == Spawn extension (phones, 10042, 2) exited non-zero on 'Local/10042@phones-30c5;2' == Spawn extension (phones, 10041, 2) exited non-zero on 'Local/10041@phones-2fe9;2' == Spawn extension (phones, 10040, 2) exited non-zero on 'Local/10040@phones-4528;2' == Spawn extension (phones, 10037, 2) exited non-zero on 'Local/10037@phones-4d65;2' == Spawn extension (phones, 10036, 2) exited non-zero on 'Local/10036@phones-a3d2;2' == Spawn extension (phones, 10035, 2) exited non-zero on 'Local/10035@phones-adc6;2' == Spawn extension (phones, 10038, 2) exited non-zero on 'Local/10038@phones-e3c2;2' == Spawn extension (phones, 10034, 2) exited non-zero on 'Local/10034@phones-70fc;2' == Spawn extension (phones, 10033, 2) exited non-zero on 'Local/10033@phones-81ef;2' == Spawn extension (phones, 10032, 2) exited non-zero on 'Local/10032@phones-1854;2' == Spawn extension (phones, 10031, 2) exited non-zero on 'Local/10031@phones-e84d;2' == Spawn extension (phones, 10030, 2) exited non-zero on 'Local/10030@phones-3107;2' == Spawn extension (phones, 10029, 2) exited non-zero on 'Local/10029@phones-4301;2' == Spawn extension (phones, 10028, 2) exited non-zero on 'Local/10028@phones-2b78;2' == Spawn extension (phones, 10027, 2) exited non-zero on 'Local/10027@phones-2bb8;2' == Spawn extension (phones, 10026, 2) exited non-zero on 'Local/10026@phones-909b;2' == Spawn extension (phones, 10025, 2) exited non-zero on 'Local/10025@phones-baf9;2' == Spawn extension (phones, 10023, 2) exited non-zero on 'Local/10023@phones-80f8;2' == Spawn extension (phones, 10021, 2) exited non-zero on 'Local/10021@phones-6344;2' == Spawn extension (phones, 10022, 2) exited non-zero on 'Local/10022@phones-6f2b;2' == Spawn extension (phones, 10020, 2) exited non-zero on 'Local/10020@phones-33e8;2' == Spawn extension (phones, 10019, 2) exited non-zero on 'Local/10019@phones-180a;2' == Spawn extension (phones, 10018, 2) exited non-zero on 'Local/10018@phones-a6c5;2' == Spawn extension (phones, 10024, 2) exited non-zero on 'Local/10024@phones-9111;2' == Spawn extension (phones, 10017, 2) exited non-zero on 'Local/10017@phones-72fb;2' == Spawn extension (phones, 10016, 2) exited non-zero on 'Local/10016@phones-1f65;2' == Spawn extension (phones, 10039, 2) exited non-zero on 'Local/10039@phones-1985;2' == Spawn extension (phones, 10014, 2) exited non-zero on 'Local/10014@phones-2566;2' == Spawn extension (phones, 10015, 2) exited non-zero on 'Local/10015@phones-e4d8;2' == Spawn extension (phones, 10013, 2) exited non-zero on 'Local/10013@phones-7335;2' == Spawn extension (phones, 10012, 2) exited non-zero on 'Local/10012@phones-aee5;2' == Spawn extension (phones, 10011, 2) exited non-zero on 'Local/10011@phones-89fa;2' == Spawn extension (phones, 10010, 2) exited non-zero on 'Local/10010@phones-6ef1;2' == Spawn extension (phones, 10009, 2) exited non-zero on 'Local/10009@phones-1488;2' == Spawn extension (phones, 10008, 2) exited non-zero on 'Local/10008@phones-41ee;2' == Spawn extension (phones, 10007, 2) exited non-zero on 'Local/10007@phones-f3c8;2' == Spawn extension (phones, 10006, 2) exited non-zero on 'Local/10006@phones-0a93;2' == Spawn extension (phones, 10005, 2) exited non-zero on 'Local/10005@phones-0ae1;2' == Spawn extension (echo-test, s, 4) exited non-zero on 'Local/10000@phones-056b;2' == Spawn extension (phones, 10001, 2) exited non-zero on 'Local/10001@phones-82b6;2' asterix*CLI> asterix*CLI> core show channels Channel Location State Application(Data) 0 active channels 0 active calls 56 calls processed asterix*CLI> Note: 10002, 10003, 10004 had been sucessfully optimized out. By: Alec Davis (alecdavis) 2010-09-24 22:04:53 enabling '/n' on the dial command, leaves all channels open as it should, but still get exceptionally long queue. By: John Covert (jcovert) 2011-02-16 17:16:15.000-0600 It doesn't take 30 local channels, it can happen with just one active. The following is on 1.8.1.1 I just had an occurrence of this (message spewed 58,057 times over a three minute period) at a client who is getting more and more frustrated with the state of things: [root@asterisk asterisk]# grep queue /tmp/x.tmp |wc 58057 638627 7024897 [root@asterisk asterisk]# grep queue /tmp/x.tmp [14-Feb-2011 13:16:36.75] WARNING[9558] channel.c: Exceptionally long queue length queuing to Local/sleep@monitor-be7c;2 [14-Feb-2011 13:16:36.75] WARNING[9558] channel.c: Exceptionally long queue length queuing to Local/sleep@monitor-be7c;2 ... [14-Feb-2011 13:19:30.78] WARNING[9559] channel.c: Exceptionally long queue length queuing to Local/sleep@monitor-be7c;2 [14-Feb-2011 13:19:31.37] WARNING[9559] channel.c: Exceptionally long queue length queuing to Local/sleep@monitor-be7c;2 While this was going on a dozen or so unrelated SIP calls got fouled up, and we just noticed they were still hanging around two days later, because a client called us and complained about missing recordings (which would have been queued when the Mixmonitor command completed) Channel Context Extension Prio State Application Data CallerID Duration Accountcode PeerAccount BridgedTo SIP/onstate-sip-serv macro-anyoldcall s 10 Up Dial SIP/+16239357327@onstate- +15132409191 51:33:49 SIP/onstate-sip-serv SIP/onstate-sip-serv macro-anyoldcall s 10 Up Dial SIP/+16232437242@onstate- +16102524416 51:44:38 SIP/onstate-sip-serv and these others: SIP/asterisk4-00017e callskype gmagent1 1 Up AppDial (Outgoing Line) gmagent1 51:49:01 SIP/onstate-sip-serv SIP/asterisk4-00017e callskype klauss-ia 1 Up AppDial (Outgoing Line) klauss-ia 51:48:41 SIP/onstate-sip-serv SIP/onstate-sip-serv callskype klauss-ia 31 Up Dial SIP/8759klauss-ia@asteris +12145457068 51:48:41 SIP/asterisk4-00017e SIP/brendar.moving.s onstate 8717brendar.movi 1 Up AppDial (Outgoing Line) 8717brendar.mov 51:53:47 SIP/onstate-sip-serv SIP/asterisk4-00017e callskype aahcs.ladyclaire 1 Up AppDial (Outgoing Line) aahcs.ladyclair 51:50:19 SIP/onstate-sip-serv SIP/onstate-sip-serv callskype gmagent1 31 Up Dial SIP/8759gmagent1@asterisk +14104300932 51:49:01 SIP/asterisk4-00017e SIP/onstate-sip-serv onstate 8717brendar.movi 13 Up Dial SIP/brendar.moving.specia +12487988748 51:53:47 SIP/brendar.moving.s SIP/jenilynn-00017d8 onstate 8717jenilynn 1 Up AppDial (Outgoing Line) 8717jenilynn 51:54:38 SIP/onstate-sip-serv SIP/johns.moving.spe onstate 8717johns.moving 1 Up AppDial (Outgoing Line) 8717johns.movin 51:47:54 SIP/onstate-sip-serv SIP/onstate-sip-serv macro-anyoldcall s 1 Up AppDial (Outgoing Line) +16128861133 51:54:23 SIP/onstate-sip-serv SIP/onstate-sip-serv onstate 8717topdog.james 13 Up Dial SIP/topdog.james.etchegar +12392623643 51:54:27 SIP/topdog.james.etc SIP/onstate-sip-serv onstate 8717jenilynn 13 Up Dial SIP/jenilynn,120,M(onansw +14058485000 51:54:38 SIP/jenilynn-00017d8 SIP/onstate-sip-serv onstate 8717bcrann.movin 13 Up Dial SIP/bcrann.moving.special +14044318017 51:48:13 SIP/bcrann.moving.sp SIP/onstate-sip-serv callskype aahcs.rowena.raq 27 Up Dial SIP/8759aahcs.rowena.raqu +13096696012 51:49:57 SIP/asterisk2-00017e SIP/onstate-sip-serv callskype osa.ivyjoy.villa 27 Up Dial SIP/8759osa.ivyjoy.villan +14434800725 51:50:42 SIP/asterisk2-00017e SIP/asterisk2-00017e callskype aahcs.christine. 1 Up AppDial (Outgoing Line) aahcs.christine 51:51:24 SIP/onstate-sip-serv SIP/onstate-sip-serv callskype np.agent3 27 Up Dial SIP/8759np.agent3@asteris +14085957148 51:50:51 SIP/asterisk2-00017e SIP/onstate-sip-serv callskype aahcs.ladyclaire 31 Up Dial SIP/8759aahcs.ladyclaire. +18707465111 51:50:20 SIP/asterisk4-00017e SIP/asterisk2-00017e callskype np.agent3 1 Up AppDial (Outgoing Line) np.agent3 51:50:51 SIP/onstate-sip-serv SIP/topdog.james.etc onstate 8717topdog.james 1 Up AppDial (Outgoing Line) 8717topdog.jame 51:54:27 SIP/onstate-sip-serv SIP/asterisk2-00017e callskype osa.ivyjoy.villa 1 Up AppDial (Outgoing Line) osa.ivyjoy.vill 51:50:42 SIP/onstate-sip-serv SIP/onstate-sip-serv callskype aahcs.christine. 27 Up Dial SIP/8759aahcs.christine.c +15802239447 51:51:24 SIP/asterisk2-00017e SIP/asterisk2-00017e callskype aahcs.rowena.raq 1 Up AppDial (Outgoing Line) aahcs.rowena.ra 51:49:56 SIP/onstate-sip-serv SIP/bcrann.moving.sp onstate 8717bcrann.movin 1 Up AppDial (Outgoing Line) 8717bcrann.movi 51:48:13 SIP/onstate-sip-serv SIP/hollye.moving.sp onstate 8717hollye.movin 1 Up AppDial (Outgoing Line) 8717hollye.movi 51:48:11 SIP/onstate-sip-serv SIP/onstate-sip-serv macro-anyoldcall s 10 Up Dial SIP/+16128861133@onstate- +17154565448 51:54:23 SIP/onstate-sip-serv SIP/onstate-sip-serv onstate 8717johns.moving 13 Up Dial SIP/johns.moving.speciali +14044318017 51:47:54 SIP/johns.moving.spe SIP/onstate-sip-serv onstate 8717hollye.movin 13 Up Dial SIP/hollye.moving.special +18054534687 51:48:11 SIP/hollye.moving.sp Here's one of the two day old calls: [root@asterisk asterisk]# asterisk -rx 'core show channel SIP/onstate-sip-server-00017e79' -- General -- Name: SIP/onstate-sip-server-00017e79 Type: SIP UniqueID: asterisk.onstate.net-1297707348.133129 LinkedID: asterisk.onstate.net-1297707348.133129 Caller ID: +12145457068 Caller ID Name: (N/A) DNID Digits: 8759klauss-ia Language: en State: Up (6) Rings: 0 NativeFormats: 0x80004 (ulaw|h263) WriteFormat: 0x4 (ulaw) ReadFormat: 0x4 (ulaw) WriteTranscode: No ReadTranscode: No 1st File Descriptor: 583 Frames in: 6107 Frames out: 4 Time to Hangup: 0 Elapsed Time: 52h58m40s Direct Bridge: SIP/asterisk4-00017e7a Indirect Bridge: SIP/asterisk4-00017e7a -- PBX -- Context: callskype Extension: klauss-ia Priority: 31 Call Group: 0 Pickup Group: 0 Application: Dial Data: SIP/8759klauss-ia@asterisk4,120,M(onanswer) Blocking in: ast_waitfor_nandfds Variables: BRIDGEPVTCALLID=4cc2bf9b63318baf0f1ba4874ed94005@63.210.44.146:5060 BRIDGEPEER=SIP/asterisk4-00017e7a DIALEDPEERNUMBER=8759klauss-ia@asterisk4 DIALEDPEERNAME=SIP/asterisk4-00017e7a DIALSTATUS=ANSWER ~HASH~SIP_CAUSE~SIP/asterisk4-00017e7a~=SIP 200 OK DIALEDTIME= ANSWEREDTIME= retry=0 MACRO_DEPTH=0 which=server2 server2now=44 server1now=66 server2=asterisk4 server2max=40 server1=asterisk2 server1max=65 DB_RESULT=241 SkypeCount=111 GOSUB_RETVAL= numcalls=91 prefix= db_tracker=recording/test/test SIPCALLID=2691d25b@sipserver1.On-State.com SIPDOMAIN=gw.onstate.net SIPURI=sip:SipServerClient@63.210.44.142:5061 CDR Variables: level 1: dnid=8759klauss-ia level 1: clid=+12145457068 level 1: src=+12145457068 level 1: dst=klauss-ia level 1: dcontext=callskype level 1: channel=SIP/onstate-sip-server-00017e79 level 1: dstchannel=SIP/asterisk4-00017e7a level 1: lastapp=Dial level 1: lastdata=SIP/8759klauss-ia@asterisk4,120,M(onanswer) level 1: start=2011-02-14 13:15:48 level 1: answer=2011-02-14 13:15:54 level 1: duration=190720 level 1: billsec=190714 level 1: disposition=ANSWERED level 1: amaflags=DOCUMENTATION level 1: uniqueid=asterisk.onstate.net-1297707348.133129 level 1: linkedid=asterisk.onstate.net-1297707348 level 1: sequence=183520 [root@asterisk asterisk]# asterisk -rx 'core show channel SIP/asterisk4-00017e7a' -- General -- Name: SIP/asterisk4-00017e7a Type: SIP UniqueID: asterisk.onstate.net-1297707348.133130 LinkedID: asterisk.onstate.net-1297707348.133129 Caller ID: klauss-ia Caller ID Name: (N/A) DNID Digits: (N/A) Language: en State: Up (6) Rings: 0 NativeFormats: 0x80004 (ulaw|h263) WriteFormat: 0x4 (ulaw) ReadFormat: 0x4 (ulaw) WriteTranscode: No ReadTranscode: No 1st File Descriptor: 598 Frames in: 6170 Frames out: 0 Time to Hangup: 0 Elapsed Time: 53h0m6s Direct Bridge: SIP/onstate-sip-server-00017e79 Indirect Bridge: SIP/onstate-sip-server-00017e79 -- PBX -- Context: callskype Extension: klauss-ia Priority: 1 Call Group: 0 Pickup Group: 0 Application: AppDial Data: (Outgoing Line) Blocking in: ast_waitfor_nandfds Variables: BRIDGEPVTCALLID=2691d25b@sipserver1.On-State.com BRIDGEPEER=SIP/onstate-sip-server-00017e79 MACRO_DEPTH=0 DIALEDPEERNUMBER=8759klauss-ia@asterisk4 SIPCALLID=4cc2bf9b63318baf0f1ba4874ed94005@63.210.44.146:5060 CDR Variables: level 1: dnid= level 1: clid=klauss-ia level 1: src=klauss-ia level 1: dst=s level 1: dcontext=onstate level 1: channel=SIP/asterisk4-00017e7a level 1: lastapp=MacroExit level 1: start=2011-02-14 13:15:48 level 1: answer=2011-02-14 13:15:54 level 1: duration=190805 level 1: billsec=190799 level 1: disposition=ANSWERED level 1: amaflags=DOCUMENTATION level 1: uniqueid=asterisk.onstate.net-1297707348.133130 level 1: linkedid=asterisk.onstate.net-1297707348 level 1: sequence=183514 By: Rafael Prado Rocchi (prado) 2011-04-12 19:49:33 Alecdavis, thank you a lot for that dialplan! It was essencial for me to find the problem on the asterisk code. Jcovert, I made a patch for your version (1.8.1.1). This problem also affected 1.4 as well, patch solved too. (did not test on 1.6) Now I'm gonna test it in trunk and upload it in a few minutes. By: Rafael Prado Rocchi (prado) 2011-04-12 20:08:17 3 patches uploaded (Trunk, 1.8.1.1 and 1.4) It solves the problems reported here, but I would like some feedback from other developers about it. Prado By: Rafael Prado Rocchi (prado) 2011-04-12 20:28:29 Some information: Searching SVN code now, I found this problem was created by issue 14249, more than 2 years ago. It's a suspended issue for lack of feedback but the code was put in asterisk. It first appeared in asterisk 1.4.24 and is present in all versions from that until 1.8 and trunk. https://issues.asterisk.org/view.php?id=14249 I'll do some testing about that old issue 14249..... feedback later. Prado By: John Covert (jcovert) 2011-04-12 20:46:04 I'm on 1.8.3.2 now. I tried your patch. No joy. See issue ASTERISK-16258 (note that the following happens if I go back to the default helperthreads) asterisk*CLI> originate sip/x33 application dial IAX2/guest@iax.covert.org/ringnoanswer -- Launching dial(IAX2/guest@iax.covert.org/ringnoanswer) on SIP/x33-00000004 -- Called guest@iax.covert.org/ringnoanswer -- Accepting UNAUTHENTICATED call from 192.168.0.100: > requested format = ulaw, > requested prefs = (ulaw|gsm), > actual format = ulaw, > host prefs = (ulaw|gsm), > priority = caller -- Call accepted by 192.168.0.100 (format ulaw) -- Format for call is ulaw -- Executing [ringnoanswer@sip:1] Progress("IAX2/192.168.0.100:4569-519", "") in new stack -- Executing [ringnoanswer@sip:2] NoOp("IAX2/192.168.0.100:4569-519", ""Cisco 33" <33>") in new stack -- Executing [ringnoanswer@sip:3] PlayTones("IAX2/192.168.0.100:4569-519", "440+495/2000,0/4000") in new stack -- Executing [ringnoanswer@sip:4] Wait("IAX2/192.168.0.100:4569-519", "120") in new stack -- IAX2/192.168.0.100:4569-2748 is making progress passing it to SIP/x33-00000004 [Apr 12 21:43:02] WARNING[26350]: channel.c:1426 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/192.168.0.100:4569-2748 [Apr 12 21:43:07] WARNING[26350]: channel.c:1426 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/192.168.0.100:4569-2748 [Apr 12 21:43:12] WARNING[26350]: channel.c:1426 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/192.168.0.100:4569-2748 -- Hungup 'IAX2/192.168.0.100:4569-2748' [Apr 12 21:43:29] WARNING[26350]: channel.c:1426 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/192.168.0.100:4569-519 [Apr 12 21:43:29] WARNING[26350]: channel.c:1426 __ast_queue_frame: Exceptionally long voice queue length queuing to IAX2/192.168.0.100:4569-519 == Spawn extension (sip, ringnoanswer, 4) exited non-zero on 'IAX2/192.168.0.100:4569-519' -- Hungup 'IAX2/192.168.0.100:4569-519' asterisk*CLI> By: Rafael Prado Rocchi (prado) 2011-04-13 11:32:27 All my testes were done using alecdavis dialplan. It worked. I'll try to reproduce yours, using IAX. By: John Covert (jcovert) 2011-04-19 12:20:39 So prado, what was the theory behind the patch when it was applied, and what is your theory about why it causes this problem, and finally, what about the original problem this was supposed to fix? It seems that if simply removing "ast_indicate(chan, -1);" from the routine really does fix the problem, the right fix should be to determine what ast_indicate is doing that's so bad and fix it there. /john By: Leif Madsen (lmadsen) 2011-05-04 14:32:12 Have a client also experiencing this. I have implemented the patch onto a 1.6.2.14 system and am continuing to monitor it. We'll see what happens. It does feel like this isn't the solution, but rather a work around right now. It'd be nice to know why this happens. By: Leif Madsen (lmadsen) 2011-05-09 13:53:43 I'm dropping this status to Confirmed as there is a patch, but I don't think this is going to be the right solution. By: John michelle (jhmichelle) 2011-05-22 23:46:58 I can confirm that my asterisk box have the same problem, i have tested the above patch still no joy.one more note this warning seems to appear with peers that are Transatlantic i.e longer delay. so maybe we need more tolerance for longer delays. By: David Vossel (dvossel) 2011-05-25 15:14:33 The exceptional queue length is probably due to having debug threads enabled. Debug threads is crazy expensive, especially when dealing with local_channels. By: Alec Davis (alecdavis) 2011-05-26 06:11:57 Turning off 'DONT OPTIMIZE' and 'DEBUG THREADS', and logger.conf disable full, and core set verbose = 0 and core set debug = 0. Still gives, "Exceptionally long queue length queuing to Local/10036@phones-f6c1;2" But applying reviewboard https://reviewboard.asterisk.org/r/1231/ atleast no more deadlocks. By: John michelle (jhmichelle) 2011-05-26 12:39:08 This error happens when i am using IAX2 , so i hope there would be a similar patch for the deadlocks for CHAN_IAX like the one alecdavis referred to. By: David Vossel (dvossel) 2011-05-26 16:11:00 I thought this issue was isolated to the use of local channels. What iax2 deadlock are you referring to? By: John michelle (jhmichelle) 2011-05-26 16:51:30 when this problem occurs in iax it happens when calls are transatlantic so network delay is what causes the frames to be queued for more than 96 which causes the frame head to be removed which leads to voice degradation. when that happen while channel.c is still printing the error. the callee tries to hangup And acquire a lock. this is where the deadlock happens between ast_queue_frame and ast_write. network delay is a major factor and i don't know how to reproduce that. By: John michelle (jhmichelle) 2011-05-26 16:56:29 dvossel : please check jcovert post he also mentions iax2 problem By: Digium Subversion (svnbot) 2011-05-31 13:52:58 Repository: asterisk Revision: 321515 U branches/1.8/channels/chan_local.c ------------------------------------------------------------------------ r321515 | dvossel | 2011-05-31 13:52:57 -0500 (Tue, 31 May 2011) | 12 lines Chan_local locking cleanup. This patch removes all of the unnecessary deadlock avoidance loops that occur in chan_local. It also resolves an issue with a deadlock triggered by local channel optimizations. (issue ASTERISK-16711) Review: https://reviewboard.asterisk.org/r/1231/ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=321515 By: Digium Subversion (svnbot) 2011-05-31 14:01:46 Repository: asterisk Revision: 321516 _U trunk/ U trunk/channels/chan_local.c ------------------------------------------------------------------------ r321516 | dvossel | 2011-05-31 14:01:45 -0500 (Tue, 31 May 2011) | 18 lines Merged revisions 321515 via svnmerge from https://origsvn.digium.com/svn/asterisk/branches/1.8 ........ r321515 | dvossel | 2011-05-31 13:52:54 -0500 (Tue, 31 May 2011) | 12 lines Chan_local locking cleanup. This patch removes all of the unnecessary deadlock avoidance loops that occur in chan_local. It also resolves an issue with a deadlock triggered by local channel optimizations. (issue ASTERISK-16711) Review: https://reviewboard.asterisk.org/r/1231/ ........ ------------------------------------------------------------------------ http://svn.digium.com/view/asterisk?view=rev&revision=321516 By: John michelle (jhmichelle) 2011-06-03 09:18:50 one more note , i am on asterisk 1.6.2.18 now, when asterisk starts spewing Exceptionally long voice queue length queuing to IAX2/XXX , it also show the warning MAX retries exceeded to host IAX/XXX and it doesn't stop there , the channel Doesn't get hangup and 'MAX retries...' warning keeps showing up every 30 to 40 secs for every channel that didn't hangup. By: Vladimir Mikhelson (vmikhelson) 2011-06-05 01:25:17 Tested the patch as a possible remedy for issue 18800. Asterisk seg faulted 7 times upon restart. After the patch reversal everything is back to normal. Core dump analysis is attached to issue 18800. I can re-attach here if needed. -Vladimir By: ppower (ppower) 2011-06-24 09:31:17.075-0500 This issue caught my eye because the 'Exceptionally long voice queue length' message was plaguing me with regards to MOH and parking (mantis 0018262). My understanding of this problem is that the timer events needed to process the queued frames are not getting noticed. In my experience, with IAX if you get 'Exceptionally long voice queue length' you will eventually get 'MAX retries exceeded to host' messages. By: Leif Madsen (lmadsen) 2011-09-22 07:39:53.884-0500 Are people still running into this on 1.8.7.0-rc1 or later? By: JoshE (n8ideas) 2011-12-27 16:49:02.772-0600 I seem to have just run into this issue on 1.8.7.0. Not sure of exact circumstances, but this was a Local/ issue not an IAX channel problem, so far as I can tell. Don't have better debugging on this now, but will keep an eye on this. By: Alex (alexrixhardson) 2012-01-25 07:58:14.287-0600 Yes, I have also run into this issue on 1.8.7.0. The warnings in the messages log file were "Exceptionally long queue length queueing to Local/XXXX". There was no IAX2 active channel. Only SIP, DAHDI and Local channels were active at the time. There was about 10-20 messages per seconds and it was impossible to make any additional call which would include Local channel at that point. Restart of Asterisk was required. No other warnings were logged prior this incident. By: Alex (alexrixhardson) 2012-01-30 11:08:10.737-0600 I managed to capture two "core show locks" outputs from Asterisk once it started spitting the "Exceptionally long queue length queueing to Local" messages. Please, see the putty1.log and putty2.log files. *Edit:* This was captured on 1.8.9.0 (I did an upgrade on Friday, but unfortunately it didn't solve this problem) By: Alex (alexrixhardson) 2012-01-31 05:49:35.356-0600 I have a few production servers using 1.8.X Asterisk with several hundred concurrent calls going on almost 24/7. This error crashes at least one of the servers per week. Is it possible to raise the severity of this problem from "Minor" to "Major"? By: Ole Kaas (ole.kaas) 2012-06-06 08:17:20.087-0500 FWIW I've just seen several of these on 1.8.13.0. I was inspecting log after restarting asterisk because it didn't repond to SIP requests. When I get an alert, I log onto asterisk CLI and issus an "core show channels" to verify the alert. When I get an output like the one below I kill asterisk with a -9 and start it again with the init script. It could be unrelated, but the below sample log entry appeared just before asterisk was killed. It could also be that because SIP was dead, the below log entry appeared because the SIP phone with that extension didn't respond. {noformat} Connected to Asterisk 1.8.13.0 currently running on server (pid = 25677) asterisk*CLI> core show channels Channel Location State Application(Data) asterisk*CLI> exit {noformat} {noformat} asterisk[25677]: WARNING[8788]: channel.c:1474 in __ast_queue_frame: Exceptionally long voice queue length queuing to Local/ext@some-context {noformat} By: Richard Mudgett (rmudgett) 2012-06-06 09:58:32.034-0500 The "Exceptionally long queue length queuing to XXXXX" message can be caused by many things. This issue is specific to local channels optimizing themselves out of the call. If you are seeing this message, you should create a new issue and supply the necessary debug information to determine the cause of the problem. |