Summary: ASTERISK-09870: my asterisk comes down in flames randomly, it appears to be related to chanspy
Reporter: Rene Mendoza (renemendoza)
Labels:
Date Opened: 2007-07-12 16:24:44
Date Closed: 2011-06-07 14:01:02
Priority: Critical
Regression? No
Status: Closed/Complete
Components: Applications/app_chanspy
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) ast_backtrace_1.txt
( 1) backtrace.txt
( 2) backtrace2.txt
( 3) backtrace3.txt
( 4) backtrace4.txt
( 5) backtrace4-a.txt
( 6) bt-bt_full-thread_apply_all_bt_2.txt
( 7) bt-bt_full-thread_apply_all_bt.txt
Description: I have almost as many ChanSpy calls as Zap->SIP calls. I am doing no transcoding.

Asterisk 1.4.7
Zaptel 1.4.3
PowerEdge 2950, 2 x 3 GHz dual-core Xeon processors
8 GB RAM


I have around 14 calls, and 6 of them might be ChanSpy calls,
doing very light recording,
using queues and Local channels.

Asterisk crashes randomly:
sometimes it stays up for days, sometimes it crashes more than once a day.

****** ADDITIONAL INFORMATION ******

Backtrace posted.

Comments:

By: Mark Michelson (mmichelson) 2007-07-13 10:24:22

Seems that spyee->name is null. Can you provide a bt full? This would help to confirm it.

By: Rene Mendoza (renemendoza) 2007-07-13 11:20:39

Waiting for a crash to happen. I have upgraded to 1.4.7.1 and was able to compile with dont-optimize, so I expect to post a new bt and bt full shortly.

Thanks

By: Joshua C. Colp (jcolp) 2007-07-16 07:40:31

It would also be nifty to know how ChanSpy is used and with what options.

By: Rene Mendoza (renemendoza) 2007-07-16 12:10:14

I am using it like this:

exten => _*10XXXX,1,ChanSpy(SIP/${EXTEN:3}|b)

Phones that are being spied on and phones that do the spying both use the same codec, and so does the PSTN line. Damn Asterisk, I am waiting for it to crash; it only does it when nobody is looking. I SWEAR IT!!! j/k

By: Rene Mendoza (renemendoza) 2007-07-16 12:33:35

This is a fresh crash.

By: Mark Michelson (mmichelson) 2007-07-16 16:01:48

Have you noticed any common circumstances surrounding the crashes? For instance, does it always happen at the beginning of a call or when a call hangs up? Does the phone doing the spying always call into a bridged call or does the phone doing the spying sometimes call Chanspy before a call is established between the phones being spied on? Are there any other circumstances that you notice always happen when the crash occurs?

The next time this happens, could you post debug output from the CLI both from a time when the crash does not occur and from a time when the crash does occur? It looks like there may be some sort of race condition where the channel is freed at just the exact moment so that chanspy causes a crash and CLI debug output could be helpful in tracking this.
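Because the crashes tend to happen when nobody is watching the console, one way to capture the debug output requested here is to have Asterisk also write debug messages to a file. This is a minimal logger.conf sketch, assuming the stock [logfiles] section and the default log directory (typically /var/log/asterisk); it is not a setup anyone in this thread reported using:

```ini
; logger.conf (sketch): keep debug-level messages in a file so they
; survive a crash that happens while nobody is looking at the CLI.
[logfiles]
; "debug" is the file name inside the Asterisk log directory;
; the value after => lists which message levels go into it.
debug => debug
; keep the usual general-purpose log as well
messages => notice,warning,error
```

Debug messages are only produced once the debug level is raised, for example with `core set debug 5` at the 1.4 CLI or by starting Asterisk with one or more -d flags; the new file is picked up after `logger reload` or a restart.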

By: Rene Mendoza (renemendoza) 2007-07-16 16:34:04

Sometimes the spied channel is gone and the spying channel keeps beeping for a long time; that happens all the time. Sometimes the spying device hangs up but Asterisk thinks it is still up, and it stays in a zombie-like state.

By: Mark Michelson (mmichelson) 2007-07-17 10:25:20

When I was browsing the bugs, I saw issue 10209 and I wonder if it might be causing your crash. There is a call to ast_check_hangup right before the line in which your crash occurs, and if it's returning the wrong value, then I can definitely see it causing the crash you're seeing. I'm going to relate this issue to that one. When the patch for 10209 is committed, I'll make note of it here and you can see if it fixes your problem.

By: Perssy Llamosas (pll) 2007-07-26 16:29:34

This bug has been open for a while, and since the ast_check_hangup bug was fixed in 1.4.9, I will poke it.

This is probably a ChanSpy bug: on the same server with ChanSpy disabled, my Asterisk hasn't crashed once, but if I enable ChanSpy, Asterisk crashes randomly. It can run for several days without problems or crash several times a day.

I have experienced the problem in Asterisk 1.4.5, 1.4.7, 1.4.7.1, 1.4.8 and 1.4.9 on my HP ProLiant ML110 G3, which has a 3.0 GHz dual-core processor.

It's curious, since the same versions of Asterisk on my backup server, a PIII 1.0 GHz with the same load and the same number of spies, won't crash. I have waited for it to crash for several days but it just won't.

I am adding a fresh "bt", "bt full" and "thread apply all bt" from my Asterisk 1.4.9 on my ML110 G3, I don't seem to be able to get a crash from my backup server.

By: Rene Mendoza (renemendoza) 2007-08-02 17:10:07

I have added backtrace3.txt, created using Asterisk 1.4.8, and I am seeing the same issues reported by PLL: one week with no crashes, then three crashes in one day. Luckily safe_asterisk is saving face for us. Is it possible that there is something wrong with chan_spy?

By: Rene Mendoza (renemendoza) 2007-08-06 14:49:13

Uploaded a new core dump for a new crash. Again there is a part that says something about linked list corruption; any help will be much appreciated.

By: Rene Mendoza (renemendoza) 2007-08-06 14:55:12

Sorry, the first upload was truncated.

The core dump was generated by Asterisk 1.4.8.
I will upgrade today to the latest Asterisk version and see if that helps.

By: Perssy Llamosas (pll) 2007-08-20 16:03:27

Uploaded bt, bt full and thread apply all bt from Asterisk 1.4.10.1.
This time, switching ChanSpy on proved to be catastrophic: Asterisk crashed every time somebody spied. I only tested it for half an hour; too many crashes.

By: Perssy Llamosas (pll) 2007-08-23 18:18:09

I have been reading other bugs related to spy channels, specifically mixmonitor, and I have found that spy channels + jitter buffer enabled = random crash.

I have disabled my jitter buffer in my production server and found that my backup server had it disabled. My production server hasn't crashed once with chanspy enabled and jitter buffer disabled but maybe it's too soon to come to conclusions.

Renemendoza, does your server have jitter buffer enabled? If so, try disabling it and test the chanspy.

I am testing it in Asterisk 1.4.11; with the jitter buffer enabled it crashes too.



By: Rene Mendoza (renemendoza) 2007-08-26 15:42:49

PLL, I didn't have any specific jb setting in sip.conf; I am adding jbenable=no to sip.conf to see what happens.

By: Perssy Llamosas (pll) 2007-08-27 10:01:40

It has been 4 days and it hasn't crashed once.

I had jbenable=yes in zapata.conf. I don't know if there is a working jitter buffer for SIP channels yet; as far as I know there is no jitter buffer in the RTP-based channels.
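For reference, the workaround discussed here is a one-line change in zapata.conf. A minimal sketch, assuming the option is placed in the usual [channels] section alongside the existing channel definitions; only jbenable itself comes from this thread, the surrounding layout is illustrative:

```ini
; zapata.conf (sketch): turn off the generic jitter buffer on Zap channels,
; the setting reported in this thread to stop the ChanSpy crashes for some users.
[channels]
jbenable=no
; ...existing options and channel => lines continue below unchanged...
```

Later comments report mixed results with this change, so it is better treated as a diagnostic step than a fix.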



By: Rene Mendoza (renemendoza) 2007-08-27 20:19:28

OK, I am adding that too. Today I crashed, with no core dump. I'd say aiming for a two-week uptime would be cool.

By: Fabiano Heringer (fabianoheringer) 2007-08-31 23:58:30

Hi, I'm having the same crash on version 1.4.9 (I tried 1.4.10.1 and 1.4.11 too).
It looks like it happens when spying on Zap <-> SIP channels after the channel hangs up.

Below is part of the bt full from the core dump generated after the crash:

http://www.pastebin.org/1534

I put it on this site because of the number of lines in this dump...



By: Rene Mendoza (renemendoza) 2007-09-03 11:12:17

I crashed again, same stuff.

By: Perssy Llamosas (pll) 2007-09-03 18:44:27

Still no crash for me.



By: Adam Kavan (akavan) 2007-09-06 10:17:11

I am running Asterisk 1.4.11 and this is still happening. spyee->name is not NULL, but it is a pointer to something that is out of bounds. I am not running jitter buffers anywhere. What information would you like? I figured I would ask, since we already have lots of backtraces on this issue.

By: Perssy Llamosas (pll) 2007-09-11 10:06:26

I confirm that "jbenable=no" in zapata.conf fixed the problem in my configuration. No crash in 15 days and the server used to crash at least once in the day when anybody spied.

By: Fabiano Heringer (fabianoheringer) 2007-09-12 09:54:04

I tried setting jbenable=no, but I got a crash again...
Running gdb on the core dump, the crash occurs exactly in ChanSpy...

By: Rene Mendoza (renemendoza) 2007-09-18 18:55:24

I want to add a note that this same problem happens with Asterisk BE, and Digium still won't provide a fix for it.

By: Rene Mendoza (renemendoza) 2007-09-18 18:57:26

PLL, is your Asterisk still up? You should be at 20-something days of uptime now.

By: Perssy Llamosas (pll) 2007-09-19 11:31:17

Still up.

By: Mathieu BOYER (thieums) 2007-09-19 18:12:22

I've got the same problem with 1.4.11, random crashes when using chanspy. I can provide backtraces.

By: Adam Kavan (akavan) 2007-09-25 08:46:58

I can confirm that jbenable=no does not fix this problem, I got 3 more crashes last night.

Maybe it's my configuration that is different.

The channels being spied on are SIP -> Zap calls. The spies are coming in over IAX lines.

Anyone else have a similar config?



By: Fabiano Heringer (fabianoheringer) 2007-09-30 15:26:28

I'm still getting crashes; below is part of the core dump:

Core was generated by `/usr/sbin/asterisk -f -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0  0x080801d5 in ast_channel_spy_remove (chan=0x8457640, spy=0xb6320760) at channel.c:1488
1488            AST_LIST_REMOVE(&chan->spies->list, spy, list);


It seems to be in ast_channel_spy_remove...

By: Joshua C. Colp (jcolp) 2007-10-31 13:42:40

Please give the branch located at http://svn.digium.com/svn/asterisk/team/file/audiohooks-1.4 a try and report back. Thanks!

By: ptorres (ptorres) 2007-11-08 08:53:50.000-0600

We are having a very similar problem, asterisk is randomly crashing on at least 3 different environments with 5 to 50 agents, all calls are recorded (not using mixmonitor) and only 1 agent using chanspy.

Attached gdb output: ast_backtrace_1.txt

We are still unable to reproduce this on our testing environment with the same compilation options.  

ast 1.4.12 (upgraded recently from 1.2.11)
zaptel 1.4.5.1
CentOS 4.0, kernel updated to 2.6.9-55
Intel Core 2 Duo and Core 2 Quad (1 GB RAM)
Agents on x-lite calling via both sip and zap (digium e1/t1 cards)


We haven't tried http://svn.digium.com/svn/asterisk/team/file/audiohooks-1.4 yet because of the comments on another issue (http://bugs.digium.com/view.php?id=10956).

By: Tilghman Lesher (tilghman) 2007-11-12 11:57:39.000-0600

PTorres (and anybody else still having these problems):  please follow the instructions in doc/valgrind.txt.

By: ptorres (ptorres) 2007-11-13 07:56:05.000-0600

I think we've found something; it may be related to the iLBC codec on the spying channel.

We also saw this warning in the * console :
[Nov 12 15:02:36] WARNING[16607]: translate.c:163 framein: no samples for lintoulaw

During “ChanSpy”:
>show channel sip/1234-01234567
NativeFormats: 0x4 (ulaw)
WriteFormat: 0x40 (slin)
ReadFormat: 0x400 (ilbc)   <------ Randomly crashes after a few attempts

If we disable the iLBC codec on the spying device (i.e. in the X-Lite options):
During “ChanSpy”:
>show channel sip/1234-01234567
NativeFormats: 0x4 (ulaw)
WriteFormat: 0x40 (slin)
ReadFormat: 0x4 (ulaw) <---- NEVER CRASHED

edit: I am about to read valgrind docs now :)
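Here iLBC was disabled in the X-Lite client itself. The same restriction can also be expressed on the server side by limiting the codecs Asterisk will negotiate with the spying phone. A minimal sip.conf sketch; the peer name `1234` is hypothetical (borrowed from the placeholder channel name sip/1234-01234567) and the remaining peer settings depend on your device:

```ini
; sip.conf (sketch): keep a spying phone from negotiating iLBC.
; [1234] is a hypothetical peer name; use your own device's entry.
[1234]
type=friend
host=dynamic
disallow=all      ; clear any inherited codec list first
allow=ulaw        ; offer only ulaw, matching the spied-on channel
```

With only ulaw allowed, the spying channel should end up with ulaw as its read format during ChanSpy, the combination reported as never crashing.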



By: Perssy Llamosas (pll) 2007-11-13 10:50:31.000-0600

It could be true. I was using slin moh and slin playbacks.

I am testing the audiohooks-1.4 branch with jitter buffers enabled, so far no crashes.

By: Rene Mendoza (renemendoza) 2007-11-14 08:38:27.000-0600

I have yet to test the audiohooks branch, but so far what has helped me is not using ChanSpy at all. Instead I am using ZapBarge and some dialplan variable tricks to make it work in the scenario where I was using ChanSpy (i.e. Zap to SIP calls). It all revolves around setting an AstDB key for the SIP extension number of the connected call that points to the relevant Zap channel. There is also a key for every Zap channel that holds a reference to the SIP channel.

I have modified my dialplan to make it more suitable for general usage, as my scenario is ZAP->queue->SIP.
Here is the relevant dialplan:

;incoming zap call
exten => _XXXX,1,Answer
exten => _XXXX,n,Set(__ZAPCHANNEL=${CUT(CHANNEL|-|1):4}) ;which Zap channel the call came through
exten => _XXXX,n,Dial(SIP/${EXTEN}|20|gM(connect^${ZAPCHANNEL}^${EXTEN}))
exten => _XXXX,n,Set(DB(${EXTEN}/conexion)=)
exten => _XXXX,n,Hangup
exten => t,1,Hangup


;it is very important to clean up after the call has ended
exten => h,1,Set(SIPCHANNEL=${DB(${ZAPCHANNEL}/conexion)})
exten => h,n,Set(DB(${SIPCHANNEL}/conexion)=)
exten => h,n,Set(DB(${ZAPCHANNEL}/conexion)=)

;when the call has connected

[macro-connect]
exten =>s,1,NoOp(${ZAPCHANNEL})
exten =>s,n,Set(DB(${ARG2}/conexion)=${ZAPCHANNEL})
exten =>s,n,Set(SIPCHANNEL=${CUT(CHANNEL|-|1):4})
exten =>s,n,Set(DB(${ZAPCHANNEL}/conexion)=${SIPCHANNEL})


;this is the context for barging in: the user enters *10 and the 4-digit SIP extension

exten => _*10XXXX,1,Set(TARGET=${DB(${EXTEN:3}/conexion)})
exten => _*10XXXX,n,GotoIf($["${TARGET}"=""]?no:si)
exten => _*10XXXX,n(no),Playback(the-number-u-dialed&not-yet-connected)
exten => _*10XXXX,n,Hangup
exten => _*10XXXX,n(si),ZapBarge(${TARGET})

By: Perssy Llamosas (pll) 2007-11-19 08:49:07.000-0600

The audiohooks-1.4 branch crashes without a core dump, with no spying channels active.

I am trying to get the debug output, but so far I don't know what happens before the crashes; to me they have happened at random times.

By: ptorres (ptorres) 2007-12-21 07:55:21.000-0600

Just a little update, so far we had NO crashes since we removed the ilbc codec from all spying devices, we will be trying with 1.4.16 ( or higher :D ) soon.

By: Jason Parker (jparker) 2008-01-18 11:29:12.000-0600

PLL, you appear to be the only person still having issues after switching to the audiohooks branch.  If you're still having crashes, could you post a backtrace of them?

By: Fabiano Heringer (fabianoheringer) 2008-01-18 12:16:40.000-0600

Hi, I have the same problem, but I haven't tried installing audiohooks because I couldn't find it in SVN. Please give me the correct link, since

http://svn.digium.com/svn/asterisk/team/file/audiohooks-1.4 is no longer available...

I'm using Asterisk 1.4.17

Thanks

By: Jason Parker (jparker) 2008-01-18 12:18:28.000-0600

It was merged into 1.4, but did not make it into 1.4.17.  If you get the latest svn 1.4 branch, it will be there.

By: Fabiano Heringer (fabianoheringer) 2008-01-18 12:21:37.000-0600

OK, but I have a big problem with trunk versions; I couldn't get them to work here. I have some modules that run only on the stable version (like the Digivoice channel for E1 R2D Brazilian channels). Is there some way to get it into the next stable release?

Thanks.

By: Joshua C. Colp (jcolp) 2008-01-18 13:20:47.000-0600

The 1.4 branch is not trunk, the branch ultimately becomes the next release so you shouldn't have any issues running it.

By: Fabiano Heringer (fabianoheringer) 2008-01-18 13:36:57.000-0600

Oh, sorry, I misread. I will test with the branch... thanks!

By: Joshua C. Colp (jcolp) 2008-02-14 11:23:16.000-0600

After looking at this closer I have determined this is actually a duplicate of issue 11877. As that issue has more progress, including a patch that I would suggest trying, I am suspending this one in favor of that one. Peace.