Summary: ASTERISK-03510: Crash on unknown situation
Reporter: laserfox (laserfox)
Labels:
Date Opened: 2005-02-14 10:10:14.000-0600
Date Closed: 2008-01-15 15:28:15.000-0600
Versions:
Frequency of
Environment:
Attachments:
( 0) agents.conf
( 1) backtrace.txt
( 2) backtrace-20052202.txt
( 3) cli-beforesegfault.txt
( 4) extensions.conf
( 5) newbacktrace.txt
( 6) queues.conf
Description: I can receive PSTN calls perfectly and use all the applications that I need. After some time, an unexpected segmentation fault occurs and nothing is reported in the Asterisk CLI.


Running Fedora Core 2
Digium TE405P with all 4 E1 ports in use.
Comments:
By: nick (nick) 2005-02-14 10:29:19.000-0600

Backtrace makes it look like a SIP bug.

By: nick (nick) 2005-02-14 10:30:40.000-0600

Also, your backtrace might be more useful if you did a make clean && make valgrind && make install.

By: Mark Spencer (markster) 2005-02-14 13:36:23.000-0600

This is a technical support issue, please pursue through support@digium.com

By: laserfox (laserfox) 2005-02-17 17:08:36.000-0600

I talked with Kevin in #asterisk and sent him the backtrace; he recommended that I reopen the bug.

I'm sending a new backtrace (asterisk recompiled with valgrind).

edited on: 02-17-05 17:10

By: Mark Spencer (markster) 2005-02-17 17:11:51.000-0600

This backtrace is just filled with garbage.  It doesn't contain any useful information.  Is Kevin going to debug this?

By: laserfox (laserfox) 2005-02-17 17:35:42.000-0600

So why does my Asterisk keep crashing?
Here are today's "crash" logs:

Restart -> Thu Feb 17 03:04:24 BRST 2005
Restart -> Thu Feb 17 04:54:52 BRST 2005
Restart -> Thu Feb 17 05:59:41 BRST 2005
Restart -> Thu Feb 17 10:01:59 BRST 2005
Restart -> Thu Feb 17 12:06:02 BRST 2005
Restart -> Thu Feb 17 14:30:38 BRST 2005
Restart -> Thu Feb 17 15:28:00 BRST 2005
Restart -> Thu Feb 17 16:03:26 BRST 2005
Restart -> Thu Feb 17 16:06:52 BRST 2005
Restart -> Thu Feb 17 16:29:44 BRST 2005
Restart -> Thu Feb 17 16:48:24 BRST 2005
Restart -> Thu Feb 17 17:31:27 BRST 2005
Restart -> Thu Feb 17 19:11:48 BRST 2005
Restart -> Thu Feb 17 19:30:39 BRST 2005


By: Mark Spencer (markster) 2005-02-17 19:12:50.000-0600

How are your agents logging in?  Can you confirm this is totally unpatched CVS asterisk?

By: Mark Spencer (markster) 2005-02-17 19:13:37.000-0600

Also, post your agents.conf, extensions.conf, queues.conf.  Do you have any kind of event that in particular causes this?

By: laserfox (laserfox) 2005-02-18 05:28:55.000-0600

They log in using exten => *11,1,AgentLogin
Yes, this CVS is not patched.

No, I can't reproduce the problem... I'll post the files.

By: laserfox (laserfox) 2005-02-18 12:33:56.000-0600

After some time in the lab, I was able to reproduce the problem sometimes (not always).

I log in a Grandstream BT100 with AgentLogin, call the phone a few times, then press the phone's Transfer key, and Asterisk crashes with a segfault.

By: Anthony Minessale (anthm) 2005-02-21 14:58:31.000-0600

chanfix.diff should take care of it.
Disclaimer on file.

By: laserfox (laserfox) 2005-02-21 20:24:34.000-0600

This patch resolved one of my problems (Asterisk segfaulting when pressing the BT100 Transfer button after making a # transfer), but I'm still having segfaults.

By: Anthony Minessale (anthm) 2005-02-24 11:55:57.000-0600

The issue is in chan_agent, that much I know. The patch I posted just hides the problem, so forget it.

You can reproduce the problem by having a customer call a queue so that it bridges to an agent logged in via SIP, then # transferring the caller back into the same queue. Once any agent gets the call again, the SIP channel ends up with a corrupted _bridge pointer that explodes when you do anything that looks at it, like pressing the SIP transfer button.

The exact steps one user followed were like this:

agent 1000
agent 1001

queue 1000 (contains agent 1000)
queue 1001 (contains agent 1001)
queue 2000 (contains both agents)

log in both agents on a SIP channel
call in as a customer on Zap to queue 2000 (possibly any channel)
whichever agent gets the call, # blind transfer it to the opposite agent's private queue (e.g. 1001 xfers to an ext leading to queue 1000)

once the other agent gets the call, attempt a SIP transfer and boom
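
The agent and queue layout listed above could be sketched roughly as follows. This is only an illustrative fragment, not the reporter's attached agents.conf, queues.conf, or extensions.conf: the passwords, agent names, the [default] context, the queue extensions, and the `t` option with the `|` argument separator are all assumptions chosen to match the steps as described.

```
; agents.conf (passwords and names are hypothetical)
[agents]
agent => 1000,1234,Agent One
agent => 1001,1234,Agent Two

; queues.conf
[1000]
member => Agent/1000

[1001]
member => Agent/1001

[2000]
member => Agent/1000
member => Agent/1001

; extensions.conf (context name is hypothetical)
[default]
exten => *11,1,AgentLogin
exten => 1000,1,Queue(1000|t)
exten => 1001,1,Queue(1001|t)
exten => 2000,1,Queue(2000|t)
```

With a layout like this, the agent who answers the call from queue 2000 would # transfer the caller to the other agent's extension (1000 or 1001), which leads into that agent's private queue.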

Again, this is not really related to SIP xfer; it's the fact that at the last step above the SIP channel has a corrupted ->_bridge pointer.

Using an older chan_agent.c eliminated this, so the issue must be in that file.

edited on: 02-24-05 11:56

By: twisted (twisted) 2005-03-08 15:26:10.000-0600

anthm, how would you propose we fix this?  Any ideas?  This one is admittedly over my head :P

By: Fernando Romo (el_pop) 2005-03-13 23:14:53.000-0600


Which version of chan_agent.c is working? Checking the CVS log, I presume you are using rev. 1.120, before the pvt changes.

Did you revert only chan_agent.c, or res_musiconhold.c too?

By: Mark Spencer (markster) 2005-03-14 00:04:56.000-0600

Are you using the sip transfer button to do the transfer the first two times?

By: damin (damin) 2005-03-17 21:33:26.000-0600

laserfox: Your last update to this bug was almost a month ago. Are you still having the issues with current CVS? If not, then can we close this out? If so, can you provide some more debugging information to help us pinpoint with new code?

anthm: You have pinpointed that this is a corrupted bridge pointer and that an earlier version of chan_agent.c doesn't exhibit the same problem. Do we know, or need to know between what versions the problem occurs?

markster: You mentioned on the Dev conference that this should not be an issue in Stable, and that you believed that it had been fixed in current CVS.

By: laserfox (laserfox) 2005-03-18 08:16:58.000-0600

damin, I've updated to current CVS (yesterday) and I can still reproduce the problem.

I'll try to debug this with Mark today.

By: Mark Spencer (markster) 2005-03-20 02:10:09.000-0600

Still need you to find me to work on this if it's still an issue.

By: damin (damin) 2005-03-20 11:36:43.000-0600

Perhaps we can set up a specific time/day/location to work on this issue? I think that if we can get a couple of people to reproduce and backtrace it on different systems, we might be better able to find the source of the problem and fix it. I'm not sure what country laserfox is in, but I'm located in the US EST timezone, and I hang out on #asterisk, as does Kram.

By: Mark Spencer (markster) 2005-03-22 13:23:41.000-0600

Fixed in CVS head.

By: Russell Bryant (russell) 2005-03-31 21:01:42.000-0600

Since someone said that an older version worked, I'm going to assume this is not an issue in 1.0.

By: Digium Subversion (svnbot) 2008-01-15 15:28:15.000-0600

Repository: asterisk
Revision: 5229

U   trunk/channels/chan_agent.c

r5229 | markster | 2008-01-15 15:28:15 -0600 (Tue, 15 Jan 2008) | 2 lines

Fix chan_agent segfault (bug ASTERISK-3510)