Summary: ASTERISK-06869: [patch] blind transfering a sip call to parking causes chan_sip to hang.
Reporter: Will McCown (flynwill)
Labels:
Date Opened: 2006-04-28 10:59:56
Date Closed: 2006-06-26 14:15:56
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) 20060524_bug7053_trail.patch
( 1) asterisk.out
( 2) asterisk.strace.gz
( 3) bt
( 4) bt2
( 5) bt_-_bxfer_to_park_crash_r35158.txt
( 6) debug.log
( 7) debug_-_bxfer_to_park_crash_r35158.txt
( 8) full
( 9) full-05092006.gz
(10) full2
(11) gdboutput.txt
Description: If a SIP phone user uses a blind transfer to send a call to parking, chan_sip hangs -- SIP phones can no longer initiate calls.  Calls between ZAP channels continue to be processed normally.  Also, calls to SIP phones from other channels appear to continue to work.  Asterisk ignores restart commands, including "restart now", and must be killed (kill -9) and restarted.  The phone that initiated the transfer continues to show the call as being on hold.

The only case where I've seen Asterisk recover from the hang is if the parked call is retrieved immediately by a non-SIP channel (a ZAP channel in my case).

I've tested this on two different SIP phones -- a Polycom IP501 and the eyeBeam Softphone, and both fail in the same way.

If a normal supervised transfer is used everything works as it should.

I'm reporting this under 1.2.7.1, but we have also observed the problem in SVN-branch-1.2-r9156M.  I also don't know whether it is a SIP issue (where I'm reporting it) or a core Asterisk issue.




****** ADDITIONAL INFORMATION ******

Here is console log from the event:

pbxhost2*CLI>
   -- Starting simple switch on 'Zap/24-1'
   -- Executing Set("Zap/24-1", "CALLERID(all)=Intertel <>") in new stack
   -- Executing NoOp("Zap/24-1", "Got 7217") in new stack
   -- Executing Goto("Zap/24-1", "extensions|7217|1") in new stack
   -- Goto (extensions,7217,1)
   -- Executing Macro("Zap/24-1", "stdexten|7217|SIP/Poly-047a70")
   -- Executing Dial("Zap/24-1", "SIP/Poly-047a70|20") in new stack
   -- Called Poly-047a70
   -- SIP/Poly-047a70-0720 is ringing
   -- SIP/Poly-047a70-0720 answered Zap/24-1
   -- Started music on hold, class 'default', on channel 'Zap/24-1'
   -- Stopped music on hold on Zap/24-1
pbxhost2*CLI> show channels
Channel              Location             State   Application(Data)
SIPPeer/SIP/Poly-047 s@default:1          Down    (None)
Parking/Zap/24-1     s@macro-stdexten:1   Down    (None)
2 active channels
1 active call
Apr 28 08:44:02 WARNING[25185]: channel.c:787 channel_find_locked: Avoided deadlock for '0x8211808', 10 retries!
pbxhost2*CLI> sip show channels
Peer             User/ANR    Call ID      Seq (Tx/Rx)  Form  Hold     Last Message
10.9.100.1       Poly-047a7  7e180c8d6ec  00102/00002  ulaw  Yes      Rx: REFER
1 active SIP channel

The messages file has no information other than the same "Avoided deadlock" messages in the console printout.

Comments:

By: Andrey S Pankov (casper) 2006-04-28 16:52:19

Can you recompile asterisk with DEBUG_xxx uncommented and 'make dont-optimize'?
Please note the "dont-optimize: install" target in the main Makefile...

By: Will McCown (flynwill) 2006-05-01 12:03:14

Ok, did all that.  No additional debugging information in the logs or on the console.  Failure rate is no longer 100%, but it fails more than half of the time.

By: Serge Vecher (serge-v) 2006-05-01 12:43:34

Ok, in order to proceed further, we will need to determine whether this is a chan_sip or a core issue.

Please perform the following procedures to enable chan_sip debugging while you blind transfer the call.
1) Prepare test environment (reduce the amount of unrelated traffic on the server);
2) Enable SIP transaction logging with the following CLI commands:
set debug 4
set verbose 4
sip debug
3) Save complete log to a file and _attach_ said file to the bug.


Additionally: you have indicated that SVN-branch-1.2-r9156M has this issue. The version number indicates you have made modifications to the source code. Can you please elaborate on the modifications? Were these applied to 1.2.7-1 sources as well?

By: Will McCown (flynwill) 2006-05-01 13:16:26

Ok, copy of console log for an event attached.

Regarding the modifications.  I believe the only mod we made to SVN-branch-1.2-r9156M was the set of fixes related to issue 6158.  Our
submitted patch was not rolled into 1.2.7.1, but a different fix
was.  That fix is not sufficient -- an issue I've not gotten around
to reporting yet.  In any case, I backed that mod out before this
most recent test.  

Would there be any value to simplifying the installation as a test?
(Turn off "realtime" database, voicemail ODBC, etc)?

By: Andrey S Pankov (casper) 2006-05-01 14:41:37

> Would there be any value to simplifying the installation as a test?
It's always worth simplifying the installation when you are not sure where the bug is... :)

By: Andrey S Pankov (casper) 2006-05-01 14:47:09

You didn't enable sending of debug level messages to the verboser in your
logger.conf. Just add ',debug' to the 'console=>' entry.

By: Will McCown (flynwill) 2006-05-01 15:15:23

Enabling debug on the console made it impossible to work due to the flurry of messages of the form:

May  1 13:01:09 DEBUG[13702] sched.c: ast_sched_wait()
May  1 13:01:09 DEBUG[13708] sched.c: ast_sched_runq()

So instead I enabled the "full" log, and have attached that.  (You'll still have to dig through the same messages.)  I attached the whole log from where the call is started on a zap channel to where I killed (with -9) the asterisk.  The transfer to parking happens at about line 1736.  I hope this has the information you were seeking.

By: Will McCown (flynwill) 2006-05-04 13:43:39

I have been doing some tracing by adding additional ast_log calls to the code.

The problem appears to be a deadlock of some sort.

The sequence of events is as follows:

The SIP refer request causes chan_sip to enter the routine sip_park.

That in turn calls ast_channel_masquerade to masquerade "Zap/24-1" as "Parking Zap/24-1" which succeeds.

Then it calls ast_channel_masquerade again to masquerade "SIP/Poly-047a70-21bd" as "SIPPeer/SIP/Poly-047a70-21bd" which hangs.

ast_channel_masquerade is hung calling "ast_mutex_trylock" on "SIP/Poly-047a70-21bd" in a while loop.

Since I have no understanding of exactly how these locks are supposed to work, this is about as far as I can go for the moment.
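
For illustration, a trylock loop of the kind described above can be bounded so that the caller gives up instead of spinning forever. The console output earlier in this report ("Avoided deadlock ... 10 retries!") hints that some Asterisk code paths do something along these lines. What follows is only a minimal sketch with plain pthreads: lock_with_retries is a hypothetical helper, not an Asterisk function, and it does not reproduce what ast_channel_masquerade actually does.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper (not part of Asterisk): try to take `lock` at most
 * `max_tries` times, yielding the CPU between attempts.
 * Returns 0 with the lock held, or -1 if every attempt failed. */
static int lock_with_retries(pthread_mutex_t *lock, int max_tries)
{
    assert(lock != NULL && max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (pthread_mutex_trylock(lock) == 0)
            return 0;       /* acquired */
        sched_yield();      /* give the current owner a chance to run */
    }
    return -1;              /* give up rather than loop forever */
}
```

A caller that gets -1 back can abandon the operation (for example, drop the call) rather than leave the thread stuck.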

By: Will McCown (flynwill) 2006-05-04 18:45:05

Ok, I got a bit further -- ast_channel_masquerade is hung trying to acquire a lock on the SIP channel and can't because the lock is held by ast_write, which
in turn is hung on a call to what I presume to be the write method for the call.

Speculation -- that write is hung because the SIP phone (having done a blind transfer) is no longer accepting channel data.

By: Shaun (sdaigle) 2006-05-05 09:29:01

Issue# 0006276 seems to be 100% the same as this one.  A relationship should be created between the two issues.

Shaun

By: Andrey S Pankov (casper) 2006-05-05 13:10:28

ASTERISK-6116 was in asterisk for ages and it seems there is no way to fix it.
This is not call parking issue, it is a blind transfer one (or transfers
in general).

For reference you may add a relation to ASTERISK-2889 - closed after being suspended.
(reported by me and in MGCP category)

By: Andrey S Pankov (casper) 2006-05-05 13:13:06

And devicestates made the issues even worse and more unclear.

By: Will McCown (flynwill) 2006-05-08 11:09:09

This seems like a pretty serious problem to write off as "unfixable".  It would seem to me that if the underlying cause can't be found and prevented, then at least some sort of recovery mechanism should be considered so that the only damage is a dropped call, not a hung system.

By: Serge Vecher (serge-v) 2006-05-08 11:30:50

flynwill: based on the SIP logs and your analysis, the problem is not in the SIP channel. Although to completely rule it out, you would need to test with another channel (say IAX) and still get a crash.

can you please rebuild the latest 1.2 branch code with 'make dont-optimize' and post a backtrace here after the crash as an attachment?

Thanks.

By: Lance Kimes (lkimes) 2006-05-08 17:59:27

Just to confirm, you want us to test whether we can make chan_iax appear to hang by doing the same thing?  A blind transfer from an IAX client to see if it hangs as well?

By: Serge Vecher (serge-v) 2006-05-08 18:12:44

no, sorry for being unclear: first priority is to get a non-optimized backtrace from the latest 1.2

By: Will McCown (flynwill) 2006-05-09 13:28:49

Ok I downloaded the latest 1.2 branch, it calls itself "SVN-branch-1.2-r25608".

I've uploaded a full log, and a gz'd strace of a hang.  Let me know if this helps.

By: Serge Vecher (serge-v) 2006-05-09 13:43:25

hmm, I think I've asked for a backtrace, please see the relevant section on http://www.voip-info.org/tiki-index.php?page=Asterisk%20debugging

P.S. Originally, you've tagged this as crash. Does it crash or hang ?



By: Will McCown (flynwill) 2006-05-09 14:28:11

Ah sorry, totally misunderstood what you are after (and hadn't managed to find that page on the wiki before).  I'll try to get a backtrace after a hang this afternoon.

Re your PS.  It is a hang, or deadlock, as I said in the description only chan_sip seems to be hung, ZAP or IAX channel calls continue to be processed.  Sorry for using the term "crash" in the description.

By: Lance Kimes (lkimes) 2006-05-09 17:03:24

I've uploaded bt, the backtrace, and debug.log which is a full log of asterisk from startup to hang.

By: Serge Vecher (serge-v) 2006-05-09 18:36:11

lkimes: looks like a locking issue here in chan_sip.
We need to see in detail exactly what chan_sip is doing prior to going into a deadlock.

Can you please enable sip debug and capture the log for oej to look at? Thanks

By: Lance Kimes (lkimes) 2006-05-09 18:51:34

Uploaded bt2 and full-05092006.gz as requested.

By: Lance Kimes (lkimes) 2006-05-09 18:59:36

Enabling sip debug altered its behavior in that it took several repeated attempts before chan_sip would lock up.  Previously, it would lock up at the first attempt.

By: Serge Vecher (serge-v) 2006-05-09 21:25:43

Olle: can you, please, take a look at this one? Thanks.

By: Will McCown (flynwill) 2006-05-10 15:38:08

I spent a while poking around with gdb trying to understand the problem (and learning gdb in the process).

Here's the hung state as best as I can decipher:

Thread 7 is stuck in a loop at channel.c:2742 (in ast_channel_masquerade()) trying to lock both the original and the clone for a channel masquerade.  It is never able to acquire a lock on the clone, because that is currently locked by thread 2.  According to the ast_mutex_t structure, this lock was acquired at channel.c:2191 in ast_write().

Thread 2 is blocked at chan_sip.c:2549 in sip_write() attempting to acquire yet another lock.  This one is owned by thread 7 and was acquired at chan_sip.c:3190 in findcall().

So it's a classic deadlock...  I hope this is enough information to understand what went wrong and how it should be fixed.  I know I'm in way too deep for my abilities and I think I'll swim for the shore now.

By: Olle Johansson (oej) 2006-05-11 02:52:13

Hmm. I can't repeat this. Anyone else that can repeat this?

By: Mark Spencer (markster) 2006-05-11 02:53:21

Thank you for taking the time to diagnose this in such detail.  You're definitely helping get us on the right track.  

Does the problem occur if you are not using realtime sip peers / sip users?

By: Will McCown (flynwill) 2006-05-11 14:32:37

Turned off realtime for sipusers and sippeers.  Entered a single phone into sip.conf which is the Polycom I've been doing most of the tests with.  Asterisk fails in the same way.

By: Lance Kimes (lkimes) 2006-05-11 17:22:19

We also considered whether the race condition might be due to running on a fast multiproc. We disabled one of the cpus, but it didn't have an effect on the problem. FYI.  Do you want us to send you our rpm of Asterisk 1.2.7.1?

By: Olle Johansson (oej) 2006-05-16 15:31:10

We might have to get access to this system. Can you contact me or Mark on IRC about this? Mark on US time zones, me on European time zones :-)

By: Will McCown (flynwill) 2006-05-18 09:34:06

I hate to sound like a Dinosaur, but I've never used IRC.  

However, we can get you access to the system -- probably by way of ssh.  What we'll have to do is pick a time that works for both Lance (lkimes) and me to be here.  We're on PDT (GMT-7), so my guess is that we'll have to do it early some morning.  I normally get to work by 7:00 AM, and Lance sometimes does as well.

I believe that we've gotten external IAX connections to our production Asterisk box working through our firewalls, so we should be able to do that for voice.  Send me email at will@rhythm.com

By: Will McCown (flynwill) 2006-05-19 10:18:55

Olle or Mark:

I was staring at the code this morning trying to understand how it is supposed to work (and maybe through that understand why it is locking...), but I came across something very confusing.

Quoting a bit from chan_sip:

/*! \brief  sip_park: Park a call ---*/
static int sip_park(struct ast_channel *chan1, struct ast_channel *chan2, struct sip_request *req)
{
       struct sip_dual *d;
       struct ast_channel *chan1m, *chan2m;
       pthread_t th;
       chan1m = ast_channel_alloc(0);
       chan2m = ast_channel_alloc(0);
       if ((!chan2m) || (!chan1m)) {
               if (chan1m)
                       ast_hangup(chan1m);
               if (chan2m)
                       ast_hangup(chan2m);
               return -1;
       }
       snprintf(chan1m->name, sizeof(chan1m->name), "Parking/%s", chan1->name);
       /* Make formats okay */
       chan1m->readformat = chan1->readformat;
       chan1m->writeformat = chan1->writeformat;
       ast_channel_masquerade(chan1m, chan1);


So it appears you're making two new channels (chan1m and chan2m) and then "masquerading" them...  What confuses me is the definition of ast_channel_masquerade:

int ast_channel_masquerade(struct ast_channel *original, struct ast_channel *clone)

Why are we calling the new structure the "original" and the existing structure the "clone"?

By: Will McCown (flynwill) 2006-05-19 15:19:57

FYI:

I understand the problem a bit more now.  (Pardon me while I explain what I'm sure you already know...)

There seem to be two locks associated with a sip channel:

chan->lock
chan->tech_pvt->lock

Thread 2 is (I presume) handling a normal write to the channel and so the routine ast_write() locks chan->lock and calls sip_write().  sip_write() locks chan->tech_pvt->lock.

Thread 7 is asynchronously handling the transfer request.  The routine sipsock_read() gets the request and a routine called find_call() seems to be used to find the associated channel.  It does so and locks chan->tech_pvt->lock.  sipsock_read() further processes the SIP messages and ends up calling sip_park(), which in turn calls ast_channel_masquerade(), which finally tries to lock chan->lock.

So as you can see, the two threads are trying to acquire the same two locks, and trying to do so in the opposite order, thus allowing the deadlock.

I don't know what the "correct" fix is (or even if there is one).  The only suggestion I can make is that find_call() should attempt to lock both chan->tech_pvt->lock and chan->lock (which it can find as tech_pvt->owner->lock).  It probably also needs a loop that releases the first if the second fails and then tries again.

Since we are using nestable locks this should be ok, but it will take a fair amount of tracking to make sure sipsock_read() is the only caller of find_call() (doxygen thinks it is), and that sipsock_read() takes responsibility for unlocking the tech_pvt->owner->lock on every possible return.
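
The "take both locks, and release the first if the second fails" idea can be sketched roughly as below. This is only an illustration under stated assumptions: struct lock_pair, lock_both and unlock_both are hypothetical names standing in for chan->tech_pvt->lock and chan->lock, plain pthread mutexes are used in place of ast_mutex_t, and the retry bound is arbitrary. It is not the code in chan_sip.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for chan->tech_pvt->lock and chan->lock. */
struct lock_pair {
    pthread_mutex_t pvt_lock;
    pthread_mutex_t chan_lock;
};

/* Take pvt_lock, then *try* chan_lock. If chan_lock is busy, release
 * pvt_lock so its would-be owner can progress, then start over.
 * Returns 0 with both locks held, or -1 after max_tries rounds. */
static int lock_both(struct lock_pair *p, int max_tries)
{
    assert(p != NULL);
    for (int i = 0; i < max_tries; i++) {
        pthread_mutex_lock(&p->pvt_lock);
        if (pthread_mutex_trylock(&p->chan_lock) == 0)
            return 0;                       /* both held */
        pthread_mutex_unlock(&p->pvt_lock); /* back off completely */
        sched_yield();
    }
    return -1;
}

/* Release in the reverse order of acquisition. */
static void unlock_both(struct lock_pair *p)
{
    pthread_mutex_unlock(&p->chan_lock);
    pthread_mutex_unlock(&p->pvt_lock);
}
```

Because the function never blocks on the second lock while holding the first, a thread that takes the locks in the opposite order cannot be left waiting on it indefinitely.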

By: Will McCown (flynwill) 2006-05-19 16:48:49

Actually ignore that last suggestion...  Looking closer I see that immediately after the call to find_call() there is code to lock the parent.  The real culprit is a bit later on in handle_request_refer():

            if (!strcmp(p->refer_to, ast_parking_ext())) {
                /* Must release c's lock now, because it will not longer
                   be accessible after the transfer! */
                *nounlock = 1;
                ast_mutex_unlock(&c->lock);
                sip_park(transfer_to, c, req);
                nobye = 1;

So you're releasing the lock prematurely.  I'm not sure why the lock would be inaccessible after the call to sip_park(); perhaps it is only this way because that's how the normal case for a blind transfer (in the lines below the quoted bit of code) is written.  Worst case, sip_park() could release the lock itself at the correct place -- as it is only called here.

By: Will McCown (flynwill) 2006-05-24 11:41:13

The patch I just uploaded does seem to fix the problem, and doesn't (after very limited testing) seem to introduce any other problems.  I believe this is safe because it appears that the channel will not generally be destroyed on the first call through the masquerade process; rather, it will only be marked as a "ZOMBIE".

HOWEVER!  I would really appreciate it if someone who actually understands the code took a closer look at this.

By: Serge Vecher (serge-v) 2006-05-25 20:47:48

flynwill: no side effects from your patch?

By: Will McCown (flynwill) 2006-05-26 09:09:48

No side-effects so far, but it's only been running on our test machine, which only sees very light traffic.  Monday morning we plan to swap the test machine with the production one for a real load test.  I'll let you know how that goes.  Since this patch only affects a SIP blind transfer to parking, I really don't expect that any issues will turn up there either.

More interesting to me would be if Olle could install this patch on whatever machine he was using to try to replicate the problem.  That's where side effects would turn up -- systems where the whole timing of events is different.

By the way, I see this same early lock release in the normal blind transfer case, but I don't know if it has the possibility of leading to the same sort of deadlock.  (It hasn't happened in my limited testing.)

By: Olle Johansson (oej) 2006-06-01 12:25:20

Can you please try this again with svn trunk, since we changed all of transfer today. Thanks!

By: Will McCown (flynwill) 2006-06-01 14:08:24

Ok we can do that.  It will likely be next week sometime before we can however, since the "test" machine is currently online as the "production" machine.

Can you tell me the correct svn command to fetch the "trunk" as you call it.  I get confused by "trunks", "branches", etc...

Also do we need to pull down the latest zaptel and libpri as well?

By: Olle Johansson (oej) 2006-06-01 14:55:32

You can find instructions for installation of svn trunk on http://www.asterisk.org. Thanks for testing.

By: Todd Shore (teran) 2006-06-03 21:27:21

I have noticed slightly different behavior.  My system hangs when doing attended transfers to park and crashes during unattended transfers to park.  Failure rate is hard to predict but is less than 10%.  Sometimes I get crashes every hour or two and sometimes it almost makes it through the day.  We typically handle 5 calls at a time and use park exclusively - no extension to extension transfers.

Flynwill, when applying your patch against 1.2.7.1 or 1.2.8 I get a warning that the patch ends mid line.  Any advice?

Oej, would this patch break metermaids applied to 1.2.7.1?

By: Will McCown (flynwill) 2006-06-04 14:10:45

Teran:

I would have to look at the 1.2.8 source to see why the patch was failing.  The change is to move one line of code:

BEFORE:

          if (!strcmp(p->refer_to, ast_parking_ext())) {
                /* Must release c's lock now, because it will not longer
                   be accessible after the transfer! */
                *nounlock = 1;
                ast_mutex_unlock(&c->lock);
                sip_park(transfer_to, c, req);
                nobye = 1;

AFTER
          if (!strcmp(p->refer_to, ast_parking_ext())) {
                /* Must release c's lock now, because it will not longer
                   be accessible after the transfer! */
                *nounlock = 1;
                sip_park(transfer_to, c, req);
                ast_mutex_unlock(&c->lock);
                nobye = 1;


Maybe you can just edit chan_sip.c by hand and make the change.

This code is ONLY reached in the event of a SIP phone making a blind transfer to parking.  So that is the only situation it will have any effect on.
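
One way to read the AFTER ordering: the caller keeps the channel lock across the park call and releases it only afterwards, which relies on the lock being nestable so that code inside the call can lock the same channel again from the same thread. Below is a minimal model of that ordering with a recursive pthread mutex. All names here (nestable_mutex_init, park_model, refer_to_parking_model) are hypothetical; this is a sketch of the locking pattern only, not of sip_park() or handle_request_refer().

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE under strict modes */
#endif
#include <assert.h>
#include <pthread.h>

/* Initialise `m` as a recursive ("nestable") mutex. Returns 0 on success. */
static int nestable_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical stand-in for the park step: it locks the channel itself,
 * which succeeds as a nested acquisition when the caller already holds it. */
static int park_model(pthread_mutex_t *chan_lock)
{
    if (pthread_mutex_lock(chan_lock) != 0)
        return -1;
    /* ... the masquerade work would happen here ... */
    return pthread_mutex_unlock(chan_lock) == 0 ? 0 : -1;
}

/* Caller modelled on the AFTER ordering: park first, unlock afterwards. */
static int refer_to_parking_model(pthread_mutex_t *chan_lock)
{
    int rc;
    assert(chan_lock != NULL);
    pthread_mutex_lock(chan_lock);
    rc = park_model(chan_lock);
    pthread_mutex_unlock(chan_lock);
    return rc;
}
```

With the lock held for the whole call, no other thread can take the channel lock in the window between the unlock and the park, which is the window the BEFORE ordering leaves open.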

That said, if you can reproduce the problem at all you may want to test the svn trunk as we will be doing shortly as they say they've totally re-written transfer.

By: Olle Johansson (oej) 2006-06-05 10:41:17

Please test this again with svn trunk. Thanks.

By: Todd Shore (teran) 2006-06-06 10:54:40

oej,
I'm sort of in a bind.  My production system is the only one that crashes - about once a day now on 1.2.7.1 with the above patch.  On Wednesday night I'm bringing up an identical system into production that is going to handle even more traffic and is therefore probably more likely to crash.

My problem is that I have to have your metermaids patch installed because of user requirements. (It crashes with or without metermaids.)  Do you have metermaids for the current SVN trunk?  Would it be riskier to put it into production than my current crash level of 1 to 2 times a day?

By: Todd Shore (teran) 2006-06-06 11:00:47

I posted the gdb output from the latest crash.

By: Andrey S Pankov (casper) 2006-06-06 13:06:48

That may be due to format_mp3 module. Is it reproducible without it?

By: Serge Vecher (serge-v) 2006-06-19 12:07:11

flynwill: any luck with testing for this problem in trunk?

By: Will McCown (flynwill) 2006-06-20 10:40:37

No sorry we haven't been able to test, in fact we haven't been able to compile under Suse 9.3.  The problem is in compiling asterisk-addons which fails with the following message:

gcc -pipe -fPIC -Wall -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations   -D_REENTRANT -D_GNU_SOURCE  -O6    -c -o format_mp3.o format_mp3.c
In file included from /usr/include/asterisk/logger.h:28,
                from /usr/include/asterisk/lock.h:83,
                from format_mp3.c:20:
/usr/include/asterisk/compat.h:23: error: syntax error before "__extension__"
/usr/include/asterisk/compat.h:23: error: syntax error before '&&' token
In file included from /usr/include/asterisk/utils.h:36,
                from /usr/include/asterisk/cdr.h:48,
                from /usr/include/asterisk/channel.h:115,
                from format_mp3.c:21:
/usr/include/asterisk/strings.h:264: error: syntax error before "__extension__"
/usr/include/asterisk/strings.h:264: error: syntax error before ';' token
/usr/include/asterisk/strings.h:264: error: `__len' undeclared here (not in a function)
/usr/include/asterisk/strings.h:264: error: initializer element is not constant
/usr/include/asterisk/strings.h:264: error: syntax error before "if"
/usr/include/asterisk/strings.h:264: error: redefinition of `__retval'
/usr/include/asterisk/strings.h:264: error: `__retval' previously defined here
/usr/include/asterisk/strings.h:264: error: syntax error before "const"
/usr/include/asterisk/strings.h:264: error: syntax error before '}' token
/usr/include/asterisk/strings.h:280: error: conflicting types for `strtoq'
/usr/include/stdlib.h:362: error: previous declaration of `strtoq'
format_mp3.c:46: error: redefinition of `struct ast_filestream'
format_mp3.c:325: warning: function declaration isn't a prototype
format_mp3.c: In function `load_module':
format_mp3.c:336: warning: passing arg 1 of `ast_format_register' from incompatible pointer type
format_mp3.c:336: error: too many arguments to function `ast_format_register'
format_mp3.c: At top level:
format_mp3.c:342: warning: function declaration isn't a prototype
format_mp3.c:347: warning: function declaration isn't a prototype
format_mp3.c:359: warning: function declaration isn't a prototype
format_mp3.c:365: warning: function declaration isn't a prototype
{standard input}: Assembler messages:
{standard input}:47: Error: symbol `__retval' is already defined
make[1]: *** [format_mp3.o] Error 1
make[1]: Leaving directory `/local/a/src/asterisk/asterisk-addons/format_mp3'
make: *** [format_mp3/format_mp3.so] Error 2

I just this morning updated our source tree and tried again with no luck.

This probably needs to be reported in a separate bug report, but we've been pretty busy with other issues.  The net result is I can't just drop this version into our existing configuration and test.  I can probably work around this issue for the purpose of testing, but it will take some time to set it up.

By: Serge Vecher (serge-v) 2006-06-20 11:22:02

flynwill: asterisk-addons will not compile with current trunk ATM, pending the completion of kpleming's loader changes to be merged into the trunk. Do you have to have addons? I would not recommend running format_mp3 on a production system -- it has not been maintained in a long time and causes issues.

By: Will McCown (flynwill) 2006-06-20 11:27:47

We're using app_addon_sql_mysql.  I just tried installing just that (or actually
the three sql related modules: app_addon_sql_mysql.so, cdr_addon_mysql.so and res_config_mysql.so), but asterisk refuses to start; message below.

I can probably turn off sql, and build a minimal dialplan for testing this bug.
But, to be complete I'll have to duplicate the error first.

Asterisk SVN-trunk-r35123, Copyright (C) 1999 - 2006 Digium, Inc. and others.
Created by Mark Spencer <markster@digium.com>
Asterisk comes with ABSOLUTELY NO WARRANTY; type 'show warranty' for details.
This is free software, with components licensed under the GNU General Public
License version 2 and other licenses; you are welcome to redistribute it under
certain conditions. Type 'show license' for details.
=========================================================================
 == Parsing '/etc/asterisk/logger.conf': Found
Asterisk Event Logger Started /var/log/asterisk/event_log
 == Parsing '/etc/asterisk/dnsmgr.conf': Found
Asterisk Dynamic Loader loading preload modules:
 == Parsing '/etc/asterisk/modules.conf': Found
[res_config_mysql.so]Jun 20 09:43:09 WARNING[32632]: loader.c:741 __load_resource: Key routine returned NULL in module /usr/lib/asterisk/modules/res_config_mysql.so
Jun 20 09:43:09 WARNING[32632]: loader.c:750 __load_resource: 5 errors loading module /usr/lib/asterisk/modules/res_config_mysql.so, aborted
Jun 20 09:43:09 WARNING[32632]: loader.c:847 print_and_load: Loading module res_config_mysql.so failed!



By: Serge Vecher (serge-v) 2006-06-20 16:18:54

flynwill: fwiw, I have reproduced the problem in latest trunk r35158 too.

By: Serge Vecher (serge-v) 2006-06-20 16:27:28

attached is the debug log with the offending REFER and respective bt from non-optimized build.

By: Will McCown (flynwill) 2006-06-21 16:04:40

Ok I was finally able to put together a simplified test system with the database stuff turned off.

I compiled and installed asterisk-1.2.9.1, zaptel-1.2.6, libpri-1.2.3

Using a call originated on an IAX channel and answered by a SIP phone, I was able to confirm that this setup seems to hang in what I think is the same way when the SIP phone does a blind transfer to parking.

I then compiled and installed the trunk asterisk (revision 35366), zaptel (rev 1135) and libpri (rev 354).

As far as I can tell transfer is totally broken for sip phones.  Attempts to transfer a call (to parking or to another phone) simply fail and the original call is lost to the sip phone (and left off hook on the IAX side).

The good news is at least it doesn't hang....

By: Olle Johansson (oej) 2006-06-22 02:06:36

I can successfully park calls with latest 1.2 from svn. How do I repeat this?

By: Will McCown (flynwill) 2006-06-22 10:24:41

Olle-

I don't know.  You asked me to test with the "trunk" because you had changed "all of transfer".  To the best of my understanding "trunk" and "latest 1.2" are not the same thing, correct?

The failure of the "trunk" to transfer seems pretty complete, but I only had very limited test tools.  I was using a single SIP soft-phone (Eyebeam 1.1) on a test box, and generating calls to that box with an IAX link from our production system.

I was using the "xfer" key on the softphone, not the "#1" or "*1" (in fact I think those are commented out in our features.conf).  Since there was only the one SIP phone on the system I only tried transfer to parking and transfer back to the same number.

We'd love to help further, but at this point we need to give priority to other aspects of our planned final deployment.  (Which includes dual redundant servers.)  We have quite a lot of work to do there.  At this point we're going to stick to 1.2.7.1 with a specific set of patches as that seems to be stable.

By: Will McCown (flynwill) 2006-06-22 10:29:53

One more thing.

The failure of the trunk version of asterisk-addons to compile as noted in my message of 6/20 appears to be unrelated to the version of asterisk.  Older versions including our "hero" version (1.2.2) also fail to compile in the same way.  The system in question has a slightly newer kernel and internal distribution than the production asterisk boxes, so we are tracking down what we broke now.  (Probably some rpm was left out or added to the new system.)

By: Olle Johansson (oej) 2006-06-26 14:14:59

During my work with metermaids, I've done many blind transfers to parking, and it works in trunk.

By: Olle Johansson (oej) 2006-06-26 14:15:53

Suspended until we have new information, since I can't repeat this issue.