[Home]

Summary:ASTERISK-06024: Problems caused by MySQL realtime driver
Reporter:Kenneth Holm (saitech)Labels:
Date Opened:2006-01-09 16:46:33.000-0600Date Closed:2007-02-05 15:59:21.000-0600
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) core_backtrace.txt
( 1) core_backtrace2.txt
( 2) verbose+debug.txt
Description:I have just noticed, that all my asterisk boxes, are having the same error. Its both on Linux 2.4 and 2.6, and with SuSE, Debian, Gentoo end CentOS. Im also running with some different versions of asterisk. Asterisk 1.2, 1.2.1 and an old CVS Head from March 2005.

It seems like this debug output is triggered by a hangup call, that in some way goes wrong. I dont really know what kind of debugging you want for this error, but  just tell me, men i will provide it.

Generel about this error, is its making Asterisk to hang, while its output'd to the debug log, and thats for about 10-20 seconds. I have been able to reproduce the error, when im logging on to my server with EyeBeam Softphone, and dial a random number, when i hear a dialup tone, i hangup, and performs a new call. While doing this for about 8-10 calls, the error occurs.

I have been told before, that this is not an error, but only a debug log, but im  convinced, that some kind of error is behind this debug log, and its getting Asterisk to freeze, which causes Asterisk to reject or severe delay on calls, bot in/out.

****** ADDITIONAL INFORMATION ******

Jan  9 23:57:18 DEBUG[32258] chan_sip.c: Setting NAT on RTP to 524288
Jan  9 23:57:18 DEBUG[32258] chan_sip.c: Outgoing Call for 0004580808808
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM extensions WHERE exten = '80808808' AND context = 'services' AND priority = '-1'
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Everything is fine.
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM extensions WHERE exten LIKE '\_%' AND context = 'services' AND priority = '-1' ORDER BY exten
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Everything is fine.
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: (Provisional) Stopping retransmission (but retaining packet) on '392eef613aaf9f7736184aa56ec1f98f@telefin.dk' Request 102: Found
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: (Provisional) Stopping retransmission (but retaining packet) on '392eef613aaf9f7736184aa56ec1f98f@telefin.dk' Request 102: Found
Jan  9 23:57:18 DEBUG[32258] chan_sip.c: update_call_counter(0004580808808) - decrement call limit counter
Jan  9 23:57:18 DEBUG[32258] chan_sip.c: Acked pending invite 102
Jan  9 23:57:18 DEBUG[32258] chan_sip.c: Stopping retransmission on '392eef613aaf9f7736184aa56ec1f98f@telefin.dk' of Request 102: Match Found
Jan  9 23:57:18 DEBUG[32258] chan_sip.c: Stopping retransmission on '392eef613aaf9f7736184aa56ec1f98f@telefin.dk' of Request 102: Match Found
Jan  9 23:57:18 DEBUG[32258] app_dial.c: Exiting with DIALSTATUS=CANCEL.
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM extensions WHERE exten = 'h' AND context = 'services' AND priority = '1'
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Everything is fine.
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM extensions WHERE exten = 'h' AND context = 'services' AND priority = '1'
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Everything is fine.
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM extensions WHERE exten = 'h' AND context = 'services' AND priority = '1'
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Stopping retransmission on '392eef613aaf9f7736184aa56ec1f98f@telefin.dk' of Request 102: Match Not Found
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Everything is fine.
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Retrieve SQL: SELECT * FROM extensions WHERE exten = 'h' AND context = 'services' AND priority = '1'
Jan  9 23:57:18 DEBUG[32258] res_config_mysql.c: MySQL RealTime: Everything is fine.
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Stopping retransmission on '392eef613aaf9f7736184aa56ec1f98f@telefin.dk' of Request 102: Match Not Found
Jan  9 23:57:18 DEBUG[32258] cdr_addon_mysql.c: cdr_mysql: inserting a CDR record.
Jan  9 23:57:18 DEBUG[32258] cdr_addon_mysql.c: cdr_mysql: SQL command as follows: INSERT INTO cdr (calldate,clid,src,dst,dcontext,channel,dstchannel,lastapp,lastdata,duration,billsec,disposition,amaflags,accountcode,userfield) VALUES ($
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Jan  9 23:57:18 DEBUG[26957] chan_sip.c: Failed to grab lock, trying again...
Comments:By: Olle Johansson (oej) 2006-01-10 00:51:33.000-0600

Would it be possible to test this without mysql? The mysql realtime driver has some problems and I just want to eliminate that from the SIP picture.

Also, please try to capture a call that locks with SIP debug turned on, as well as SIP history. Add dumphistory=yes to sip.conf, turn on sip debugging, sip history and set debug to 4 and verbosity to 4.

That would be great reading!

By: Kenneth Holm (saitech) 2006-01-10 08:43:02.000-0600

I have now diabled every MySQL connection on my asterisk box. I cannot reproduce the error anymore. I have tried to enable cdr_mysql.so, and neither here, i can reproduce the error.

So apparently, it seems to be the mysql realtime driver, which im sad to say, is not an option to remove in our system.

Do you have a fix for the realtime driver, or is there any future fixes or workarounds/other features, what can be used instead?

By: Olle Johansson (oej) 2006-01-10 09:33:36.000-0600

I guess we have to ask people that know the MySQL driver. I am just very suspicious of it, and it seems I should be :-)

By: Olle Johansson (oej) 2006-01-10 09:34:54.000-0600

You might want to test using odbc to mysql instead.

By: Russell Bryant (russell) 2006-01-10 22:50:58.000-0600

It's not an option?  Why can't you use res_config_odbc?

I'm downgrading this from major, since this module is not included in Asterisk, and a workaround (using ODBC) exists.

By: Russell Bryant (russell) 2006-01-10 22:57:16.000-0600

We will need a backtrace from Asterisk when this is occuring.

Please build and install asterisk using 'make dont-optimize'.

Then, when this is occurring, you can use the script 'ast_grab_core' to get the backtraces we need.  The script is avilable in /usr/src/asterisk/contrib/scripts/.

By: Russell Bryant (russell) 2006-01-10 23:07:58.000-0600

does this delay occur with DEBUG logging turned off?

By: Kenneth Holm (saitech) 2006-01-12 03:54:46.000-0600

I have now applied odbc realtime to my asterisk production, it seems that the problem still occure. I have uploaded a backtrace. (core_backtrace2.txt).

By: Kenneth Holm (saitech) 2006-01-15 15:45:33.000-0600

I'm actually having a huge problem with this. It seems like the problem is occuring most in primetime, so it could be related to serverload. Though the serverload doest not excede 30% unless this error occure, and then it is on about 60%.

I really hope, that we can find a solution for this problem, otherwise im gonna bow to my collegue, and turn to another solution than asterisk, but i dont want to, because my opinion is that asterisk is a great product, that im comfortable to work with.

By: Kenneth Holm (saitech) 2006-01-16 16:30:13.000-0600

I've been looking through the code around this error. Im on the edge to try limit this loop, to maximum 5 times, and if it doest not succeed within the 5 times of trying, it should hangup the sip session that causes the problem.

Is this a good idea? Could anyone help me with this?

Is there no other solution to the problem?

It could be nice, if someone of you could descibe whats actually going on, and why this debug is written., !

By: Kenneth Holm (saitech) 2006-01-16 17:43:20.000-0600

I would be very pleases, if you could look on this bug report agagin, though the ODBC module didn't solve my problem.

By: Kenneth Holm (saitech) 2006-01-19 17:04:35.000-0600

After some thoughts, im actually beginning to suspekt the MySQL server to this problem.

How does asterisk react, if there are some delay on a SQL retrievel for the dialplan?

Im only getting the error while using a realtime module.

Today, i have sat up a new asterisk server, for internal telephony, so its not very loaded. Its running on the standalone db, with small tables, so now 10K lines diaplans. I have'nt seen the error on this server, and i cant really reproduce it.

By: Stuart Henderson (stuarth) 2006-01-29 15:10:20.000-0600

Just seen the same "chan_sip.c: Failed to grab lock, trying again" error message with cdr_odbc (mssql) and no realtime. The database definitely went down at the around time the message started being logged and asterisk was unresponsive (log entry repeated ad infinitum).

By: Stuart Henderson (stuarth) 2006-02-02 18:42:37.000-0600

Seems the same as bug 5942.

By: apignard (apignard) 2006-03-02 16:47:54.000-0600

Hello,

I have the same problem since some month but i don't find exactly where the problem is. Also i can't remove our cdr since we have so many call a day at billing purpose. Crash happen 3 to 4 time a day.

I have enable new feature of cdr bath and today got only a crash.
I have our cdr (mysql modified) on many other servers that's serve around 50K call a day each and they never crash. The sole difference is realtime + realtime queue.

Waiting the next crash to do as russel explain for get dump.

By: Olle Johansson (oej) 2006-03-03 01:53:10.000-0600

At some point I guess realtime DB access needs to be executed in a different thread than the calls.

By: Kenneth Holm (saitech) 2006-03-03 03:15:53.000-0600

In my opinion, database access, should the processed in a new thread, all alone, and a database cache system should be active in this thread. It's crucial that this cache is a spool file, so if asterisk crashes, the cdr data is still in the spool file, and on asterisk restart, it should process this spool file if possible.

This feature, will make the cdr billing system more reliabil, and also prevent these sip errors, where its waiting for a thread to unlock, before processing itself.

Also a database crash or minor connection problem, would not result in loss of billing data.

This is a highly wanted feature in my opinion.

By: apignard (apignard) 2006-03-07 17:27:48.000-0600

just more feedback
since batch=yes i have no more crash since last week.
before i have from 3 to 10 crash a day (average 4)
usage : realtime sip & iax / mysql cdr

By: Olle Johansson (oej) 2006-04-05 15:57:16

Does these problems still exist?

By: apignard (apignard) 2006-04-05 18:33:02

Hello,

Yes. I find maybe a way to easy reproduce the problem. On the cdr part, you lock the table where the insert is done and you will see the problem very rapidelly on a live server with a few call.

I will try to make a test with 1 call to see if it's happen again.

Regards,

By: Serge Vecher (serge-v) 2006-05-04 10:06:29

apignard: how did that test go. Still an issue in latest 1.2?

By: Serge Vecher (serge-v) 2006-05-12 10:59:49

Ok, if anybody is still able to reproduce this issue with the latest 1.2 branch (rev > 26000), please reopen the bug with updated debugging information.

thank you.

By: Tilghman Lesher (tilghman) 2006-12-20 07:36:25.000-0600

Opened on request by eskes.

By: Mattias Andersson (eskes) 2006-12-21 17:30:05.000-0600

The same bug a did com in my server.

It was asked if anyone could reproduce it in 1.2 should the bug be
reopened. I am running Trixbox 1.2.3 and ver 1.2.12.1 of Asterisk.
Unfortunately do I not have time to investigate it more before next year.

//Mattias (eskes)

By: Serge Vecher (serge-v) 2006-12-26 16:12:08.000-0600

when you are able to work on this again, please produce debugging information as requested. See note 0039261 for guidance.

By: Tilghman Lesher (tilghman) 2007-02-05 15:59:21.000-0600

eskes:  when you are able to work on this again, please ask a bug marshal to reopen this bug.