[Home]

Summary:ASTERISK-04829: [patch] cdr_addon_mysql - Lost cdr records, hanging server, spooling sql statements
Reporter:Joseph Benden (jbenden)Labels:
Date Opened:2005-08-13 00:09:01Date Closed:2005-09-25 16:46:50
Priority:MajorRegression?No
Status:Closed/CompleteComponents:CDR/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) cdr_addon_mysql.c.patch-3.txt
( 1) cdr_addon_mysql.c.patch-4.txt
( 2) cdr_addon_mysql.c.patch-5.txt
( 3) cdr_addon_mysql.c.patch-5a.txt
( 4) cdr_addon_mysql.c.patch-5b.txt
( 5) cdr_addon_mysql.c.patch-6.txt
( 6) cdr_mysql_verify.pl
Description:The cdr_addon_mysql can and will hang indefinitely whilst waiting for the mysql server to connect and it's not present.  The module also can lose a single cdr record upon a lost server connection.  The module also looses all cdr records upon a connection being completely lost.

This patch fixes all of the above by adding a connect_timeout parameter, a reconnect goto, and a spool file.

Documentation about this entire module is at:
http://www.uglyboxindustries.com/cdr_addon_mysql.html


****** ADDITIONAL INFORMATION ******

Disclaimer will be snail mailed.
Comments:By: Tilghman Lesher (tilghman) 2005-08-14 10:08:20

- Might want to change the 2013 to the constant CR_SERVER_LOST.
- You might want to fix the formatting as documented in doc/CODING-GUIDELINES, such as putting the else braces all on one line, and separating arguments to functions with spaces.
- In one reformat, you've removed a block, and therefore there's a variable declaration in the middle of code, rather than at the beginning.  This is a definite violation of the coding guidelines and needs to be fixed.  (Near line 235.)

Otherwise, looks great.



By: Tilghman Lesher (tilghman) 2005-08-14 10:12:33

MikeJ- Since cdr_addon_mysql is outside of the main tree, this can be fixed independent of the 1.2 release.

By: Joseph Benden (jbenden) 2005-08-14 11:45:53

Ok, I've uploaded a new patch with the reqested changes.

Is there a way to #ifdef the 1.0.X stable branch verses the CVS_HEAD?  If so, this patch could easily #ifdef the few changes needed to patch CVS_HEAD.

By: Tilghman Lesher (tilghman) 2005-08-14 12:13:13

If you #include <asterisk/version.h>, then ASTERISK_VERSION_NUM is defined:

#if (ASTERISK_VERSION_NUM > 10199)
/* 1.2.x and CVS HEAD code */
#else
/* 1.0.x code */
#endif

By: Joseph Benden (jbenden) 2005-08-14 12:30:46

I've made the changes, but I'm a little concerned because I have no version.h file in my include folder.  If I do a cvs update -f version.h in include/asterisk, it doesn't pull it down either.  Any ideas?

By: Tilghman Lesher (tilghman) 2005-08-15 13:47:27

Are you running on 1.0.x or on CVS HEAD?  If on 1.0.x, it doesn't really matter, because if ASTERISK_VERSION_NUM is not defined, it defaults to a value of 0, which of course is not greater than 10199.

By: Joseph Benden (jbenden) 2005-08-15 13:49:50

I am running 1.0.9.  Should I just remove the #include <asterisk/version.h> ?  Will that define be automatically created via the asterisk.h include?  If I try to build the module currently, the asterisk/version.h file doesn't exist, so the build fails.

By: Joseph Benden (jbenden) 2005-08-15 13:50:14

Also, I should note that I faxed my disclaimer this morning.

By: Tilghman Lesher (tilghman) 2005-08-15 13:55:39

Either way will work.  If you leave in the #include, then you'll get a warning on 1.0 at compile time, but if you remove it, you'll get a warning on 1.2 at runtime.  I'd say leave it in.

By: drmac (drmac) 2005-08-18 16:13:09

Is it possible the spooling portion of this patch is redundant with the recent addition of the threaded CDR engine?

By: drmac (drmac) 2005-08-18 16:20:32

This line is unnecessary as mysql_ping already returns the error code upon failure to ping.

+ error = mysql_errno(&mysql);

By: Kevin P. Fleming (kpfleming) 2005-08-23 11:48:50

I'd like to understand what this offers that the threaded/spooled CDR posting already in CVS HEAD cannot offer.

By: Joseph Benden (jbenden) 2005-08-23 12:23:24

I haven't looked at the new threaded/spooled cdr handler, so I do not know the full changes.  What I can say is that if the new cdr handler calls this code, records can and will be lost - and connections can hang indefinitly waiting on connecting to a broken mysql server.

I also made these changes for STABLE, as this was my goal.  It was unacceptable for me to lose CDRs and to have the entire process hang, during a network twitch.

I hope this helps in determining if it's needed....

By: drmac (drmac) 2005-08-23 12:42:28

here are a couple of positives that I like about this patch:

 * better error messages
 * connection timeout
 * ability to specifiy table to use

things i think can be removed from patch:

 * spooling
 * goto db_reconnect (a reconnect happens each time a cdr is needed to be posted)
 * the mysql_escape_string's need to be reverted to mysql_real_escape_strings because control should never reach this area if you are not connected.

By: drmac (drmac) 2005-08-23 12:44:59

how much are we really saving by haveing LOG_UNIQUEID as a compiler directive? cant we remove that and make that a *.conf attribute instead?

By: Joseph Benden (jbenden) 2005-08-23 12:53:41

Ok, can we do this?

a. This patch is needed for 1.x stable as it brings the best of all worlds to the code.

b. Tell me what to yank (tag) from CVS and I will make these needed changes for forward movement in Asterisk, which include the last post.  Such items as removing the spooling, changing the mysql_real_escape, removing the loop to reconnect.  Then if I find a bug or a code path that isn't handled, I'll come up with the idea/fix for it.

This would then help existing 1.x users and would also benefit the next releases of Asterisk....

Thoughts or ideas?

By: Kevin P. Fleming (kpfleming) 2005-08-23 13:27:15

For 1.0.x, there can be no new features unless they are clearly needed to solve a problem. The connect timeout would probably fall into this category. Spooling and related stuff would not, especially since it's been implemented in the CDR core already in CVS HEAD.

For CVS HEAD, we are already in feature freeze preparing for the 1.2 release, so new features are in pretty much the same situation as 1.0.x. If you have features to add that can't make the 1.2 release, we can mark this whole patch as post-1.2 and leave it for then. However, if there are actual bugs being fixed here, I'd rather get them separately posted so they can be merged.

By: Tilghman Lesher (tilghman) 2005-08-23 13:35:51

Feature freeze is irrelevant for this patch, however, since it (cdr_mysql_addon) is maintained outside of the tree and is not considered to be part of either 1.0 or 1.2.



By: Kevin P. Fleming (kpfleming) 2005-08-23 13:40:10

Very true; I forgot this module was in asterisk-addons.

By: Joseph Benden (jbenden) 2005-08-23 14:03:21

Heh...

Ok, so when someone decides the fate of what needs to be done, I am willing to program it and submit new patches and whatever else needs done.

My original goal of not losing a bunch of CDRs that fall through and cost me money, are solved on my side.  I just want to ensure that noone else has this problem, and am willing to make whatever changes need done, as to further this project.

By: Tilghman Lesher (tilghman) 2005-08-23 14:34:43

I really think this can go in, as-is.

By: Kevin P. Fleming (kpfleming) 2005-08-24 22:27:44

Why should this go in with its own spooling implementation when we already have one in CVS HEAD?

By: Tilghman Lesher (tilghman) 2005-08-24 22:59:20

Because it's also beneficial for people running 1.0.  While it is duplicate functionality, it won't hurt anything to have the spool in both places, should the person using this module be using 1.2.

By: Joseph Benden (jbenden) 2005-08-24 23:43:01

I agree with Corydon76.

My point is this:  While the CVS HEAD is great, it's beta non-production code.  People with businesses use the STABLE branch and the businesses using this code WILL lose CDR records.  This means money to them.  This code patches that STABLE branch and corrects the problem.

If a separate version of this is needed for the soon STABLE 1.2, then fine.  When the STABLE 1.2 is released (or beforehand) a specific version can be made for it, without the spooling, etc.

I just don't understand why you wouldn't want to fix a major problem for STABLE users.  While it doesn't seem as a major problem to the developers, it's classified as a major problem to business users.  A segment of missing CDR records can add up quickly, causing unbilled customer usage, whilst the business is billed by their provider.

Is it always this hard to submit patches to problems? I've spent more time trying to get this submitted to the CVS than actually fixing the original problem with the code.  Maybe I should cease to submit patches...

By: Kevin P. Fleming (kpfleming) 2005-08-24 23:57:58

If we go down that road, we could end up wanting to backport every fix from HEAD into the release branches, since they all fix 'real' problems.

I can tell you that it's highly unlikely that Mark would allow a fix to go into 1.0.x that had no chance of being put into HEAD, unless there was absolutely no other way to solve the problem and it was very critical for it to be fixed.

If you think that discussing the merits of a patch and its implementation is being 'hard', then I'd have to disagree. Put simply, this 'spooling' addition nearly can't go into 1.0.x at all, since it could be considered a new feature. It certainly not something that users are going to expect to see changing in a supposedly stable release series. I have no problem with the other changes you've made that fix the obvious failures in the module, but adding a spool for CDRs outside of the Asterisk core just seems completely wrong.

Are you going to absolutely guarantee to the users of this code that they will never lose a CDR record if they are using this spool file? I don't think you can do that, even the spooling in CVS HEAD is prone to losing records if the circumstances are right. Given that, it adds another thing for users of the 1.0.x release branch to worry about changing between 1.0.9 and 1.0.10 (or whenever it makes it in), as opposed to the usual bug fixes (which they don't really have to pay any attention to). Even though

Finally, there is also another method to solve this problem: use the cdr_csv module in parallel, so that there is a duplicate copy of the data. Should a database connection failure occur and records not be posted there, they can be recovered from the CSV file. I'm not saying that it's a better solution, but I can tell you that if I proposed merging this patch to Mark, that's the first solution he'd offer as an alternative.

(And even though asterisk-addons is not part of Asterisk and is not subject to the feature freeze for HEAD, there are still HEAD and v1-0 versions of it, and it is versioned and packaged along with Asterisk itself, so users of 1.0.x are going to expect the next 'release' of asterisk-addons to follow the same rules as the main Asterisk release, unless I'm misunderstanding :-))

By: Joseph Benden (jbenden) 2005-08-25 00:09:14

Ok, I give up.  It's fixed in my version of STABLE 1.0.9 that I use across my five boxes.  I maintain the code and pull appropriate changes from CVS.  I've tested this patch fairly well by network cable unplugging, shutting down the database, and firewalling the database server - in all cases I did not lose any records and the connect timeout parameter saved the box from locking all current calls.

If you want the patch or parts of it, fine.  If not, fine.

I cannot spend any more time trying to help, as I've got my own stuff that needs done.

If you decide what parts you want, and if you do not want to invest the time in yanking those parts from my patch, let me know, I'll do it...

By: Tilghman Lesher (tilghman) 2005-08-25 00:16:32

JBenden:  the reason why this is such a difficult decision to make is that the code exists independently of the main tree, so it's problematic in that it really isn't part of EITHER 1.0 or HEAD.

kpfleming:  this isn't a backport.  Because the tree is independent of 1.0 or 1.2, it needs to work for both, as much as that is possible.  Perhaps we should add preprocessor macros to check for the version, and exclude the spooling functionality if the module is being compiled for 1.2 or later, if the duplication of spooling functionality really bothers you.

By: drmac (drmac) 2005-08-25 09:14:01

IMHO, even if you add the macros to keep out the spooling from 1.2, adding spooling to STABLE would still be considered a new feature.

Also, IMHO, if you are loosing connections to your database server this might indicate a problem elsewhere in your system and may not infact be a fault of the module. My proof:

Connected to asterisk@222.222.222.222, port 3306 using table cdr for 7 days, 1 hours, 55 minutes, 25 seconds.
 Wrote 25846 records since last restart and 25029 records since last reconnect.

That's approx. 2.4 CDR's per minute and is in perfect alignment with my master.csv.

If you are writing records faster than that (which I'm sure you are), you might be overloading your database's internet buffer or something and its dropping the connection.

Just my $0.02

By: Tilghman Lesher (tilghman) 2005-08-25 10:56:37

drmac:  but that's just the point.  This code is not now and it never has been part of 1.0, and we're not proposing that it be added to that codebase.  It is an external module, and it will continue to be.  It is also not going to be part of 1.2.  Therefore, the "no new features in 1.0" policy does not apply to this code.

By: Tilghman Lesher (tilghman) 2005-08-25 10:58:33

To put it another way, cdr_addon_mysql is similar to codec_g729a.so, in that it is not part of the 1.0 codebase, and it is not part of the 1.2 codebase.  It is an addon, so these policy questions you're trying to apply are irrelevant to this module.

By: Joseph Benden (jbenden) 2005-08-25 11:12:48

So, yeah your mysql has logged a bunch of records, but the last time it reconnected, it lost ONE record whilst doing it... Follow the code path through, you'll see that it blanantly drops a record.

Now, my situation is this.  I've got a really low wait_timeout configured on my MySQL database server.  This is because it is doing 500 queries per second avg.  It also has around 10 boxes beating on it.

So, it doesn't stay connected for hours, it says connected for 600 seconds max.  Therefore, I was loosing a bunch of records quickly.... Plus, if that database server got real busy and hung a connection, the whole asterisk box would freeze up (calls in progress and everything) and hang......

By: Kevin P. Fleming (kpfleming) 2005-08-26 16:17:59

OK, some of this discussion has actually been incorrect: there _are_ v1-0 and HEAD branches of asterisk-addons, and asterisk-addons is tagged and released in tarball form along with asterisk itself.

In the absence of any statements to the contrary from Mark, I believe that means asterisk-addons needs to follow the same rules as asterisk itself regarding feature addition.

If someone wants to take this patch and extract the bugfix-only parts into a new patch, I'll be happy to apply it to v1-0 and HEAD ASAP. The spooling issue will continue to be a problem for v1-0 users, but HEAD (and v1-2) users can use the built-in spooling in the CDR engine.

By: Joseph Benden (jbenden) 2005-08-26 20:22:22

Sure!

So you would like everything in the patch, except the spooling?

(Just checking before I do anything)

By: Kevin P. Fleming (kpfleming) 2005-08-26 20:27:21

Yes, that would be fantastic.

By: Joseph Benden (jbenden) 2005-08-26 21:48:48

I've posted a patch where the spooling was removed.  Is it OK?  Or do you want some of the other "stuff", like mysql_real_escape back in place?

I would like to refuse to remove the goto reconnect stuff, because if it can simply reconnect, you won't lose a valid cdr record... This applies even if the spooling is performed in the new version of the cdr handler.  I really see more benefit in leaving that, then removing it.

By: drmac (drmac) 2005-08-27 11:06:03

The reconnect can stay, but put some sort of counter in there otherwise you could have a thread lock the module and then your system 'would' hang while waiting for this thread to finish reconnecting over and over and over and over. At that poing you will have to sacrifice loosing the CDR for sake of the server as a whole.

Hmm, in the case above where you loose a record for stability sake, you might be able to whip up a simple script that parses thru the CSV CDR's and makes sure they are in synch. Ran as a cron job nightly would be good.

By: Joseph Benden (jbenden) 2005-08-27 12:17:20

Ok, I've added a retry counter.

I've also whipped up a perl script (requires Text::CSV) to verify the entries in the Master.csv against a MySQL database server.

I like that idea, as this tool could also be used in cases where somebody wants to pull in old data or convert from csv to mysql.

(NOTE: I verified the functionality of that perl script, but that doesn't mean I didn't mess it up... I programmed it in 15 minutes..;)

By: drmac (drmac) 2005-08-27 12:51:08

cdr_addon_mysql.c.patch-5.txt needs to be reupload in "diff -u" format.

By: Joseph Benden (jbenden) 2005-08-27 13:00:03

Sorry about that... :)

By: drmac (drmac) 2005-08-27 14:19:54

i know im being extremely anal on this one, but you can combine the following two lines:

+      int retries;
+ retries = 5;

into

int retries = 5;

othe than that..the patch itself looks good to me. ill apply it and look it over again and post anything else i see.

By: Joseph Benden (jbenden) 2005-08-27 14:31:18

Ok, I made that change.

:)

By: Kevin P. Fleming (kpfleming) 2005-08-30 23:24:38

Please mark this bug as "feedback" for me when you two have decided that it is working as expected, and I'll get it into CVS HEAD and CVS v1-0.

By: Michael Jerris (mikej) 2005-09-02 19:23:11

Status request please.

By: Joseph Benden (jbenden) 2005-09-05 21:59:58

Anybody home?  This has been sitting here for quite a while.

Was I suppose to complete something else in order to get this patch submitted?

By: Michael Jerris (mikej) 2005-09-06 17:47:40

you were supposed to test and mark the bug in feedback mode when it is ready.  I will take by your comments that it is ready.

By: Kevin P. Fleming (kpfleming) 2005-09-07 17:54:15

Is this latest patch supposed to apply to CVS HEAD asterisk-addons as well? It doesn't, and there's no need for checking the Asterisk version number if it's for v1-0 only.

By: drmac (drmac) 2005-09-08 08:16:30

Against most recent CVS-HEAD:

[root@cytrex asterisk-addons]# patch -p0 <cdr_addon.patch.txt
patching file cdr_addon_mysql.c
Hunk #2 FAILED at 27.
Hunk #3 FAILED at 38.
Hunk #4 FAILED at 79.
Hunk ASTERISK-1 FAILED at 115.
Hunk ASTERISK-2 FAILED at 152.
Hunk ASTERISK-3 succeeded at 275 with fuzz 2 (offset 6 lines).
Hunk ASTERISK-4 FAILED at 297.
Hunk ASTERISK-5 FAILED at 313.
Hunk ASTERISK-6 FAILED at 481.
8 out of 10 hunks FAILED -- saving rejects to file cdr_addon_mysql.c.rej

By: Michael Jerris (mikej) 2005-09-14 00:12:05

jbenden, we need an updated patch for this for cvs head please.

By: Michael Jerris (mikej) 2005-09-24 18:42:54

Suspended due to no response.  Please update patch and re-open this bug when ready.  Thanks.

By: Joseph Benden (jbenden) 2005-09-24 18:44:38

I did post a new patch. A while back...  :)

By: Michael Jerris (mikej) 2005-09-24 18:47:49

Oops.. missed it due to the no comment.  Sorry.

By: Michael Jerris (mikej) 2005-09-24 18:48:14

Does that one still apply?

By: Joseph Benden (jbenden) 2005-09-24 19:04:55

I just tried it, and it does.

By: Kevin P. Fleming (kpfleming) 2005-09-25 16:44:33

Committed to CVS HEAD, thanks!