Summary: ASTERISK-06074: asterisk crashing in ast_cdr_alloc on AMD64 platform.
Reporter: nywiley (nywiley)
Labels:
Date Opened: 2006-01-13 13:55:19.000-0600
Date Closed: 2006-01-27 17:03:07.000-0600
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) ast_expr2.c
( 1) ast_expr2f.c
( 2) extensions.conf
Description:
Hi,

  I am getting repeated crashes in the macro hangupcall when using Asterisk.  It is on an AMD64 platform with 4 GB of memory and plenty of available disk space.  The system is in no way out of memory when the application dumps core, but the crash always happens in a call to ast_cdr_alloc.

  I am running on Gentoo Linux, with a 2.6.14 kernel, and AMD64 SMP architecture.

   I would be happy to provide any information needed to help debug this.

- Best Regards,
Bill

****** ADDITIONAL INFORMATION ******

#0  0x00002aaaab3ef0ca in free () from /lib/libc.so.6
#1  0x00002aaaab3f066d in malloc () from /lib/libc.so.6
#2  0x00000000004672cd in ast_cdr_alloc () at cdr.c:459
#3  0x00000000004660e0 in ast_cdr_dup (cdr=0x17eca40) at cdr.c:171
#4  0x0000000000468d0a in ast_cdr_reset (cdr=0x17eca40, _flags=0x2aaab5480870)
   at cdr.c:837
#5  0x0000000000452ae9 in pbx_builtin_resetcdr (chan=0x2aaab4c37da0,
   data=0x2aaab5484b40) at pbx.c:5469
#6  0x00000000004447c9 in pbx_exec (c=0x2aaab4c37da0, app=0x6180f0,
   data=0x2aaab5484b40, newstack=1) at pbx.c:544
#7  0x00000000004487dc in pbx_extension_helper (c=0x2aaab4c37da0, con=0x0,
   context=0x2aaab4c37f78 "macro-hangupcall", exten=0x2aaab4c3806c "s",
   priority=1, label=0x0, callerid=0x2aaab4c10690 "7078236331", action=1)
   at pbx.c:1687
#8  0x0000000000449d7c in ast_spawn_extension (c=0x2aaab4c37da0,
   context=0x2aaab4c37f78 "macro-hangupcall", exten=0x2aaab4c3806c "s",
   priority=1, callerid=0x2aaab4c10690 "7078236331") at pbx.c:2220
#9  0x00002aaaaf0b5e3c in macro_exec (chan=0x2aaab4c37da0, data=0x2aaab548b8c0)
   at app_macro.c:210
#10 0x00000000004447c9 in pbx_exec (c=0x2aaab4c37da0, app=0x759570,
   data=0x2aaab548b8c0, newstack=1) at pbx.c:544
#11 0x00000000004487dc in pbx_extension_helper (c=0x2aaab4c37da0, con=0x0,
   context=0x2aaab4c37f78 "macro-hangupcall", exten=0x2aaab4c3806c "s",
   priority=1, label=0x0, callerid=0x2aaab4c10690 "7078236331", action=1)
   at pbx.c:1687
#12 0x0000000000449d7c in ast_spawn_extension (c=0x2aaab4c37da0,
   context=0x2aaab4c37f78 "macro-hangupcall", exten=0x2aaab4c3806c "s",
   priority=1, callerid=0x2aaab4c10690 "7078236331") at pbx.c:2220
#13 0x000000000044adde in __ast_pbx_run (c=0x2aaab4c37da0) at pbx.c:2441
#14 0x000000000044b0d4 in pbx_thread (data=0x2aaab4c37da0) at pbx.c:2507
#15 0x00002aaaaaccac09 in pthread_start_thread () from /lib/libpthread.so.0
#16 0x00002aaaab43d843 in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()
Comments:
By: Russell Bryant (russell) 2006-01-13 13:57:44.000-0600

Please try to get another backtrace with Asterisk built with 'make dont-optimize'.

By: nywiley (nywiley) 2006-01-13 14:53:01.000-0600

Hi,

   Have recompiled the source with dont-optimize ... will provide a dump as soon as it crashes.

By: nywiley (nywiley) 2006-01-14 12:46:33.000-0600

New crash info after turning on dont-optimize

#0  0x00002aaaab3ee33f in malloc_usable_size () from /lib/libc.so.6
#1  0x00002aaaab3ee8ea in free () from /lib/libc.so.6
#2  0x00002aaaaeb9be40 in hangupcalls (outgoing=0x0, exception=0x0)
   at app_queue.c:1243
#3  0x00002aaaaeb9fe52 in try_calling (qe=0x2aaab4c68210,
   options=0x2aaab4c683e6 "", announceoverride=0x2aaab4c683e8 "",
   url=0x2aaab4c683e7 "", go_on=0x2aaab4c683a0) at app_queue.c:2293
#4  0x00002aaaaeba2f47 in queue_exec (chan=0x2aaab0629530, data=0x2aaab4c6c8c0)
   at app_queue.c:2987
#5  0x00000000004447c9 in pbx_exec (c=0x2aaab0629530, app=0x723c70,
   data=0x2aaab4c6c8c0, newstack=1) at pbx.c:544
#6  0x00000000004487dc in pbx_extension_helper (c=0x2aaab0629530, con=0x0,
   context=0x2aaab0629708 "ext-queues", exten=0x2aaab06297fc "2900",
   priority=5, label=0x0, callerid=0xb28e00 "6308980208", action=1)
   at pbx.c:1687
#7  0x0000000000449d7c in ast_spawn_extension (c=0x2aaab0629530,
   context=0x2aaab0629708 "ext-queues", exten=0x2aaab06297fc "2900",
   priority=5, callerid=0xb28e00 "6308980208") at pbx.c:2220
#8  0x000000000044a2a0 in __ast_pbx_run (c=0x2aaab0629530) at pbx.c:2286
#9  0x000000000044b0d4 in pbx_thread (data=0x2aaab0629530) at pbx.c:2507
#10 0x00002aaaaaccac09 in pthread_start_thread () from /lib/libpthread.so.0
#11 0x00002aaaab43d843 in clone () from /lib/libc.so.6

By: nywiley (nywiley) 2006-01-16 08:25:17.000-0600

It crashes when trying to free(oo) on line 1243.  It seems that it is the
last item in the linked list.

(gdb) p *oo
$4 = {chan = 0x0,
 interface = "Local/5199@from-internal", '\0' <repeats 231 times>,
 stillgoing = 0, metric = 0, oldstatus = 2, lastcall = 1137266972,
 member = 0xb1ad30, next = 0x0}

By: nywiley (nywiley) 2006-01-17 14:35:53.000-0600

Perhaps this is related to this bug - ASTERISK-6111

By: Matt O'Gorman (mogorman) 2006-01-17 14:37:02.000-0600

Can you try the patch and tell us if it solves the issue?

By: nywiley (nywiley) 2006-01-17 14:42:26.000-0600

Perhaps this is related to this bug - ASTERISK-6111

By: nywiley (nywiley) 2006-01-17 19:34:56.000-0600

Have applied patch and recompiled with dont-optimize.  Will monitor and let you know if it crashes again.   Also ... should I apply the memory leak fixes outlined in bug report 6072?

By: nywiley (nywiley) 2006-01-22 08:25:27.000-0600

Have applied the patch, and did get another crash, but I believe it is due to a memory leak!   I am seeing asterisk sucking memory down at 4K a pop.  So .. the system runs until the memory is exhausted and then crashes.  Is there any memory debugging code in the base source I can turn on to track this down, or will I have to go and put in a bunch of code to watch the allocation and freeing of memory per module to determine where the leak is?  I had asked in my last note if I should apply the patch associated with bug ASTERISK-5914.

Best Regards,
Bill

By: Tilghman Lesher (tilghman) 2006-01-22 09:14:49.000-0600

Many memory leaks can be tracked down by using valgrind.  There is a valgrind suppression file included with Asterisk in the contrib/ directory.

By: Russell Bryant (russell) 2006-01-22 09:19:10.000-0600

Another thing that you can do is enable memory allocation debugging, which will allow you to find out which file has the problem.

Edit the main Makefile and find the "MALLOC_DEBUG" line.  Uncomment the value by removing the '#'.

Once you have rebuilt Asterisk with this enabled, you will then have the following CLI commands:

  *CLI> show memory allocations [filename]
  *CLI> show memory summary [filename]
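
For readers wondering how a per-file summary like this can be produced at all, here is a minimal sketch of the general technique: wrap the allocator, tag each block with the requesting source file, and keep running totals. This is not Asterisk's actual MALLOC_DEBUG code; every name in it (dbg_malloc, dbg_free, dbg_summary, DBG_MALLOC, MAX_FILES) is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT Asterisk's implementation. */
#define MAX_FILES 64

struct file_stat {
    const char *file;   /* source file that requested the memory */
    size_t bytes;       /* bytes currently allocated and not yet freed */
    size_t count;       /* allocations currently outstanding */
};

/* Bookkeeping header placed in front of every block handed out.
 * The max_align_t member keeps the caller's payload suitably aligned. */
union hdr {
    struct {
        struct file_stat *owner;
        size_t size;
    } info;
    max_align_t align;
};

static struct file_stat stats[MAX_FILES];
static size_t nstats;

/* Find the totals entry for a file, creating it on first use. */
struct file_stat *stat_for(const char *file)
{
    for (size_t i = 0; i < nstats; i++)
        if (strcmp(stats[i].file, file) == 0)
            return &stats[i];
    if (nstats == MAX_FILES)
        return NULL;
    stats[nstats].file = file;
    stats[nstats].bytes = 0;
    stats[nstats].count = 0;
    return &stats[nstats++];
}

void *dbg_malloc(size_t size, const char *file)
{
    struct file_stat *st = stat_for(file);
    union hdr *h;

    if (st == NULL)
        return NULL;
    h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->info.owner = st;
    h->info.size = size;
    st->bytes += size;
    st->count++;
    return h + 1;               /* caller sees only the payload */
}

void dbg_free(void *ptr)
{
    union hdr *h;

    if (ptr == NULL)
        return;
    h = (union hdr *)ptr - 1;   /* step back to the bookkeeping header */
    assert(h->info.owner->count > 0);
    h->info.owner->bytes -= h->info.size;
    h->info.owner->count--;
    free(h);
}

/* Print one line per file, in the style of the CLI summary. */
void dbg_summary(FILE *out)
{
    for (size_t i = 0; i < nstats; i++)
        fprintf(out, "%10zu bytes in %5zu allocations in file '%s'\n",
                stats[i].bytes, stats[i].count, stats[i].file);
}

/* Call sites use the macro so the file name is filled in automatically. */
#define DBG_MALLOC(sz) dbg_malloc((sz), __FILE__)
```

Because dbg_free subtracts from the totals, a file whose numbers only ever grow in a summary like this is holding on to memory, which is what makes the output useful for locating a leak.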

By: nywiley (nywiley) 2006-01-22 10:53:02.000-0600

Have recompiled the code with the MALLOC_DEBUG line uncommented and will begin to watch the memory summary command to see if I can isolate the memory leak.

By: nywiley (nywiley) 2006-01-22 13:38:54.000-0600

I've upgraded to 1.2.2 and put the two patches related to substring in pbx.c into the 1.2.2 version.   I've compiled it with the memory debugging routines and dont-optimize.  Will post any memory leaks I find over the next day.

Best Regards,
Bill

By: nywiley (nywiley) 2006-01-23 07:55:03.000-0600

When I started the server - my memory allocation summary looked like this:

    15168 bytes in    12 allocations in file 'app_queue.c'
   108480 bytes in    25 allocations in file 'chan_zap.c'
     3968 bytes in    16 allocations in file 'file.c'
    35646 bytes in    33 allocations in file 'app_voicemail.c'
     1206 bytes in    36 allocations in file 'chanvars.c'
      360 bytes in     3 allocations in file 'cdr.c'
       24 bytes in     1 allocations in file 'devicestate.c'
    16456 bytes in     8 allocations in file 'io.c'
      176 bytes in    11 allocations in file 'ast_expr2.y'
     1250 bytes in    15 allocations in file 'config.c'
     6480 bytes in   167 allocations in file 'asterisk.c'
    40320 bytes in   126 allocations in file 'loader.c'
     7792 bytes in   138 allocations in file 'sched.c'
   137096 bytes in    85 allocations in file 'chan_sip.c'
     2768 bytes in    40 allocations in file 'manager.c'
     5302 bytes in   261 allocations in file 'res_indications.c'
    16353 bytes in   998 allocations in file 'pbx_config.c'
   135334 bytes in  1344 allocations in file 'pbx.c'
    20262 bytes in   403 allocations in file 'logger.c'
      550 bytes in    44 allocations in file 'ast_expr2.fl'

Now that the server has been running a while ... it looks like this:
    11072 bytes in     1 allocations in file 'localtime.c'
        1 bytes in     1 allocations in file 'res_features.c'
     3008 bytes in     4 allocations in file 'dsp.c'
    64768 bytes in     4 allocations in file 'frame.c'
      118 bytes in     6 allocations in file 'app_dial.c'
     1328 bytes in     2 allocations in file 'format_wav.c'
     1176 bytes in     3 allocations in file 'res_crypto.c'
   132464 bytes in     4 allocations in file 'res_musiconhold.c'
     1040 bytes in     2 allocations in file 'enum.c'
      680 bytes in     1 allocations in file 'chan_agent.c'
       20 bytes in     1 allocations in file 'cli.c'
    42544 bytes in     8 allocations in file 'rtp.c'
   108500 bytes in    29 allocations in file 'chan_zap.c'
     3968 bytes in    16 allocations in file 'file.c'
    35646 bytes in    33 allocations in file 'app_voicemail.c'
     5560 bytes in     8 allocations in file 'cdr.c'
    17984 bytes in    34 allocations in file 'app_queue.c'
       24 bytes in     1 allocations in file 'devicestate.c'
    10214 bytes in    27 allocations in file 'channel.c'
    16456 bytes in     8 allocations in file 'io.c'
      252 bytes in    28 allocations in file 'app_macro.c'
     1250 bytes in    15 allocations in file 'config.c'
     6480 bytes in   167 allocations in file 'asterisk.c'
    40320 bytes in   126 allocations in file 'loader.c'
     8368 bytes in   146 allocations in file 'sched.c'
    19679 bytes in   403 allocations in file 'logger.c'
   183768 bytes in    97 allocations in file 'chan_sip.c'
     5886 bytes in   155 allocations in file 'chanvars.c'
     2768 bytes in    40 allocations in file 'manager.c'
     5302 bytes in   261 allocations in file 'res_indications.c'
    16353 bytes in   998 allocations in file 'pbx_config.c'
   135366 bytes in  1348 allocations in file 'pbx.c'
   122520 bytes in  9759 allocations in file 'ast_expr2.fl'
    39056 bytes in  2441 allocations in file 'ast_expr2.y'

The number of allocations associated with ast_expr2 seems high, and I never see it come down; it just grows.  My guess is that bug 6072 has something to do with this.

By: Russell Bryant (russell) 2006-01-23 08:09:50.000-0600

Would you mind applying the patch from ASTERISK-5914 and seeing if that solves your problem?

By: nywiley (nywiley) 2006-01-23 09:25:24.000-0600

Hi,

  I wasn't sure if they had totally resolved 6072 as it appears that they were having some issues over the changing of some of the syntax.  I have applied the third file named 20060116__bug6072.diff.txt to my ast_expr2.y and ast_expr2.fl files and will install the new binaries tonight after everyone has gone home.  Will post my results tomorrow.

Best Regards,
Bill

By: Tilghman Lesher (tilghman) 2006-01-23 09:48:06.000-0600

That patch has already been applied to SVN 1.2.  You'd do better to upgrade to the latest 1.2 branch from SVN and run that.

By: nywiley (nywiley) 2006-01-23 10:08:54.000-0600

I just upgraded to 1.2.2 ... why aren't the current fixes applied to the newest release?  I've tried to stay away from using SVN releases as I was concerned about the stability of the releases.  Are you telling me that the svn's are more stable than the standard releases?

Best Regards,
Bill

By: Tilghman Lesher (tilghman) 2006-01-23 10:13:31.000-0600

No, I'm telling you that 1.2 SVN has the latest fixes.  The releases only get the patches which are ready.  That change to the expression parser was not yet ready when 1.2.2 was released, and so consequently, it was not added to our release branch.

Please differentiate between SVN trunk and SVN 1.2, which is the release branch which we snapshot from time to time as a release.  SVN trunk is our development branch.  I do NOT recommend that you run SVN trunk.

By: nywiley (nywiley) 2006-01-23 10:50:56.000-0600

Hi,

   OK ... it's clear to me now ... you are keeping the fixed versions under SVN instead of going back and repairing the release tars that everyone downloads.   So, if I want to keep my site up to date with the latest fixes I am better off using svn.

   I will let you know how the patch worked on the 1.2.2 as I have already got that ready to go.  I will set up svn on my box and get it ready to download the distribution that way for the future.

Best Regards,
Bill

By: nywiley (nywiley) 2006-01-24 09:20:00.000-0600

After running the system for several hours today, it's clear that the ast_expr2.y and ast_expr2.fl are still leaking memory, despite the patch.
I will compare the files to the svn versions I downloaded last night and see if there are any differences.

Best Regards,
Bill

By: nywiley (nywiley) 2006-01-25 09:46:14.000-0600

Update:  Put in the svn version last night.   I entirely lost the ability to hear any calls on any SIP phone! Any calls, whether originating in house, or externally over a ZAP line had no sound whatsoever!   I have had to revert back to the previous version to get us operational.  So, I am back on 1.2.2 with patches to pbx.c and ast_expr2.y and ast_expr2.fl.  I can't see any difference between the pbx.c and ast_expr2.y and ast_expr2.fl files and the svn versions ... I checked them with diff.  But, they are still leaking memory.  I am also seeing error messages in the log that ast_expr2.c is trying to free nil memory at line 2533 which points to a function ast_yyfree which doesn't check for nil pointers.

By: nywiley (nywiley) 2006-01-25 12:01:00.000-0600

OK ... apparently the svn was broken last night ... DOH! ... Resynced it this morning ... and it's working again.  I also see that they made a new release, 1.2.3.   In both 1.2.3 and svn there is a bug in the ast_expr2f.c file in the function ast_yylex_destroy.  The second call to ast_yyfree, which frees yyg->yy_start_stack, is passing a null pointer.  I put in a quick patch to check whether the pointer is null before making the call.  Will check and see if there is still a memory leak in the ast_expr2 routines after running it for a day.
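
The kind of guard described above can be sketched roughly as follows. This is an illustration only, not the flex-generated ast_yylex_destroy and not the patch that was applied; strict_free, scanner_state and scanner_destroy are hypothetical stand-ins, with strict_free playing the role of a debugging free() that logs a warning when handed a null pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counts how many times a NULL pointer reached the debugging free(). */
int null_free_warnings;

/* Hypothetical stand-in for a debugging free() that complains about NULL. */
void strict_free(void *ptr)
{
    if (ptr == NULL) {
        null_free_warnings++;
        fprintf(stderr, "WARNING: attempt to free NULL\n");
        return;
    }
    free(ptr);
}

/* Hypothetical scanner state with two optional heap members. */
struct scanner_state {
    void *buffer_stack;
    void *start_stack;   /* may legitimately never have been allocated */
};

/* Release each member only if it was allocated, then clear it so a
 * second call to this function is harmless. */
void scanner_destroy(struct scanner_state *s)
{
    assert(s != NULL);
    if (s->buffer_stack != NULL) {
        strict_free(s->buffer_stack);
        s->buffer_stack = NULL;
    }
    if (s->start_stack != NULL) {
        strict_free(s->start_stack);
        s->start_stack = NULL;
    }
}
```

Note that the standard free() already accepts a null pointer as a no-op, so a guard like this silences the warning from a stricter wrapper rather than fixing a leak.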

Best Regards,
Bill

By: nywiley (nywiley) 2006-01-26 10:01:07.000-0600

Running the server this AM I see that the ast_expr2 area is still leaking memory.  I will post this in association to bug 6072 as well.

        1 bytes in     1 allocations in file 'res_features.c'
      680 bytes in     1 allocations in file 'chan_agent.c'
       16 bytes in     1 allocations in file 'res_agi.c'
       24 bytes in     1 allocations in file 'devicestate.c'
    11072 bytes in     1 allocations in file 'localtime.c'
     5264 bytes in     7 allocations in file 'dsp.c'
       20 bytes in     1 allocations in file 'cli.c'
      164 bytes in     9 allocations in file 'app_dial.c'
     1176 bytes in     3 allocations in file 'res_crypto.c'
     3968 bytes in    16 allocations in file 'file.c'
    16456 bytes in     8 allocations in file 'io.c'
    80960 bytes in     5 allocations in file 'frame.c'
   132488 bytes in     5 allocations in file 'res_musiconhold.c'
     1250 bytes in    15 allocations in file 'config.c'
    53180 bytes in    10 allocations in file 'rtp.c'
     1992 bytes in     3 allocations in file 'format_wav.c'
     1040 bytes in     2 allocations in file 'enum.c'
    21372 bytes in    55 allocations in file 'app_queue.c'
   108515 bytes in    32 allocations in file 'chan_zap.c'
     2007 bytes in    28 allocations in file 'chan_local.c'
     2768 bytes in    40 allocations in file 'manager.c'
   183868 bytes in    99 allocations in file 'chan_sip.c'
    10760 bytes in    13 allocations in file 'cdr.c'
    25314 bytes in    49 allocations in file 'channel.c'
    21966 bytes in   403 allocations in file 'logger.c'
     8840 bytes in   151 allocations in file 'sched.c'
      474 bytes in    48 allocations in file 'app_macro.c'
     8544 bytes in   227 allocations in file 'chanvars.c'
    36750 bytes in    34 allocations in file 'app_voicemail.c'
     6480 bytes in   167 allocations in file 'asterisk.c'
    40320 bytes in   126 allocations in file 'loader.c'
     5302 bytes in   261 allocations in file 'res_indications.c'
    16361 bytes in   999 allocations in file 'pbx_config.c'
   135555 bytes in  1359 allocations in file 'pbx.c'
   150656 bytes in  9416 allocations in file 'ast_expr2.y'
   472416 bytes in 37624 allocations in file 'ast_expr2.fl'

Best Regards,
Bill

By: Steve Murphy (murf) 2006-01-26 10:39:51.000-0600

What exactly are the numbers saying here?  Are they expressing the total amount of allocated memory, never freed, or are they just counting the allocated memory, and not subtracting the freed memory?

You see, every time a $[...] expression is evaluated, the parser is called, and a series of memory allocations (calloc, etc) takes place. The memory is freed before the parser exits. So, to bump the allocation is a normal process. The fact that the numbers rise so quickly is testimony that you make heavy use of $[...] expressions! Or that you have one very busy daemon, lots of incoming/outgoing calls.

If whatever is counting the memory allocations, also is decrementing the totals with the free() calls, then we need to investigate! I have been thinking about how to attack this, and now I have an idea... send me your dialplan (extensions.conf), I extract all your $[...] expressions, set up a test to run them millions of times, and study the allocation totals. Purify is good at finding the leaked memory when the pointers disappear, but doesn't tell you about leaks where the pointers still live... So, if indeed there are still mem leaks, I need to find where/why the pointers are preserved.
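
To make the distinction concrete, here is a small sketch of the two kinds of counter and of the "run it many times and watch the totals" test described above. It is an assumption-laden illustration: eval_once is a hypothetical stand-in for one expression evaluation (it does not call the real parser), and its leak flag exists only to simulate a bug.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

long total_allocs;  /* every allocation ever made; never decremented */
long live_allocs;   /* allocations made minus allocations freed */

void *counted_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        total_allocs++;
        live_allocs++;
    }
    return p;
}

void counted_free(void *p)
{
    if (p == NULL)
        return;
    assert(live_allocs > 0);
    live_allocs--;
    free(p);
}

/* Hypothetical stand-in for evaluating one expression: it takes two
 * working buffers and should release both before returning.  With
 * leak != 0 it deliberately forgets one of them. */
int eval_once(const char *expr, int leak)
{
    size_t len = strlen(expr) + 1;
    char *copy = counted_alloc(len);
    char *scratch = counted_alloc(64);

    if (copy == NULL || scratch == NULL) {
        counted_free(copy);
        counted_free(scratch);
        return -1;
    }
    memcpy(copy, expr, len);
    counted_free(copy);
    if (!leak)
        counted_free(scratch);
    return 0;
}

/* Evaluate the same expression many times and report how much the
 * live count grew.  Zero means nothing was retained. */
long live_growth(const char *expr, long runs, int leak)
{
    long before = live_allocs;

    for (long i = 0; i < runs; i++)
        eval_once(expr, leak);
    return live_allocs - before;
}
```

A cumulative counter such as total_allocs rises on every evaluation even when nothing leaks, while live_allocs returns to its starting value unless something is retained. Only growth in the second kind of counter points at a real leak.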

One other source of leaks is/are syntax errors from the $[...] parser itself. Are you getting any error messages in your logs? You'd have to be getting a ton of them!

Check around, and let me know. My email is murf at e-tools [dot] com.

By: nywiley (nywiley) 2006-01-26 10:51:13.000-0600

Hi Murf,

   I sent you the config file which is basically a modified AMP version.  Let me know how I can be of further assistance to you.

Best Regards,
Bill

By: nywiley (nywiley) 2006-01-26 11:01:10.000-0600

Hi,

   Tried to email you the config file from hotmail, but it got bounced back as undeliverable.  Can you give me another e-mail address to send it to?  I've also uploaded it to this bug for you as well.

Best Regards,
Bill

By: Steve Murphy (murf) 2006-01-26 17:00:29.000-0600

OK, I think I have the solution!

Corydon76 (tilghman)-- The situation is like this: There are 4 files involved in the 6072 patch, not just 2. They are: ast_expr2.y, ast_expr2.fl, ast_expr2.c, and ast_expr2f.c. They are all 4 in svn, and all 4 must be treated as a unit. The last 2 are the results of bison and flex, using late-model bison/flex, for those who don't have them. The makefile doesn't automatically rebuild the .c files, when newer .y/.fl files are there, because there just plain may not be a bison/flex of sufficiently advanced version to correctly process them.

The problem is that the .y/.fl files have been applied to the SVN, but the other two have not been changed. So, you get the exact same behavior as without any updates being done at all. Unless the end users physically remove the .c files and then do a make. IF they have sufficiently late-model versions of bison/flex.

So, there you have it. I was puzzled why the memory allocations were so screwed up... now I know. Sorry I didn't spot this earlier for you guys!!! I was going off my 6072 bug fix release. I didn't realize the updates hadn't been done for all 4. ... sorry again.

By: Tilghman Lesher (tilghman) 2006-01-26 17:38:27.000-0600

murf:  what version of flex do you need to generate these files?  I have 2.5.4a, which is what GNU lists as the current version, but it is not able to generate these files.

By: nywiley (nywiley) 2006-01-26 18:05:41.000-0600

I'm using 2.5.4a also, and although I changed the Makefile to use -f instead of --full, it complained about a number of unknown options in the fl file.  I commented those options out, but after generating the file, it was not able to compile it due to missing definitions for yyscan_t and yylloc_param.  Can I get copies of the correctly generated files to see if the memory problem goes away?

Thanks,
Bill

By: Steve Murphy (murf) 2006-01-26 20:15:13.000-0600

Gentlemen--

2.5.31 is the minimum. (They don't have anything newer than that, anyway.) And nothing previous to that will build a pure scanner.

The ast_expr2f.c I submitted is built via 2.5.31. I suggest using it.

The ast_expr2.c may well be OK with 875 version of bison. But the one I submitted is built by bison 2.0. Your call. If it works, it's OK. The version I submitted was tested with purify by myself. If you want to use the 1.8xx version, you should thoroughly test it.  Your power == your responsibility, right?

Let me put it a different way: if you want to change what I submitted, it's fine, just be sure it really, truly, still works.

By: Steve Murphy (murf) 2006-01-26 20:20:38.000-0600

OOPS! One more thing!!!!

I DID patch the flex 2.5.31 for the AEL2 stuff, so it would generate a compilable file. Hmmm. Let's see. I'll upload the .c files, because the 6072 files that were actually applied to svn were not exactly what I submitted. I'll upload the two .c files here in a minute.

By: Steve Murphy (murf) 2006-01-26 20:31:00.000-0600

OK, built the .c files from the 1.2 stock release. Looks good with purify.
Plug these into your 1.2 release SVN stuff. What's there is garbage.
Best of Luck, all!

By: nywiley (nywiley) 2006-01-27 07:55:10.000-0600

OK ... I can confirm that the new .c files have corrected the memory leak.  However, I don't know if requiring 2.5.31 is a good idea, as all of the current distributions of Linux are running 2.5.4a.  Plus, I understand that 2.5.31 has some issues with some standard packages (postgres, sim, and mico).
I think it would be best to either make a version that would run on 2.5.4a, or just remove the .fl from the distribution and distribute the .c instead.
Also ... I know that there are memory leak issues with the current ast_expr being distributed under 1.10 as well.

Thanks for helping resolve this issue,
Best Regards,
Bill

By: Russell Bryant (russell) 2006-01-27 13:15:51.000-0600

It looks like these files have already been merged.  Is there anything left that needs to be done?

By: Tilghman Lesher (tilghman) 2006-01-27 17:03:07.000-0600

Nope, we'll close this as fixed.  If this is still a problem, please reopen.