[Home]

Summary:ASTERISK-16742: sometimes crash after Unable to write to alert pipe
Reporter:Walter Doekes (wdoekes)Labels:
Date Opened:2010-09-28 03:02:48Date Closed:2011-02-08 13:16:38.000-0600
Priority:CriticalRegression?No
Status:Closed/CompleteComponents:Core/Channels
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) alert-pipe-crash-1.6.2.14rc1-btfull.txt
( 1) alert-pipe-crash-1.6.2.14rc1-info.txt
( 2) alert-pipe-crash-1.6.2.8-info.txt
Description:Hi,

I was updating my dialplan and running some tests calls, and suddenly asterisk started crashing. I'm guessing the dialplan change did something, but I can't figure out what yet.

I won't be posting the entire dialplan, and I have yet to distill it down to something small that still crashes.

I've cleaned asterisk of all my patches and had both 1.6.2.8 and 1.6.2.14-rc1 crash with MALLOC_DEBUG and DONT_OPTIMIZE on. (Oddly enough, it never core dumps, so I have to run it under gdb to get proper info.)

****** STEPS TO REPRODUCE ******

I can reproduce it with my current setup by calling the test number over and over again and after a couple (2-20) times, it crashes.

With my latest test with 1.6.2.8, I got this message first:


   -- Local/+31852100586@osvpi_world-f479;1 answered SIP/126680002-00000012
WARNING: Freeing unused memory at 0x122ec08, in __ast_module_user_remove of loader.c, line 222
[2010-09-28 09:37:01] WARNING[23556]: channel.c:1072 __ast_queue_frame: Unable to write to alert pipe on Local/+31507001918@world_out-9406;1<ZOMBIE> (qlen = 0): Bad file descriptor!
   -- Packet2Packet bridging SIP/126680002-00000012 and SIP/126680007-00000013

Then I called once more, and then I got:

   -- Executing [+31507001918@voipgrid_out:1] Dial("Local/+31507001918@world_out-6f41;2", "SIP/+31507001918@126680007") in new stack
 == Using SIP RTP CoS mark 5
 == Using UDPTL CoS mark 5
   -- Called +31507001918@126680007
   -- SIP/126680007-00000015 answered Local/+31507001918@world_out-6f41;2
   -- Local/+31507001918@world_out-6f41;1 answered Local/+31852100586@osvpi_world-f7c2;2
   -- Local/+31852100586@osvpi_world-f7c2;1 answered SIP/126680002-00000014
WARNING: Freeing unused memory at 0x123b178, in __ast_module_user_remove of loader.c, line 222
[2010-09-28 09:37:35] WARNING[23560]: channel.c:1072 __ast_queue_frame: Unable to write to alert pipe on Local/+31507001918@world_out-6f41;1<ZOMBIE> (qlen = 0): Bad file descriptor!
WARNING: Freeing unused memory at 0x11df438, in __ast_module_user_remove of loader.c, line 222
 == Spawn extension (osvpi_world_real_call, +31507001918, 3) exited non-zero on 'Local/+31852100586@osvpi_world-f7c2;2'
[Thread 0x40176950 (LWP 23560) exited]
 == Spawn extension (voipgrid_out, +31507001918, 1) exited non-zero on 'Local/+31507001918@world_out-6f1;2'
[Thread 0x41401950 (LWP 23562) exited]
WARNING: Freeing unused memory at 0x123d098, in local_pvt_destroy of chan_local.c, line 171

Program received signal SIGSEGV, Segmentation fault.

****** ADDITIONAL INFORMATION ******

I've attached three files:

1.6.2.14-rc1:
- alert-pipe-crash-1.6.2.14rc1-info.txt = a bit of info
- alert-pipe-crash-1.6.2.14rc1-btfull.txt = thread apply all bt full

1.6.2.8:
- alert-pipe-crash-1.6.2.8-info.txt = a bit more info


The crash happens on:

       ast_string_field_set(clonechan, name, zombn);

in ast_do_masquerade. Here clonechan->name is 0x25, which is a bad address obviously. clonechan->__field_mgr is {size = 256, used = 75, last_alloc = 0x25 <Address 0x25 out of bounds>, owner_file = 0x565f94 "channel.c", owner_func = 0x567720 "ast_channel_free", owner_line = 1502}

Which refers to:

       ast_string_field_free_memory(chan);

in ast_channel_free.


I'm a bit stuck as to how to debug this further. And I need to get some work done, so I'll attempt to redo the dialplan and check if the crashes go away. Say the word if you want more info out of the running/crashed binary.


Regards,
Walter Doekes
OSSO B.V.
Comments:By: Walter Doekes (wdoekes) 2010-09-28 06:35:12

Okay, it seems that the Pipe warning is not the cause but a side-effect. I've also figured out that the warnings (WARNING: Freeing unused memory at 0x122ec08, in __ast_module_user_remove of loader.c, line 222) is probably very serious. (Shouldn't this be an ERROR?)

I've reverted my dialplan to before my most recent edits and I can still get asterisk to crash (currently running an unpatched 1.6.2.14-rc1). But it's hard to reproduce. What I do is call the same number quite a number of times and hang up pretty quickly. Sometimes the phone is answered in time, sometimes it isn't.

After every call, the used-modules-list is always set to 0, except after one of these errors:


*CLI> module show
...
chan_local.so                  Local Proxy Channel (Note: used internal 0        
...

   -- Executing [+310@voipgrid_out:1] Dial("Local/+310@world_out-f7f7;2", "SIP/+310@126680007") in new stack
 == Using SIP RTP CoS mark 5
 == Using UDPTL CoS mark 5
   -- Called +310@126680007
   -- Called +310@world_out
   -- Local/ID2@osvpi_route_fixeddestination-a20c;1 is ringing
   -- Local/+311@osvpi_world-7df6;1 is ringing
   -- SIP/126680007-00000031 answered Local/+310@world_out-f7f7;2
   -- Local/+310@world_out-f7f7;1 answered Local/ID2@osvpi_route_fixeddestination-a20c;2
   -- Local/ID2@osvpi_route_fixeddestination-a20c;1 stopped sounds
   -- Local/ID2@osvpi_route_fixeddestination-a20c;1 answered Local/+311@osvpi_world-7df6;2
   -- Local/+311@osvpi_world-7df6;1 stopped sounds
   -- Local/+311@osvpi_world-7df6;1 answered SIP/126680002-00000030
WARNING: Freeing unused memory at 0x1a9f658, in __ast_module_user_remove of loader.c, line 222
[2010-09-28 13:21:58] WARNING[27190]: channel.c:1072 __ast_queue_frame: Unable to write to alert pipe on Local/+310@world_out-f7f7;1<ZOMBIE> (qlen = 0): Bad file descriptor!
 == Spawn extension (voipgrid_out, +310, 1) exited non-zero on 'Local/+310@world_out-f7f7;2'
 == Spawn extension (osvpi_world_real_call, +310, 3) exited non-zero on 'Local/ID2@osvpi_route_fixeddestination-a20c;2'
   -- Packet2Packet bridging SIP/126680002-00000030 and SIP/126680007-00000031
 == Spawn extension (osvpi_route_fixeddestination_step, +311, 1) exited non-zero on 'Local/+311@osvpi_world-7df6;2'
 == Spawn extension (osvpi_world_call, +311, 7) exited non-zero on 'SIP/126680002-00000030'

*CLI> module show
...
chan_local.so                  Local Proxy Channel (Note: used internal -1        
...


Now I can call the number some more and hang up quickly, but if the other side picks up, the crash happens.


Sometimes I don't get a second chance and it crashes immediately, like this:

   -- Executing [+310@voipgrid_out:1] Dial("Local/+310@world_out-0efe;2", "SIP/+310@126680007") in new stack
 == Using SIP RTP CoS mark 5
 == Using UDPTL CoS mark 5
   -- Called +310@126680007
   -- Called +310@world_out
   -- Local/ID2@osvpi_route_fixeddestination-8a76;1 is ringing
   -- Local/+311@osvpi_world-5a84;1 is ringing
   -- SIP/126680007-0000000f answered Local/+310@world_out-0efe;2
   -- Local/+310@world_out-0efe;1 answered Local/ID2@osvpi_route_fixeddestination-8a76;2
   -- Local/ID2@osvpi_route_fixeddestination-8a76;1 stopped sounds
   -- Local/+311@osvpi_world-5a84;1 stopped sounds
   -- Local/ID2@osvpi_route_fixeddestination-8a76;1 answered Local/+311@osvpi_world-5a84;2
   -- Local/+311@osvpi_world-5a84;1 answered SIP/126680002-0000000e
WARNING: Freeing unused memory at 0x153b808, in __ast_module_user_remove of loader.c, line 222
 == Spawn extension (osvpi_world_real_call, +310, 3) exited non-zero on 'Local/ID2@osvpi_route_fixeddestination-8a76;2'
WARNING: Freeing unused memory at 0x153ab48, in local_pvt_destroy of chan_local.c, line 240
Segmentation fault

By: Alec Davis (alecdavis) 2010-09-28 06:48:56

this may be a local channel masquerade problem.
try the '/n' option like:

Dial(Local/1234@context/n)

By: Walter Doekes (wdoekes) 2010-09-28 09:25:31

Wow. Indeed. I've replaced all occurrences of the Local Dial and I haven't succeeded in crashing it yet.

Thanks!

My cdr's still look the same and my ${__variables} work too. What's the catch?

By: David Woolley (davidw) 2010-09-28 09:58:54

The catch is that you have an extra layer of relaying the RTP packets and there is no possibility of an external bridge.

By: Leif Madsen (lmadsen) 2010-09-28 11:07:35

We should probably still handle this a bit more gracefully than crashing :)

By: Alec Davis (alecdavis) 2010-09-28 15:27:34

https://reviewboard.asterisk.org/r/387/
R222761 was the commit.

Did not make it into 1.6.2

By: Leif Madsen (lmadsen) 2010-09-30 14:30:33

Can you re-test with a revision that is new enough to include Alec Davis' fix?

By: Alec Davis (alecdavis) 2010-09-30 18:10:56

I believe this is fixed in 1.8 and trunk, see ASTERISK-1755801

You are going to need to try 1.8 or trunk though.

By: Walter Doekes (wdoekes) 2010-10-11 08:02:37

Bah. I tried really hard to reproduce the circumstances under which I got it to crash, but I did not succeed in crashing 1.6.2.x today.

So testing whether 1.8 crashed wouldn't have made any sense. I'll try some more in due time.

By: Leif Madsen (lmadsen) 2011-02-08 13:16:38.000-0600

I'm going to close this issue for now as the reporter wasn't able to reproduce. If you're able to reproduce this then please feel free to open a new issue with additional information (it's easier to triage a new issue that hasn't already been imported).