Summary: ASTERISK-06845: Asterisk crashes unless built with 'make dont-optimize' w/ gcc 4.x on FC4/5
Reporter: Rudolf E. Steiner (res)
Labels:
Date Opened: 2006-04-25 13:06:04
Date Closed: 2006-05-19 20:02:17
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) IAX2_debug.txt
Description: Asterisk crashes sometimes. We use version 1.2.5.

gdb says:

----- begin -----
Core was generated by `/usr/sbin/asterisk -p -g'.
Program terminated with signal 11, Segmentation fault.
[...]
#0  0x00d7554f in socket_read (id=0x0, fd=0, events=0, cbdata=0x0) at chan_iax2.c:7627
----- end -----
Comments:

By: Andrey S Pankov (casper) 2006-04-25 13:17:22

bt/bt full?

By: Serge Vecher (serge-v) 2006-04-25 14:11:15

Also, please note that v. 1.2.5 is not current. The current release is v. 1.2.7.1, which includes some chan_iax2 fixes made since the 1.2.5 release. Please update to v. 1.2.7.1 and run a backtrace as casper asked, attaching the results to the bug. Thanks.



By: Rudolf E. Steiner (res) 2006-04-25 18:19:54

Thank you.

1.2.7.1 does not work correctly in our environment. The reason:

    http://bugs.digium.com/view.php?id=6963

I think we should wait for a fix for issue 6963. After that, I can upgrade to the newest version of Asterisk and report again after a few tests.

By: Rudolf E. Steiner (res) 2006-04-27 13:02:01

'mattf' corrected the problem reported in issue 6963.

I'm now on version 'SVN-branch-1.2-r22866'. I will report about the stability in this matter ("chan_iax2").

By: Rudolf E. Steiner (res) 2006-05-02 12:59:45

After resolving issue 6963, I'm using 'Asterisk SVN-branch-1.2-r22866'.

The problem is not yet solved in this version; Asterisk is still crashing.

gdb:
----- begin -----
#0  0x0056154f in socket_read (id=0x0, fd=0, events=0, cbdata=0x0) at chan_iax2.c:7629
7629    }
----- end -----

gdb-bt:
----- begin -----
#0  0x0056154f in socket_read (id=0x0, fd=0, events=0, cbdata=0x0) at chan_iax2.c:7629
#1  0x00000000 in ?? ()
----- end -----

gdb-bt full:
----- begin -----
#0  0x0056154f in socket_read (id=0x0, fd=0, events=0, cbdata=0x0) at chan_iax2.c:7629
       callno = 0
       trunked_ts = Variable "trunked_ts" is not available.
----- end -----

By: Andrey S Pankov (casper) 2006-05-02 13:43:42

As line 7629 stands for '}' (end of socket_read()) it's hard to say where it
segfaults. It seems to be a corrupted stack to me.

Does the problem occur if you do not use -p?

Could you try latest Corydon changes (in 1.2 as of r24019) related to realtime
priority please....

By: Serge Vecher (serge-v) 2006-05-02 13:56:24

RES: please confirm that your backtraces are from a non-optimized build, i.e. built with make dont-optimize
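
The request above can be sketched as a short script. This is only an illustration under stated assumptions: it is run from the Asterisk source tree, the binary is at /usr/sbin/asterisk (the path shown in the core dump earlier in this report), the core file is named `core` (hypothetical; the real name depends on the system), and the installed gdb supports the `-batch` and `-ex` options. The `run` helper and the `DRY_RUN`, `ASTERISK_BIN` and `CORE_FILE` variables are names invented for this sketch.

```shell
#!/bin/sh
# Sketch only: rebuild without optimization, then print a backtrace.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Paths are assumptions; adjust them to the local installation.
ASTERISK_BIN="${ASTERISK_BIN:-/usr/sbin/asterisk}"  # binary named in the core dump
CORE_FILE="${CORE_FILE:-core}"                      # hypothetical core file name

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # show the command (argument quoting is not preserved)
    else
        "$@"             # execute the command as given
    fi
}

run make clean
run make dont-optimize
run gdb -batch -ex "bt" -ex "bt full" "$ASTERISK_BIN" "$CORE_FILE"
```

Setting `DRY_RUN=0` executes the commands instead of printing them. The dry-run default is there so the steps can be reviewed before a production system is rebuilt.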

By: Rudolf E. Steiner (res) 2006-05-02 15:03:33

'casper' wrote:

> Does the problem occur if you do not use -p?

Yes.

> Could you try latest Corydon changes (in 1.2 as of r24019)
> related to realtime priority please....

I have upgraded to 'SVN-branch-1.2-r24097'.



By: Rudolf E. Steiner (res) 2006-05-02 15:08:40

'vechers' wrote:

> please confirm that your backtraces are from a non-optimized
> build, i.e. built with make dont-optimize

Please excuse my incompetence!

'SVN-branch-1.2-r24097' is now running as a "dont-optimize" build.

I will wait for the next crash and I will report.   :-(

By: Rudolf E. Steiner (res) 2006-05-03 13:58:22

Wow. Strange.

There has been no crash so far.

So I think building with "dont-optimize" made the difference.

Do you think it is right to run Asterisk built with "dont-optimize" in my environment?

Or are there other tests I should run?

By: Serge Vecher (serge-v) 2006-05-03 16:41:10

If you rebuild it the regular way, with make clean / make install, will it crash?

By: Rudolf E. Steiner (res) 2006-05-04 06:55:17

Yes, I have reproduced it. When it is built without the flag, we run into the reported instability. It is now running with the flag again.



By: Serge Vecher (serge-v) 2006-05-04 08:25:50

OK, good, at least you can have the system stabilized. If at all possible, please turn IAX debug on, let it crash, and post the console log here. This way we can try to get to the root cause of it...

By: Rudolf E. Steiner (res) 2006-05-04 13:37:48

OK. It's running. I will post the result of the IAX2-debug after the next crash.

By: Rudolf E. Steiner (res) 2006-05-05 08:02:54

The file 'IAX2 debug.txt' shows the last lines before the most recent crash.

By: Serge Vecher (serge-v) 2006-05-05 09:31:41

RES: Since development is moving quickly, please always report which SVN revision you are using. Revision 24422 saw a big commit to chan_iax2.c: http://lists.digium.com/pipermail/svn-commits/2006-May/013161.html

Are you using r24422 or above?

By: Rudolf E. Steiner (res) 2006-05-05 12:05:06

As posted, I am on 'r24097'. Should I upgrade and test again?

By: Serge Vecher (serge-v) 2006-05-05 12:09:27

RES: yes, please.

By: Andrey S Pankov (casper) 2006-05-05 13:21:30

What version of GCC are you using? What platform? Any related information,
please... (distro with exact version, which updates you applied, etc.)



By: Rudolf E. Steiner (res) 2006-05-05 14:14:31

'casper' wrote:

> What version of GCC are you using?

gcc-4.1.0-3

> What platform?

Kernel: 2.6.16-1.2080_smp
OS: Fedora Core 5 (all available updates applied)
glibc-2.4-4
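
The details asked for above can be collected in one pass. The following is a rough sketch, assuming an RPM-based system such as Fedora; the `report` helper is a name invented here, and any tool that is not installed is reported as missing rather than causing a failure.

```shell
#!/bin/sh
# Print one line per item: a label plus the first line of the tool's output.
report() {
    label="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$label" "$("$@" 2>/dev/null | head -n 1)"
    else
        printf '%s: (%s not found)\n' "$label" "$1"
    fi
}

report "Kernel" uname -r
report "GCC" gcc --version
report "gcc package" rpm -q gcc
report "glibc package" rpm -q glibc

# Fedora records the release name in this file; skipped on other systems.
if [ -r /etc/fedora-release ]; then
    printf 'OS: %s\n' "$(cat /etc/fedora-release)"
fi
```

Pasting the output into the report gives the compiler, kernel and glibc versions in one place.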

By: Andrey S Pankov (casper) 2006-05-05 14:18:26

Maybe a gcc 4.1 optimization bug.

By: Rudolf E. Steiner (res) 2006-05-05 14:32:43

vechers:

OK. Now I am using 'r24911' built _without_ the "dont-optimize"-flag.

I will report.

By: Rudolf E. Steiner (res) 2006-05-05 16:22:10

'casper' wrote:

>  Maybe a gcc 4.1 optimization bug.

I have the _same_ problem on a Fedora CORE 4-machine with 'gcc-4.0.1-4.fc4'.

By: BJ Weschke (bweschke) 2006-05-05 16:25:17

RES: Yep. I've been able to reproduce your "-x doesn't disconnect session" issues with FC4 and gcc 4.0.0.

By: Russell Bryant (russell) 2006-05-05 18:58:15

I'd like to take a look at this but I'll need access to a machine that is seeing this problem.  If this would be possible, find me on IRC.

By: Russell Bryant (russell) 2006-05-05 19:18:42

oops, I actually meant to post that comment to bug 7071 ...

By: Andrey S Pankov (casper) 2006-05-06 08:26:03

RES: gcc 4.0 is not a compiler at all, it's a piece of <you name it>... :)

By: Rudolf E. Steiner (res) 2006-05-08 03:35:32

It's crashing again. I have rebuilt with "dont-optimize" again.

By: Serge Vecher (serge-v) 2006-05-08 08:50:25

1. Downgrading severity, since we have a workaround.
2. RES: could you please try to set up a test server using a different distribution to see if we can narrow this down to a distro-specific problem?

By: Rudolf E. Steiner (res) 2006-05-08 12:08:01

vechers:

Please excuse my bad English!

We are running a lot of Asterisk servers on Fedora, and they all use IAX2. The reported problem only exists on two servers. These two servers are configured very straightforwardly. They only convert VoIP (alaw) to ISDN (E1) (Digium).

I really want to spend time analysing this issue, but I don't have enough resources (ISDN/E1) to test it with another distribution. The next problem is that the server only crashes when a large number of calls are transported over the system. The only way to reproduce this issue is to kill a lot of calls in the "real world".

It seems that I am the only one with the problem. My assumption is that it is in fact connected with the distribution.

The next time we set up a new VoIP/ISDN gateway, we will use another distribution. After that, I will report again.

Do you think it would be right to close this issue for now?

By: Serge Vecher (serge-v) 2006-05-08 12:19:08

RES: well, there is one last thing that remains unclear about this problem after your last report:
1) you've mentioned that only two servers out of many exhibit the problem;
2) all servers have the same distro/compiler;
3) the problem only happens at high call volume.

Are the two servers with the problem the only ones that handle high call volume? How do they compare hardware-wise to the servers without the problem?

By: Rudolf E. Steiner (res) 2006-05-08 13:22:15

'vechers' wrote:

> 1) you've mentioned that only two servers out of many exhibit the problem;

Yes.

> 2) all servers have the same distro/compiler.

Yes.

> 3) the problem only happens at high call volume.

Correct.

> Are the two servers with the problem the only ones that handle high call volume?

No.

> How do they compare hardware-wise to the servers without the problem?

Only in one thing:

A Digium-card.

By: Serge Vecher (serge-v) 2006-05-08 13:28:36

Well, if 1, 2 and 3 are all 'true', then the problem cannot be blamed on the distribution.

Do the servers have different Digium cards, or how specifically does the hardware differ?

By: Rudolf E. Steiner (res) 2006-05-08 14:30:53

The VoIP servers without an ISDN card handle SIP/ENUM and IAX2 connections to other hosts and VoIP telephones. The crashing servers only handle the calls to and from the PSTN (ISDN, E1). Every PSTN call is transported over _both_ systems. VoIP <---> VoIP calls are transported only by the "not crashing servers".

The difference: the "PSTN servers" (the crashing servers) have different Digium cards. The "not crashing servers" don't have ISDN cards, but they carry far more calls.

The only thing the "crashing PSTN servers" do is convert ISDN to IAX2 (alaw) and IAX2 (alaw) to ISDN.

I don't understand why it's crashing in 'chan_iax2'. Maybe it is a timing problem or something similar. But why is it stable when built with "dont-optimize"?

By: Serge Vecher (serge-v) 2006-05-08 14:47:28

At this point, I think it is best that you get on the #asterisk-dev channel (IRC) and ask one of the developers to log into your system.

By the way, what's the model # of the Digium card and what version of chan_misdn do you use?

By: Rudolf E. Steiner (res) 2006-05-08 15:06:01

'vechers' wrote:

> By the way, what's the model # of the Digium card and what
> version of chan_misdn do you use?

I use 'TE411P', 'TE210P' and 'TE110P'.

The diagnostics reported here (IAX2 debug output and core file analysis) are from the server with the 'TE411P' card.

At the moment I use the 'chan_misdn.c' included in 'Asterisk SVN-branch-1.2-r24911M'.

By: Serge Vecher (serge-v) 2006-05-19 13:57:33

RES: the version of Asterisk you are using suggests that modifications have been made to the source code. Can you please elaborate?

By: Kevin P. Fleming (kpfleming) 2006-05-19 20:02:16

Fixed in branch-1.2 revision 28896 and trunk revision 28903.