ASTERISK-06305: Asterisk crashes several times a day

Summary: ASTERISK-06305: Asterisk crashes several times a day

Reporter: Manny Nunez (daxn) Labels:

Date Opened: 2006-02-13 13:47:20.000-0600 Date Closed: 2006-02-17 11:56:39.000-0600

Priority: Critical Regression? No

Status: Closed/Complete Components: Core/General

Versions: Frequency of Occurrence:

Related Issues:

Environment: Attachments: ( 0) BT_Fullfeb14.txt
( 1) BTfeb14.txt
( 2) core10972.txt
( 3) Core4721feb14.txt
( 4) Messagesfeb14.txt
( 5) PriDebugfeb14.txt
( 6) subaddr-fix.diff
( 7) Thread_apply_all_btfeb14.txt
( 8) Zapataconf.txt
( 9) Zaptelconf.txt

Description: Asterisk stops throughout the day, on average 3 times a day. This has been going on since the beginning of the year.
I started out with Asterisk 1.0.9 and am now running 1.2.4, with the latest Zaptel and libpri.
This weekend I swapped my Digium TE210P card for a Sangoma A102, but the problem continues.

The symptoms are as follows: everything works fine, then all of a sudden I see the message "span 2 yellow alarm" across the console and my Asterisk CLI is disconnected. When I try asterisk -rvvvvvc it will not connect; I must use the asterisk -c command to start it again.
I have two spans: span 1 connects to my telco, span 2 connects to an Iwatsu PBX.
I have replaced my crossover cable, added a CSU, and replaced the PRI card in the Iwatsu, still with no change.
I have included a core file from today's crashes. Please help, my customer is ready to kick me and Asterisk out the door.
If I can supply any more information, I will be more than willing.
Thanks Manny

****** ADDITIONAL INFORMATION ******

I am running Asterisk 1.2.4 on a Dell 2850 with 1 GB of memory.

Comments: By: Manny Nunez (daxn) 2006-02-14 14:00:36.000-0600

I reinstalled Asterisk 1.2.4 with make clean and make valgrind last night. I had read this would make it easier for you to pinpoint the problem. I am not a developer, so I really don't know what valgrind is. I started Asterisk with the command asterisk -vvvg -c. This afternoon, Feb. 14, 2006, at about 3:10 pm Eastern time, it crashed once again. By crash I mean Asterisk stopped working, all calls on both of my PRI spans dropped, and no new calls could get through.
I have only 12 or so SIP phones on the system; most of the work Asterisk has to do is routing Zap calls between the two spans.

I have attached the messages log, PRI debug, and backtraces (bt, bt full, and thread apply all bt). I trimmed the PRI debug and messages logs to start slightly before the crash.

Please Please help me!
thanks Manny
By: Paul Cadach (pcadach) 2006-02-16 11:50:38.000-0600

According to your backtrace, you received a call with an empty subaddress (no digits), and q931_get_number() gets confused when called with len < 0. Try the attached patch to fix it.
By: Manny Nunez (daxn) 2006-02-17 07:45:45.000-0600

Thank you PCadach, I will apply the patch this afternoon and see what happens. I have a question: do you have any idea what causes this? Is it as simple as a user picking up a phone, dialing 9 on the PBX side, and not dialing any more digits, or is it more along the lines of transmission noise on the PRI?

Manny
By: Paul Cadach (pcadach) 2006-02-17 11:09:59.000-0600

The problem is caused by careless parsing of the subaddress information element sent to you from the PBX (or telco, I don't remember). The coredump occurs when the subaddress data contains no 'digits' information, just control fields with no additional data. In this case the q931_get_number() procedure, which collects the 'digits' information from the subaddress data, gets confused because the length of the data from which the digits are to be collected is negative.
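
The failure mode described above can be sketched in C roughly as follows. This is only an illustration, not the libpri source and not the contents of subaddr-fix.diff: the helper names copy_ie_digits() and ie_digit_len(), and the assumed element layout (a length that counts control bytes plus digits), are hypothetical. The point is that subtracting the control-field bytes from a too-short element gives a negative length, which has to be clamped before it is used as a copy size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical layout: `ie_len` is the number of bytes following the
 * element header, and `ctrl_bytes` of those are control fields that
 * precede the digits. With ie_len == 0 the result is negative.
 */
static int ie_digit_len(int ie_len, int ctrl_bytes)
{
    return ie_len - ctrl_bytes;
}

/*
 * Hypothetical stand-in for a digit-collecting routine: copies `len`
 * bytes from `src` into `dst` and NUL-terminates the result.
 * A negative `len` is treated as "no digits" instead of being passed
 * on to memcpy(), where it would become a huge unsigned size.
 */
static void copy_ie_digits(char *dst, size_t dstlen,
                           const unsigned char *src, int len)
{
    assert(dst != NULL && src != NULL);
    if (dstlen == 0)
        return;
    if (len < 0)                        /* guard against empty element */
        len = 0;
    if ((size_t)len > dstlen - 1)       /* leave room for the NUL */
        len = (int)(dstlen - 1);
    memcpy(dst, src, (size_t)len);
    dst[len] = '\0';
}
```

With this guard, an element that carries only control fields produces an empty string rather than a crash; whether an empty subaddress should instead be rejected earlier is a separate design question that the sketch does not settle.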

Also, don't treat this patch as a correct fix. It is just a workaround for your problem. We will need to see a correct call trace with the invalid subaddress information element to identify the real cause of the cores.
By: Paul Cadach (pcadach) 2006-02-17 11:18:25.000-0600

OK, subaddr-fix.diff is cleaner: there was a bug in calling party subaddress decoding in the dump and receive parts. A workaround for negative length in calls to q931_get_number() is included to minimize possible coredumps in the future.
By: Matthew Fredrickson (mattf) 2006-02-17 11:55:12.000-0600

Fixed in trunk and 1.2
By: Matthew Fredrickson (mattf) 2006-02-17 11:56:38.000-0600

Fixed. Thanks PCadach!