Summary: ASTERISK-03953: core dump in libpri
Reporter: Abhishek Tiwari (abhi)
Labels:
Date Opened: 2005-04-19 13:18:33
Date Closed: 2011-06-07 14:00:23
Priority: Critical
Regression?: No
Status: Closed/Complete
Components: Core/General
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: (0) chan_zap.c.patch
Description: Random crash in libpri under heavy load. Asterisk with a TE405P, with span 2 connected to the provider. About 25 calls were in progress when this happened. I couldn't reproduce it; it happened randomly. Will see if it happens again.

****** ADDITIONAL INFORMATION ******

#0  0xb6f7c525 in pri_disconnect_timeout (data=0x8218ac8) at q931.c:2743
2743            if (pri->debug & PRI_DEBUG_Q931_STATE)
(gdb) p *c
$1 = {pri = 0x2f70615a, cr = 825045560, forceinvert = 926168621, next = 0x0, slotmap = 0, channelno = 0, ds1no = 0, chanflags = 0, alive = 0, acked = 0,
 sendhangupack = 0, proc = 0, ri = 0, transcapability = 0, transmoderate = 0, transmultiple = 0, userl1 = 0, userl2 = 0, userl3 = 0, rateadaption = 0,
 sentchannel = -1224898944, justsignalling = 135980296, progcode = 28261, progloc = 0, progress = 0, progressmask = 0, notify = 0, causecode = -1224940171,
 causeloc = 92, cause = -1, peercallstate = -1, ourcallstate = -1, sugcallstate = -1, callerplan = -1, callerpres = 188,
 callernum = "####", '\0' <repeats 56 times>, "\020\200\f\002", '\0' <repeats 12 times>, "\001", '\0' <repeats 11 times>, "C_\016\b\003%#\v%#\000\000\000\000p#\e\b####", '\0' <repeats 16 times>, "#", '\0' <repeats 11 times>, "\031###\005\000\000\000\000\000\000\000H\000\000\000@\000\000\000@\000\000\000\000\000\000\000#\n\034\b", '\0' <repeats 28 times>, "from-zap", '\0' <repeats 47 times>, callername = '\0' <repeats 188 times>, "s", '\0' <repeats 66 times>,
 digitbuf = '\0' <repeats 12 times>, "\001", '\0' <repeats 50 times>, ani2 = 0, calledplan = 0, nonisdn = 0,
 callednum = '\0' <repeats 76 times>, "\003", '\0' <repeats 27 times>, "\002", '\0' <repeats 99 times>, "C\000\000\000\206\000\000\0001113915798.681", '\0' <repeats 22 times>, "###\b\000\000\000", complete = 0, newcall = 0, retranstimer = 0, t308_timedout = 0, redirectingplan = 0, redirectingpres = 0, redirectingreason = 0,
 redirectingnum = "\b\000\000\000\000\000\000\000########\000\000\000\000\000\000\000\000@\000\000\000@\000\000\000#!!\b\227\001eBx\r\002", '\0' <repeats 17 times>, "\001\000\000\000#\002\000\000\fhM###!\b", '\0' <repeats 179 times>, useruserprotocoldisc = 0, useruserinfo = '\0' <repeats 255 times>,
 callingsubaddr = '\0' <repeats 255 times>, apdus = 0x2c8}
(gdb) p c->pri
$2 = (struct pri *) 0x2f70615a
(gdb) p *(c->pri)
Cannot access memory at address 0x2f70615a
(gdb) bt full
#0  0xb6f7c525 in pri_disconnect_timeout (data=0x8218ac8) at q931.c:2743
       c = (struct q931_call *) 0x8218ac8
       pri = (struct pri *) 0x2f70615a
#1  0xb6f7707f in __pri_schedule_run (pri=0x81def38, tv=0xb6ecbbbc) at prisched.c:97
       x = 4
       callback = (void (*)(void *)) 0xb6f7c501 <pri_disconnect_timeout>
       data = (void *) 0x8218ac8
       e = (pri_event *) 0x0
#2  0xb6f770e4 in pri_schedule_run (pri=0x81def38) at prisched.c:109
       tv = {tv_sec = 1113915800, tv_usec = 203069}
#3  0xb6fc0db3 in pri_dchannel (vpri=0xb6fda7c8) at chan_zap.c:7680
       pri = (struct zt_pri *) 0xb6fda7c8
       e = (pri_event *) 0x0
       fds = {{fd = 136, events = 3, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 23456, revents = -18611}, {fd = -1225998612, events = 2394,
   revents = -18628}}
       res = 0
       chanpos = 18
       x = 0
       haveidles = 0
       activeidles = 0
       nextidle = -1
       c = (struct ast_channel *) 0x0
       tv = {tv_sec = 1, tv_usec = 77244}
       lowest = {tv_sec = 1, tv_usec = 77244}
       next = (struct timeval *) 0x81def90
       lastidle = {tv_sec = 1113914408, tv_usec = 751515}
       doidling = 0
       cc = 0x0
       idlen = '\0' <repeats 79 times>
       idle = (struct ast_channel *) 0x0
       p = 0
       t = 1113915799
       i = 1
       which = 0
       numdchans = 1
       cause = 0
       crv = (struct zt_pvt *) 0x0
       threadid = 0
       attr = {__detachstate = 1, __schedpolicy = 0, __schedparam = {__sched_priority = 0}, __inheritsched = 1, __scope = 0, __guardsize = 4096, __stackaddr_set = 0,
 __stackaddr = 0x0, __stacksize = 2093056}
       ani2str = "\000\000\000\000\000"
       plancallingnum = '\0' <repeats 255 times>
       calledtonstr = "\000\000\000\000\000\000\000\000\000"
#4  0xb7589e51 in pthread_start_thread () from /lib/i686/libpthread.so.0
No symbol table info available.
#5  0xb747cd8a in clone () from /lib/i686/libc.so.6
No symbol table info available.
Comments:

By: Paul Cadach (pcadach) 2005-04-19 14:11:33

Probably the timer isn't stopped when the channel is released, so the timeout callback accesses freed memory. This needs to be verified. I'll have time at the end of this week to do it.
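
A minimal sketch of the failure mode suggested above, and of the usual guard against it: cancel any pending timer before freeing the object its callback points at. Everything here (`sched_add`, `sched_del`, `sched_run`, `struct call`, `call_destroy`) is a hypothetical miniature written for illustration; it is not libpri's scheduler or its `struct q931_call`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SCHED 8

/* Hypothetical miniature scheduler; not libpri's actual structures. */
struct sched_slot {
    void (*callback)(void *data);
    void *data;
};

static struct sched_slot slots[MAX_SCHED];

/* Hypothetical call record that owns at most one pending timer. */
struct call {
    int timer_id;  /* slot index + 1, or 0 when no timer is armed */
    int fired;     /* set to 1 by the timeout callback */
};

/* Arm a timer; returns an id >= 1, or 0 if the table is full. */
int sched_add(void (*cb)(void *), void *data)
{
    assert(cb != NULL);
    for (int i = 0; i < MAX_SCHED; i++) {
        if (slots[i].callback == NULL) {
            slots[i].callback = cb;
            slots[i].data = data;
            return i + 1;
        }
    }
    return 0;
}

/* Disarm a timer; id 0 (no timer armed) is a harmless no-op. */
void sched_del(int id)
{
    if (id >= 1 && id <= MAX_SCHED) {
        slots[id - 1].callback = NULL;
        slots[id - 1].data = NULL;
    }
}

/* Fire every armed timer once; returns how many callbacks ran. */
int sched_run(void)
{
    int ran = 0;
    for (int i = 0; i < MAX_SCHED; i++) {
        void (*cb)(void *) = slots[i].callback;
        void *data = slots[i].data;
        if (cb != NULL) {
            sched_del(i + 1);  /* one-shot: disarm before calling */
            cb(data);
            ran++;
        }
    }
    return ran;
}

/* Timeout callback: dereferences the call, so the call must still exist. */
void on_timeout(void *data)
{
    struct call *c = data;
    c->timer_id = 0;
    c->fired = 1;
}

/* Allocate a call and arm its timeout. */
struct call *call_new(void)
{
    struct call *c = calloc(1, sizeof(*c));
    if (c != NULL)
        c->timer_id = sched_add(on_timeout, c);
    return c;
}

/* Cancel the pending timer BEFORE freeing, so no callback can ever
 * run with a pointer to freed memory. */
void call_destroy(struct call *c)
{
    if (c == NULL)
        return;
    sched_del(c->timer_id);
    free(c);
}
```

If `call_destroy` skipped the `sched_del` step, a later `sched_run` would invoke `on_timeout` on a freed pointer, which is the kind of access that the garbage `pri` value in frame #0 of the backtrace is consistent with.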

By: Abhishek Tiwari (abhi) 2005-04-21 12:44:26

Maybe this is due to my own fixes in chan_zap.c. I had to place calls on a particular channel, and in available() p->owner was not set while p->call was not NULL. In that case the channel request returned "unable to request channel", and repeated attempts on the same channel failed. To work around this I added some lines to available() that hang up the call, set p->call to NULL, and reset the channel.

       /* If no owner definitely available */
       if (!p->owner) {
               /* Trust PRI */
#ifdef ZAPATA_PRI
               if (p->pri) {
            /************ added code starts ************/
                       if(p->call)
                       {
                               ast_log(LOG_WARNING, "************ Call not null for %d channel on span %d, Forcing Restart ************\n",PRI_CHANNEL(channelmatch), p->pri->span);

                               if (p->pri && p->pri->pri) {
                                       if (!pri_grab(p, p->pri)) {
                                               pri_hangup(p->pri->pri, p->call, -1);
                                               pri_destroycall(p->pri->pri, p->call);
                                               pri_rel(p->pri);
                                       } else
                                               ast_log(LOG_WARNING, "Failed to grab PRI!\n");
                               } else
                                       ast_log(LOG_WARNING, "The PRI call has not been destroyed\n");

                               p->call = NULL;

                               ast_mutex_lock(&p->lock);
                               pri_reset(p->pri->pri, PVT_TO_CHANNEL(p));
                               p->resetting = 1;
                               ast_mutex_unlock(&p->lock);
                               return 0;
                       }
            /************ added code ends************/

                       if (p->resetting || p->call)
                               return 0;
                       else
                               return 1;
               }
#endif

This worked for quite some time (two weeks) before the crash happened.
Possibly it introduced some synchronization problem (I still don't know why that would happen). I have now changed pri_hangup(p->pri->pri, p->call, -1) to pri_hangup(p->pri->pri, p->call, 26) so that it sends RELEASE instead of DISCONNECT. This works fine for now, but is there a better fix for this?

By: Matthew Fredrickson (mattf) 2005-04-21 15:45:53

Can you reproduce this in unmodified CVS-head?

By: Clod Patry (junky) 2005-04-21 22:10:51

abhi: Can you attach a patch (diff -u) instead of pasting code in a bug note? It's faster to apply patches when we can download them.
Thanks.

By: Abhishek Tiwari (abhi) 2005-04-22 03:43:27

I didn't attach the patch because I just wanted to show what I had done for my own particular case (if it is particular to me). The owner is NULL but the call is still there because the call never got a response to the DISCONNECT it sent (to the PBX side). In what case would that happen?
   If someone up there can verify that this is an actual problem (no owner but a call still present) and that this is a fix, then things can be patched. pri_hangup/pri_destroycall does stop the retranstimer, so the crash should not have happened. Still, pri_hangup with cause=-1 is causing some sync problems, and using 26 is just a hack (it sends a RELEASE instead of a DISCONNECT). This code is not from CVS HEAD; it is from 1.0.7. I didn't see many changes on CVS HEAD except something for alerting and trans_cap. Here's the patch against CVS HEAD (04-22-05).

By: Matthew Fredrickson (mattf) 2005-04-22 09:42:31

Wait.  I think I'm confused.  Did your crash occur with the code that you added, or did it occur before you added this code?

By: Abhishek Tiwari (abhi) 2005-04-26 18:19:07

The crash happened after I had added the code; the existing chan_zap.c does not have this problem. Initially I thought this could be a general problem, since Asterisk had been running with the changes for over a week, with this part of the code being executed quite often. But the question remains: how do I get back a channel that has not been properly released? This code works fine and only very rarely produces a dump. I got a dump again after 3 days of running, this time in pri_release_timeout. I couldn't see any problem in the added code, which was picked up from other parts of chan_zap.c. I guess this bug report can be ignored if the problem is in the code I added; otherwise there seems to be a real problem. Can someone verify this?

By: Mark Spencer (markster) 2005-04-26 23:14:11

I believe the problem has to do with double freeing occurring in your code.
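
A minimal sketch of how a double free of this kind can arise and how it can be guarded against. It assumes, purely for illustration, that the hangup path may itself release the call record, so that a following explicit destroy would free it a second time; the thread does not confirm that this is what libpri does. All names (`struct call_rec`, `call_rec_new`, `call_rec_hangup`, `call_rec_release`, `live_calls`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a call record; not libpri's struct q931_call. */
struct call_rec {
    int active;
};

/* Number of call records allocated and not yet freed. */
int live_calls = 0;

struct call_rec *call_rec_new(void)
{
    struct call_rec *c = calloc(1, sizeof(*c));
    if (c != NULL) {
        c->active = 1;
        live_calls++;
    }
    return c;
}

/* Free the record exactly once and clear the caller's pointer.
 * A second call sees NULL and does nothing. */
void call_rec_release(struct call_rec **cp)
{
    assert(cp != NULL);
    if (*cp == NULL)
        return;
    free(*cp);
    *cp = NULL;
    live_calls--;
}

/* Hang up the call. ASSUMPTION for this sketch: hanging up also
 * releases the record, which is what makes a later explicit release
 * dangerous when the caller keeps a raw pointer. */
void call_rec_hangup(struct call_rec **cp)
{
    assert(cp != NULL);
    if (*cp == NULL)
        return;
    (*cp)->active = 0;
    call_rec_release(cp);
}
```

Because both functions take the address of the caller's pointer and reset it to NULL, the sequence "hang up, then destroy" frees the record only once. With raw pointers passed by value, the same sequence would call free() twice on one address.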