[Home]

Summary:ASTERISK-15271: [patch] New music on hold patches cause asterisk + full system hard lock
Reporter:Andrew Parisio (parisioa)Labels:
Date Opened:2009-12-03 15:24:19.000-0600Date Closed:2009-12-18 17:52:00.000-0600
Priority:CriticalRegression?Yes
Status:Closed/CompleteComponents:Resources/res_musiconhold
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) 20091207__issue16279__2.diff.txt
( 1) 20091207__issue16279__3.diff.txt
( 2) 20091207__issue16388.diff.txt
( 3) 20091208__issue16388.diff.txt
( 4) blockingerror.jpg
( 5) photo.jpg
( 6) refs.tar.gz
( 7) refs.txt
( 8) refs2.txt
( 9) ss.jpg
(10) valgrind.supp
(11) valgrind.tar.gz
Description:After all the changes we made to music on hold, my asterisk system now hard locks after a while.  I had listened to a few streams, we hit what we thught may be a memory corruption issue in bug 16279 and we in fact did.  Check out the cameraphone picture of the console of the server attached.

I have no idea what this bug should be filed under so i'm going under music on hold where we were before.
Comments:By: Andrew Parisio (parisioa) 2009-12-03 15:49:47.000-0600

please note that the reproduceability always is set for the MOH crashing.
i've had one hardlock.  i'm currently recompiling and will post valgrind logs as requested in the prior bug thread shortly.

By: Andrew Parisio (parisioa) 2009-12-03 15:51:58.000-0600

hmmm, not sure what to do with this.

csgtacsip1:/usr/src/asterisk-1.6.1.svn# valgrind --suppressions=/usr/src/asterisk-1.6.1.svn/contrib/valgrind.supp --log-fd=9 asterisk -vvvvcg 9>valgrind.txt
csgtacsip1:/usr/src/asterisk-1.6.1.svn#
csgtacsip1:/usr/src/asterisk-1.6.1.svn#
csgtacsip1:/usr/src/asterisk-1.6.1.svn# cat valgrind.txt
==32676== Memcheck, a memory error detector.
==32676== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==32676== Using LibVEX rev 1854, a library for dynamic binary translation.
==32676== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==32676== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==32676== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==32676== For more details, rerun with: -v
==32676==
location should start with fun: or obj:
==32676== FATAL: in suppressions file '/usr/src/asterisk-1.6.1.svn/contrib/valgrind.supp': location should start with 'fun:' or 'obj:'
==32676== exiting now.

By: Atis Lezdins (atis) 2009-12-03 19:53:59.000-0600

Try with attached suppressions file. It seems to work for both 3.3 and 3.4 versions of valgrind.

If unsuccessful, you may run without suppressions file, but there will be some irrelevant debug.

Btw, i don't know anything about your issue, just fixing my valgrind suppressions file :)



By: Andrew Parisio (parisioa) 2009-12-03 21:10:03.000-0600

new supressions file worked. thanks!

i did a few listens, some reloads, moh reloads in there.  if you want me to do anything else, let me know.

also i enabled dont_optimize and the malloc flags as the readme says.

By: Andrew Parisio (parisioa) 2009-12-04 19:03:33.000-0600

it happened again on  SVN-branch-1.6.1-r232865
edit: nothing was happening either. it's 5pm on a friday and it exploded with nobody using it.



By: Tilghman Lesher (tilghman) 2009-12-07 10:36:43.000-0600

Please delete your /tmp/refs file, patch, and restart Asterisk.  After you either get the warning (hopefully preventing your system from crashing) or a crash, please upload your /tmp/refs file.

By: Andrew Parisio (parisioa) 2009-12-07 11:23:06.000-0600

okay, done.  no crashes (it randomly hard locks the system after many hours), but the stream gets killed.

By: Tilghman Lesher (tilghman) 2009-12-07 11:58:20.000-0600

Could you try that again, without compressing the /tmp/refs file?  It appears your upload was corrupt.

By: Andrew Parisio (parisioa) 2009-12-07 12:11:48.000-0600

i recompiled a clean tree and it doesn't generate the refs file anymore.  i did that because i got compile errors last time, probably from a prior patch that had been applied.

By: Tilghman Lesher (tilghman) 2009-12-07 12:29:23.000-0600

Right, that means you don't have the patch applied.  I've regenerated the patch without warnings, so please try using that, and grab the resulting refs file.

By: Andrew Parisio (parisioa) 2009-12-07 12:36:58.000-0600

okay, patch applied and new log uploaded. it's small enough now it doesnt need to be compressed.

By: Tilghman Lesher (tilghman) 2009-12-07 15:43:46.000-0600

New patch uploaded.  It's still not clear to me where this leak is occurring.

By: Andrew Parisio (parisioa) 2009-12-07 15:45:17.000-0600

the blank one is the one i listened too.

csgtacsip1*CLI> moh show classes
Class: 100368-moh
       Mode: files
       Directory: /var/lib/asterisk/moh/custom/100368/
Class: hot108jamz
       Mode: custom
       Directory: nodir
       Application: /usr/bin/mpg123 -q -s --mono -r 8000 -f 8192 -b 0 http://scfire-mtc-aa04.stream.aol.com:80/stream/1071
       Format: slin
Class:
       Mode: <none>
       Directory: <none>
       Format: unknown
Class: default
       Mode: files
       Directory: /var/lib/asterisk/moh
Class: 977rock
       Mode: custom
       Directory: nodir
       Application: /usr/bin/mpg123 -q -s --mono -r 8000 -f 8192 -b 0 http://scfire-mtc-aa04.stream.aol.com:80/stream/1071
       Format: slin
Class: Hitz
       Mode: custom
       Directory: nodir
       Application: /usr/bin/mpg123 -q -s --mono -r 8000 -f 8192 -b 0 http://scfire-mtc-aa02.stream.aol.com:80/stream/1074
       Format: slin
[Dec  7 13:46:53] ERROR[27223]: astobj2.c:116 INTERNAL_OBJ: bad magic number 0x6567656c for 0x9086c58
[Dec  7 13:46:53] ERROR[27223]: astobj2.c:116 INTERNAL_OBJ: bad magic number 0x0 for 0x9086c58



By: Tilghman Lesher (tilghman) 2009-12-07 16:32:20.000-0600

New patch uploaded.  This one still contains all the debugging, in case I still haven't caught up to this, but I have fixed something in this patch.  To be clear, the refs file is not helpful unless you have the words "BAD ATTEMPT" in the file (which means a crash was averted, though a bug still remains).

By: Andrew Parisio (parisioa) 2009-12-07 16:36:27.000-0600

holy crap it works!

i can connect multime times in succession and it works. multiple people can connect at the same time no problem.  I'll hammer it a bit and see if i can break it now.

By: Andrew Parisio (parisioa) 2009-12-07 16:37:05.000-0600

each time i connect to the stream i get this on the console, all 4 lines each time:

[Dec  7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe
[Dec  7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe
[Dec  7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe
[Dec  7 14:38:27] ERROR[28190]: utils.c:1173 ast_careful_fwrite: fwrite() returned error: Broken pipe

By: Tilghman Lesher (tilghman) 2009-12-07 16:50:42.000-0600

I think those errors are mostly unrelated.  That routine is only called from manager.c and relates to being unable to send data to a remote socket, usually because the connection has closed.  While MOH certainly queues manager events, which could cause this error, if you had dead AMI sessions, the error indicates nothing wrong with MOH itself.

By: Andrew Parisio (parisioa) 2009-12-07 17:07:03.000-0600

oh okay, that's probably the way i drive MOH.  the phone goes to a webpage which telnets into the AMI and tells asterisk to call the phone with the select MOH stream.

what is the clean way to exit from the AMI?
edit: i just RTFM'd and found it Action: Logoff. thx



By: Digium Subversion (svnbot) 2009-12-07 17:10:13.000-0600

Repository: asterisk
Revision: 233577

U   trunk/contrib/valgrind.supp

------------------------------------------------------------------------
r233577 | atis | 2009-12-07 17:10:13 -0600 (Mon, 07 Dec 2009) | 8 lines

Fix compatibility with valgrind 3.3 and older.

(noticed in issue ASTERISK-15271)
Reported by: parisioa
Patches:
   valgrind.supp uloaded by atis (license 242)
Tested by: atis, parisioa

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233577

By: Digium Subversion (svnbot) 2009-12-07 17:54:11.000-0600

Repository: asterisk
Revision: 233615

_U  branches/1.6.2/
U   branches/1.6.2/contrib/valgrind.supp

------------------------------------------------------------------------
r233615 | atis | 2009-12-07 17:54:11 -0600 (Mon, 07 Dec 2009) | 15 lines

Merged revisions 233577 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines
 
 Fix compatibility with valgrind 3.3 and older.
 
 (noticed in issue ASTERISK-15271)
 Reported by: parisioa
 Patches:
     valgrind.supp uloaded by atis (license 242)
 Tested by: atis, parisioa
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233615

By: Digium Subversion (svnbot) 2009-12-07 17:56:05.000-0600

Repository: asterisk
Revision: 233616

_U  branches/1.6.1/
U   branches/1.6.1/contrib/valgrind.supp

------------------------------------------------------------------------
r233616 | atis | 2009-12-07 17:56:05 -0600 (Mon, 07 Dec 2009) | 15 lines

Merged revisions 233577 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines
 
 Fix compatibility with valgrind 3.3 and older.
 
 (noticed in issue ASTERISK-15271)
 Reported by: parisioa
 Patches:
     valgrind.supp uloaded by atis (license 242)
 Tested by: atis, parisioa
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233616

By: Digium Subversion (svnbot) 2009-12-07 17:57:22.000-0600

Repository: asterisk
Revision: 233617

_U  branches/1.6.0/
U   branches/1.6.0/contrib/valgrind.supp

------------------------------------------------------------------------
r233617 | atis | 2009-12-07 17:57:21 -0600 (Mon, 07 Dec 2009) | 15 lines

Merged revisions 233577 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines
 
 Fix compatibility with valgrind 3.3 and older.
 
 (noticed in issue ASTERISK-15271)
 Reported by: parisioa
 Patches:
     valgrind.supp uloaded by atis (license 242)
 Tested by: atis, parisioa
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233617

By: Digium Subversion (svnbot) 2009-12-07 18:02:44.000-0600

Repository: asterisk
Revision: 233618

U   branches/1.4/contrib/valgrind.supp

------------------------------------------------------------------------
r233618 | atis | 2009-12-07 18:02:43 -0600 (Mon, 07 Dec 2009) | 15 lines

Merged revisions 233577 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines
 
 Fix compatibility with valgrind 3.3 and older.
 
 (noticed in issue ASTERISK-15271)
 Reported by: parisioa
 Patches:
     valgrind.supp uloaded by atis (license 242)
 Tested by: atis, parisioa
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233618

By: Digium Subversion (svnbot) 2009-12-07 18:09:34.000-0600

Repository: asterisk
Revision: 233619

_U  trunk/

------------------------------------------------------------------------
r233619 | atis | 2009-12-07 18:09:34 -0600 (Mon, 07 Dec 2009) | 21 lines

Blocked revisions 233618 via svnmerge

................
 r233618 | atis | 2009-12-08 02:02:43 +0200 (Tue, 08 Dec 2009) | 15 lines
 
 Merged revisions 233577 via svnmerge from
 https://origsvn.digium.com/svn/asterisk/trunk
 
 ........
   r233577 | atis | 2009-12-08 01:10:13 +0200 (Tue, 08 Dec 2009) | 8 lines
   
   Fix compatibility with valgrind 3.3 and older.
   
   (noticed in issue ASTERISK-15271)
   Reported by: parisioa
   Patches:
       valgrind.supp uloaded by atis (license 242)
   Tested by: atis, parisioa
 ........
................

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233619

By: Digium Subversion (svnbot) 2009-12-08 12:22:45.000-0600

Repository: asterisk
Revision: 233718

U   trunk/res/res_musiconhold.c

------------------------------------------------------------------------
r233718 | tilghman | 2009-12-08 12:22:45 -0600 (Tue, 08 Dec 2009) | 8 lines

Find another ref leak and change how we manage module references.
(closes issue ASTERISK-15271)
Reported by: parisioa
Patches:
      20091208__issue16388.diff.txt uploaded by tilghman (license 14)
Tested by: parisioa, tilghman
Review: https://reviewboard.asterisk.org/r/442/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233718

By: Digium Subversion (svnbot) 2009-12-08 12:28:19.000-0600

Repository: asterisk
Revision: 233729

_U  branches/1.6.0/
U   branches/1.6.0/res/res_musiconhold.c

------------------------------------------------------------------------
r233729 | tilghman | 2009-12-08 12:28:18 -0600 (Tue, 08 Dec 2009) | 15 lines

Merged revisions 233718 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines
 
 Find another ref leak and change how we manage module references.
 (closes issue ASTERISK-15271)
  Reported by: parisioa
  Patches:
        20091208__issue16388.diff.txt uploaded by tilghman (license 14)
  Tested by: parisioa, tilghman
  Review: https://reviewboard.asterisk.org/r/442/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233729

By: Digium Subversion (svnbot) 2009-12-08 12:31:04.000-0600

Repository: asterisk
Revision: 233730

_U  branches/1.6.1/
U   branches/1.6.1/res/res_musiconhold.c

------------------------------------------------------------------------
r233730 | tilghman | 2009-12-08 12:31:03 -0600 (Tue, 08 Dec 2009) | 15 lines

Merged revisions 233718 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines
 
 Find another ref leak and change how we manage module references.
 (closes issue ASTERISK-15271)
  Reported by: parisioa
  Patches:
        20091208__issue16388.diff.txt uploaded by tilghman (license 14)
  Tested by: parisioa, tilghman
  Review: https://reviewboard.asterisk.org/r/442/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233730

By: Digium Subversion (svnbot) 2009-12-08 12:33:55.000-0600

Repository: asterisk
Revision: 233731

_U  branches/1.6.2/
U   branches/1.6.2/res/res_musiconhold.c

------------------------------------------------------------------------
r233731 | tilghman | 2009-12-08 12:33:54 -0600 (Tue, 08 Dec 2009) | 15 lines

Merged revisions 233718 via svnmerge from
https://origsvn.digium.com/svn/asterisk/trunk

........
 r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines
 
 Find another ref leak and change how we manage module references.
 (closes issue ASTERISK-15271)
  Reported by: parisioa
  Patches:
        20091208__issue16388.diff.txt uploaded by tilghman (license 14)
  Tested by: parisioa, tilghman
  Review: https://reviewboard.asterisk.org/r/442/
........

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233731

By: Digium Subversion (svnbot) 2009-12-08 12:35:15.000-0600

Repository: asterisk
Revision: 233718

U   trunk/res/res_musiconhold.c

------------------------------------------------------------------------
r233718 | tilghman | 2009-12-08 12:22:44 -0600 (Tue, 08 Dec 2009) | 8 lines

Find another ref leak and change how we manage module references.
(closes issue ASTERISK-15271, closes issue ASTERISK-15170, closes issue ASTERISK-15273)
Reported by: parisioa
Patches:
      20091208__issue16388.diff.txt uploaded by tilghman (license 14)
Tested by: parisioa, tilghman
Review: https://reviewboard.asterisk.org/r/442/

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=233718

By: Andrew Parisio (parisioa) 2009-12-09 18:50:54.000-0600

A few hours ago my system hard locked again and quit pinging / responding.  I attached the screenshot from the console again.  These hardlocks started happening with the MOH patches.  Note that the screenshot references an Asterisk Blocking state.


A few minutes ago i got about 30 of these lines at my console:
[Dec  9 16:45:47] WARNING[3419]: res_musiconhold.c:620 monmp3thread: Unable to send a SIGHUP to MOH process?!!: No such process
[Dec  9 16:45:50] WARNING[3419]: res_musiconhold.c:620 monmp3thread: Unable to send a SIGHUP to MOH process?!!: No such process

By: Tilghman Lesher (tilghman) 2009-12-10 12:41:05.000-0600

There is literally no reason why a user process should cause a CPU lockup.  Ever.  This is a kernel bug.  We may have inadvertently triggered it, but it's still a kernel bug.

By: Tilghman Lesher (tilghman) 2009-12-10 12:53:18.000-0600

I would really recommend a kernel upgrade at this point.

By: Andrew Parisio (parisioa) 2009-12-10 14:36:52.000-0600

Linux csgtacsip1 2.6.26-2-amd64 #1 SMP Wed Aug 19 07:01:17 UTC 2009 x86_64 GNU/Linux


Not that old.

By: Tilghman Lesher (tilghman) 2009-12-10 17:36:40.000-0600

Kernel 2.6.32 was just released.

By: Tilghman Lesher (tilghman) 2009-12-10 17:47:02.000-0600

Combing through similar reports shows that the kernel scheduler was improved in 2.6.27 to avoid this exact situation.

By: Andrew Parisio (parisioa) 2009-12-10 17:50:23.000-0600

ok, donwloading 2.6.32 now to compile / test. will report back next week.

By: Tilghman Lesher (tilghman) 2009-12-16 23:08:24.000-0600

Has the new kernel scheduler relieved your lockup?

By: Andrew Parisio (parisioa) 2009-12-16 23:09:47.000-0600

i haven't seen it happen since last week (it happened twice in a week), and i haven't gone to 2.6.32 yet either, so i have no idea.  i'm a little worried, i need to find time to install 2.6.32.

By: Tilghman Lesher (tilghman) 2009-12-17 11:32:04.000-0600

Honestly, just upgrading to 2.6.27 should be sufficient.  However, given that you are no longer experiencing this, it's difficult to see why we need to keep this open.