Summary: ASTERISK-07667: [patch] res_musiconhold causes asterisk to hang in Solaris when compiled with zaptel drivers
Reporter: Bob Atkins (bob)
Labels:
Date Opened: 2006-09-03 05:33:31
Date Closed: 2007-02-17 21:56:30.000-0600
Versions:
Frequency of
Environment:
Attachments: ( 0) res_musiconhold-patch_090306
Description: When Asterisk is compiled with the zaptel drivers, it hangs on startup every time on Solaris 2.8 systems. truss traces show that it is stuck in what appears to be a race condition. This makes the cause difficult to determine, because just attaching truss or gdb masks the effect and breaks Asterisk out of the hang. Eventually, however, Asterisk hangs again.


The problem has been traced to the way res_musiconhold functions when it is compiled with the zaptel drivers. With the zaptel drivers, res_musiconhold reads from /dev/zap/pseudo to induce a specific delay. When compiled without the zaptel drivers, res_musiconhold instead calls usleep to induce a specific delay.
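
For readers who do not have the source handy, the two delay strategies described above can be sketched roughly as follows. This is an illustration only, not the actual res_musiconhold code: the helper name moh_wait_chunk, its parameters, and the convention that a negative descriptor means "no timing device" are all assumptions made for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* lets glibc declare usleep() under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the Asterisk source): wait for roughly
 * one chunk of hold music.
 *
 * timing_fd >= 0: block in read() on a timing device such as
 *                 /dev/zap/pseudo; the driver paces the caller.
 * timing_fd <  0: no timing device is available, so sleep for fallback_us
 *                 microseconds instead (keep this below one second).
 *
 * Returns 0 on success and -1 if the read fails.
 */
static int moh_wait_chunk(int timing_fd, void *buf, size_t len,
                          unsigned int fallback_us)
{
    assert(buf != NULL && len > 0);

    if (timing_fd >= 0) {
        /* The delay comes from the device: read() returns when data is ready. */
        ssize_t n = read(timing_fd, buf, len);
        return n < 0 ? -1 : 0;
    }

    /* Without a timing device, the delay comes from an explicit sleep. */
    usleep(fallback_us);
    return 0;
}
```

The point of the sketch is that only the second branch hands the CPU back through an explicit sleep; the first branch depends entirely on how the read behaves on the platform.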

Unlike Linux, Solaris threads are kernel-based, and the read from the /dev/zap/pseudo device does not permit the res_musiconhold thread to yield to the other Asterisk threads, whereas a call to usleep automatically allows the thread to yield.

The fix is to call the Solaris-specific thr_yield() function just before performing the read from /dev/zap/pseudo. In Solaris 2.9 and 2.10, the POSIX sched_yield() function can be used instead; however, it is just a declaration for the thr_yield() function in the system's /usr/include/pthread.h.
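
The shape of the fix described above might look something like the sketch below. It is not the attached patch, whose contents are not reproduced in this report. The macro MOH_YIELD, the function moh_yielding_read, and the use of __sun to detect Solaris are assumptions made for illustration.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Assumption for this sketch: __sun identifies a Solaris build. On Solaris,
 * thr_yield() comes from <thread.h>; elsewhere fall back to POSIX
 * sched_yield().
 */
#if defined(__sun)
#include <thread.h>
#define MOH_YIELD() thr_yield()
#else
#define MOH_YIELD() ((void) sched_yield())
#endif

/*
 * Hypothetical wrapper (not from the Asterisk source): give other runnable
 * threads a chance to run, then perform the timing read.
 */
static ssize_t moh_yielding_read(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    MOH_YIELD();               /* yield just before the read */
    return read(fd, buf, len); /* same read as before, unchanged */
}
```

On the non-zaptel path no such yield is needed, since usleep already lets the thread yield.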

We have been running this patch for several weeks in production with both v1.2.9.1 and v1.2.10 and everything has been working flawlessly.

I have attached a unified patch to this bug report for consideration.
Comments:
By: Bob Atkins (bob) 2006-09-15 19:53:59

Just wondering if anyone can take a look at this bug report and the attached patch?

Also, I'm concerned that similar but possibly more subtle race conditions might exist that could affect other areas of operation. For instance, we have noticed that using qualify for SIP devices produces somewhat unreliable results: sometimes all devices, even though they are widely dispersed across different network locations, are flagged as down for no apparent reason.

By: Bob Atkins (bob) 2006-09-16 11:31:02

I filed a disclaimer a few months ago when I submitted earlier bug reports. Digium should still have it on file.

By: jmls (jmls) 2006-11-01 06:34:33.000-0600

Can we get a developer to check this patch? Thanks

By: Jason Parker (jparker) 2006-11-16 12:04:47.000-0600

What are the effects of this on versions newer than Solaris 2.8?

By: Joshua C. Colp (jcolp) 2007-01-22 19:49:59.000-0600

Fixed in 1.2 as of revision 51512, 1.4 as of revision 51513, and trunk as of revision 51514. Thanks!

By: Bob Atkins (bob) 2007-02-17 20:58:57.000-0600

Re-opened in error. I didn't read the last note that said that the fix was also incorporated in the v1.4 train. Sorry. I deleted my extraneous note. This bug can be closed.