
Summary: ASTERISK-10004: chan_iax2 tries to remove a nonexistent scheduled ping
Reporter: Frank Waller (explidous)
Labels:
Date Opened: 2007-08-01 14:09:52
Date Closed: 2011-06-07 14:07:48
Priority: Minor
Regression? No
Status: Closed/Complete
Components: Channels/chan_iax2
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) iax_thread_bt2
( 1) iax_bt_full
( 2) iax_bt_full2
( 3) iax_cli_output
( 4) iax_thread_bt
Description: With DO_CRASH enabled, Asterisk crashes while trying to remove a nonexistent schedule entry for an IAX ping.

I was running Vicidial (a predictive dialer) on this server with twenty agents and dialing at a ratio of three to one. This means that there are twenty channels waiting in twenty meetmes and the server is dialing 60 numbers via IAX to another XEN server on the same box. When a number connects they get placed into one of the meetmes.

This crash happened amongst some other crashes that I am still debugging. I have not been able to narrow down exactly what caused this one. Most likely a threading issue.

****** STEPS TO REPRODUCE ******

I believe it can be reproduced by simply dialing many calls simultaneously via IAX to the same server over a low-latency connection.
Comments:

By: Frank Waller (explidous) 2007-08-01 14:22:00

Happened to me again. It was the first go after reporting this.

By: Frank Waller (explidous) 2007-08-01 14:41:18

iax_cli_output, iax_bt_full2, and iax_thread_bt2 all correspond to the second crash, this time with SVN trunk 77862.



By: Frank Waller (explidous) 2007-08-01 14:52:23

Happened yet again. This seems to happen not long after this error:
chan_iax2.c:8281 socket_process: Received mini frame before first full voice frame

By: Digium Subversion (svnbot) 2007-08-01 16:58:55

Repository: asterisk
Revision: 77887

------------------------------------------------------------------------
r77887 | russell | 2007-08-01 16:58:53 -0500 (Wed, 01 Aug 2007) | 23 lines

Fix some race conditions which have been causing weird problems in chan_iax2.
The most notable problem is that people have been seeing storms of VNAK frames
being sent due to really old frames mysteriously being in the retransmission
queue and never getting removed.

It was possible that a dynamic thread got created, but did not acquire its lock
before the thread that created it signals it to perform an action.  When this
happens, the thread will sleep until it hits a timeout, and then get destroyed.
So, the action never gets performed and in some cases, means a frame doesn't
get transmitted and never gets freed since the scheduler never gets a chance
to reschedule transmission.

Another less severe race condition is in the handling of a timeout for a dynamic
thread.  It was possible for it to be acquired to perform an action at the same
time that it hit a timeout.  When this occurs, whatever action it was acquired
for would never get performed.

(patch contributed by Mihai and SteveK)
(closes issue ASTERISK-9946)
(closes issue ASTERISK-9912)
(closes issue ASTERISK-9902)
(possibly related to issue ASTERISK-10004)

------------------------------------------------------------------------
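
The first race in the commit message above (a freshly created thread is signalled before it is waiting, so the wakeup is lost) can be illustrated with a minimal pthreads sketch. This is not the chan_iax2 code: `struct worker`, `worker_main`, and `run_job` are hypothetical names, and the "action" is just doubling an integer. The point is only the pattern: the state change is recorded in a flag under the lock, and the worker checks that flag before it sleeps, so it does not matter which thread gets the lock first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker state; names are illustrative, not from chan_iax2. */
struct worker {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool has_work;   /* predicate guarded by lock */
    bool done;       /* set by the worker once the action has run */
    int input;
    int result;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    assert(w != NULL);
    pthread_mutex_lock(&w->lock);
    /* Check the predicate before sleeping. If the creator already handed
     * over work, we never wait, so an early signal cannot be lost. */
    while (!w->has_work)
        pthread_cond_wait(&w->cond, &w->lock);
    w->result = w->input * 2;   /* stand-in for the requested action */
    w->has_work = false;
    w->done = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Create a worker, hand it one job immediately, and wait for the result.
 * Returns -1 if the thread could not be created. */
static int run_job(int input)
{
    struct worker w = { .has_work = false, .done = false };
    pthread_t tid;
    int result;

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cond, NULL);
    if (pthread_create(&tid, NULL, worker_main, &w) != 0) {
        pthread_cond_destroy(&w.cond);
        pthread_mutex_destroy(&w.lock);
        return -1;
    }

    pthread_mutex_lock(&w.lock);
    w.input = input;
    w.has_work = true;               /* record the request under the lock */
    pthread_cond_signal(&w.cond);    /* then wake the worker if it sleeps */
    while (!w.done)
        pthread_cond_wait(&w.cond, &w.lock);
    result = w.result;
    pthread_mutex_unlock(&w.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return result;
}
```

Calling `run_job(21)` returns the doubled value whether or not the worker reached its wait before the creator signalled it. A bare `pthread_cond_signal` with no predicate would depend on that ordering, which is the failure mode the commit message describes.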

By: Digium Subversion (svnbot) 2007-08-01 17:06:59

Repository: asterisk
Revision: 77889

------------------------------------------------------------------------
r77889 | russell | 2007-08-01 17:06:58 -0500 (Wed, 01 Aug 2007) | 31 lines

Merged revisions 77887 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r77887 | russell | 2007-08-01 17:16:17 -0500 (Wed, 01 Aug 2007) | 23 lines

Fix some race conditions which have been causing weird problems in chan_iax2.
The most notable problem is that people have been seeing storms of VNAK frames
being sent due to really old frames mysteriously being in the retransmission
queue and never getting removed.

It was possible that a dynamic thread got created, but did not acquire its lock
before the thread that created it signals it to perform an action.  When this
happens, the thread will sleep until it hits a timeout, and then get destroyed.
So, the action never gets performed and in some cases, means a frame doesn't
get transmitted and never gets freed since the scheduler never gets a chance
to reschedule transmission.

Another less severe race condition is in the handling of a timeout for a dynamic
thread.  It was possible for it to be acquired to perform an action at the same
time that it hit a timeout.  When this occurs, whatever action it was acquired
for would never get performed.

(patch contributed by Mihai and SteveK)
(closes issue ASTERISK-9946)
(closes issue ASTERISK-9912)
(closes issue ASTERISK-9902)
(possibly related to issue ASTERISK-10004)

........

------------------------------------------------------------------------
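
The second, less severe race in the log above (an idle thread is acquired for an action at the moment its timeout fires) can be sketched in the same spirit. Again, this is an illustration under assumptions, not the actual patch: `struct idle_worker`, `try_assign`, and `wait_for_work` are hypothetical names. The idea is that both decisions, "hand this thread a job" and "this thread is giving up", are made under the same lock, and the worker decides from the predicate rather than from the return code of the timed wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical idle-thread state; names are illustrative only. */
struct idle_worker {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool has_work;   /* a job has been assigned and not yet taken */
    bool exiting;    /* the worker timed out and will not take jobs */
    int job;
};

static void idle_worker_init(struct idle_worker *w)
{
    assert(w != NULL);
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->cond, NULL);
    w->has_work = false;
    w->exiting = false;
    w->job = 0;
}

/* Dispatcher side: returns false if the worker has already given up,
 * so the caller knows to pick (or create) another thread. */
static bool try_assign(struct idle_worker *w, int job)
{
    bool ok = false;

    pthread_mutex_lock(&w->lock);
    if (!w->exiting && !w->has_work) {
        w->job = job;
        w->has_work = true;
        pthread_cond_signal(&w->cond);
        ok = true;
    }
    pthread_mutex_unlock(&w->lock);
    return ok;
}

/* Worker side: wait up to timeout_ms for a job. Returns true and stores
 * the job in *job_out if one was assigned, even if the timeout fired at
 * the same moment; otherwise marks the worker as exiting. */
static bool wait_for_work(struct idle_worker *w, long timeout_ms, int *job_out)
{
    struct timespec deadline;
    bool got;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&w->lock);
    while (!w->has_work) {
        if (pthread_cond_timedwait(&w->cond, &w->lock, &deadline) == ETIMEDOUT)
            break;
    }
    got = w->has_work;   /* decide from the predicate, not the return code */
    if (got) {
        *job_out = w->job;
        w->has_work = false;
    } else {
        w->exiting = true;
    }
    pthread_mutex_unlock(&w->lock);
    return got;
}
```

With this shape, a job assigned just before the timeout is still picked up, and a job offered just after it is refused by `try_assign` instead of being silently dropped.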

By: Russell Bryant (russell) 2007-08-01 17:08:20

Please try again including the fixes in those last commits.  Thanks

By: Frank Waller (explidous) 2007-08-02 09:19:39

Version 77893, with our patches from issue 10347 applied, now crashes with a segfault after 45 minutes. Version 77862 with those same patches crashed after 2 minutes from a DO_CRASH, so I would say this is a major improvement.

By: Russell Bryant (russell) 2007-08-23 16:11:00

Well, removing a nonexistent schedule entry really is harmless.  I'd recommend not compiling with DO_CRASH.  :)
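
To illustrate why the same condition is harmless in a normal build but fatal with DO_CRASH, here is a small sketch of a schedule table whose delete routine reports a missing entry. It is not the Asterisk scheduler: `struct sched_table`, `sched_add`, and `sched_del` are hypothetical names, and DO_CRASH is modelled here, as an assumption, as a plain preprocessor flag that calls `abort()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_ENTRIES 8

/* Hypothetical schedule table holding only entry ids. */
struct sched_table {
    int ids[MAX_ENTRIES];
    int count;
};

/* Add an entry; returns 0 on success, -1 if the table is full. */
static int sched_add(struct sched_table *t, int id)
{
    assert(t != NULL);
    if (t->count >= MAX_ENTRIES)
        return -1;
    t->ids[t->count++] = id;
    return 0;
}

/* Delete an entry; returns 0 if it was removed, -1 if it did not exist.
 * Without DO_CRASH the missing entry is only logged. With DO_CRASH
 * defined, the same condition aborts the process so it can be debugged. */
static int sched_del(struct sched_table *t, int id)
{
    assert(t != NULL);
    for (int i = 0; i < t->count; i++) {
        if (t->ids[i] == id) {
            t->ids[i] = t->ids[t->count - 1];   /* order is not preserved */
            t->count--;
            return 0;
        }
    }
    fprintf(stderr, "attempted to delete nonexistent schedule entry %d\n", id);
#ifdef DO_CRASH
    abort();
#endif
    return -1;
}
```

Deleting the same id twice returns 0 and then -1 in a normal build; compiled with `-DDO_CRASH`, the second call aborts instead, which matches the behaviour reported in this issue.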